# load the packages used below, then check out the data
library(dplyr)
library(tidyr)
glimpse(iris)
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
dim(iris)
## [1] 150 5
The iris data set has 150 observations and 5 variables (sepal length, sepal width, petal length, petal width, and species).
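# note: == recycles c("virginica", "versicolor") down the rows, so each row is
# compared with only one of the two names; %in% would test against both
iris1 <- iris %>%
  filter(Species == c("virginica", "versicolor"),
         Sepal.Length > 6,
         Sepal.Width > 2.5)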
glimpse(iris1)
## Rows: 28
## Columns: 5
## $ Sepal.Length <dbl> 6.4, 6.1, 6.7, 6.1, 6.1, 6.6, 6.7, 6.1, 6.2, 6.3, 7.1, 6.…
## $ Sepal.Width <dbl> 3.2, 2.9, 3.1, 2.8, 2.8, 3.0, 3.0, 3.0, 2.9, 3.3, 3.0, 3.…
## $ Petal.Length <dbl> 4.5, 4.7, 4.4, 4.0, 4.7, 4.4, 5.0, 4.6, 4.3, 6.0, 5.9, 5.…
## $ Petal.Width <dbl> 1.5, 1.4, 1.4, 1.3, 1.2, 1.4, 1.7, 1.4, 1.3, 2.5, 2.1, 2.…
## $ Species <fct> versicolor, versicolor, versicolor, versicolor, versicolo…
After filtering the iris data set, there are only 28 observations. There are still 5 variables.
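One detail of the filter call is worth checking: comparing a column to a two-element vector with == is not the same as testing membership with %in%. The sketch below uses a small made-up vector (species and targets are illustrative names, not part of the exercise) and base R only, so the two forms can be compared side by side.

```r
# Toy vector standing in for a Species column (hypothetical data)
species <- c("virginica", "versicolor", "versicolor",
             "virginica", "setosa", "setosa")
targets <- c("virginica", "versicolor")

# == recycles `targets` down the vector: element 1 is compared with
# "virginica", element 2 with "versicolor", element 3 with "virginica", ...
eq_match <- species == targets

# %in% asks, for every element, whether it equals any of the targets
in_match <- species %in% targets

sum(eq_match)  # number of elements kept by ==
sum(in_match)  # number of elements kept by %in%
```

In this toy case == keeps 2 elements and %in% keeps 4, so the two forms can select different numbers of rows from the same data.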
iris2 <- iris1 %>%
  select(Species, Sepal.Length, Sepal.Width)
glimpse(iris2)
## Rows: 28
## Columns: 3
## $ Species <fct> versicolor, versicolor, versicolor, versicolor, versicolo…
## $ Sepal.Length <dbl> 6.4, 6.1, 6.7, 6.1, 6.1, 6.6, 6.7, 6.1, 6.2, 6.3, 7.1, 6.…
## $ Sepal.Width <dbl> 3.2, 2.9, 3.1, 2.8, 2.8, 3.0, 3.0, 3.0, 2.9, 3.3, 3.0, 3.…
iris2 still has 28 observations but only 3 variables: select() drops columns and leaves the rows of iris1 untouched.
iris3 <- iris2 %>%
  arrange(desc(Sepal.Length))
head(iris3, n = 6)
## Species Sepal.Length Sepal.Width
## 1 virginica 7.7 2.6
## 2 virginica 7.7 2.8
## 3 virginica 7.4 2.8
## 4 virginica 7.1 3.0
## 5 virginica 6.9 3.2
## 6 virginica 6.8 3.0
iris4 <- iris3 %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width)
glimpse(iris4)
## Rows: 28
## Columns: 4
## $ Species <fct> virginica, virginica, virginica, virginica, virginica, vi…
## $ Sepal.Length <dbl> 7.7, 7.7, 7.4, 7.1, 6.9, 6.8, 6.7, 6.7, 6.7, 6.7, 6.7, 6.…
## $ Sepal.Width <dbl> 2.6, 2.8, 2.8, 3.0, 3.2, 3.0, 3.1, 3.0, 3.3, 3.1, 3.3, 3.…
## $ Sepal.Area <dbl> 20.02, 21.56, 20.72, 21.30, 22.08, 20.40, 20.77, 20.10, 2…
There are 28 observations and 4 variables in this data set.
6. Create iris5 that calculates the average sepal length, the average sepal width, and the sample size of the entire iris4 data frame, and print iris5.
iris5 <- iris4 %>%
  summarize(Mean.Sepal.Length = mean(Sepal.Length),
            Mean.Sepal.Width = mean(Sepal.Width),
            TotalNumber = n())
print(iris5)
## Mean.Sepal.Length Mean.Sepal.Width TotalNumber
## 1 6.575 3.003571 28
iris6 <- iris4 %>%
  group_by(Species) %>%
  summarize(Mean.Sepal.Length = mean(Sepal.Length),
            Mean.Sepal.Width = mean(Sepal.Width),
            TotalNumber = n())
print(iris6)
## # A tibble: 2 × 4
## Species Mean.Sepal.Length Mean.Sepal.Width TotalNumber
## <fct> <dbl> <dbl> <int>
## 1 versicolor 6.33 2.97 9
## 2 virginica 6.69 3.02 19
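The overall summary (iris5) and the grouped summary (iris6) are linked: the group sizes add up to the overall sample size, and the size-weighted average of the group means recovers the overall mean. A minimal sketch of that check, using the full iris data as a stand-in and assuming dplyr is installed (overall and by_species are illustrative names, not part of the exercise):

```r
library(dplyr)

# Ungrouped summary: one row for the whole data frame
overall <- iris %>%
  summarize(Mean.Sepal.Length = mean(Sepal.Length),
            TotalNumber = n())

# Grouped summary: one row per species
by_species <- iris %>%
  group_by(Species) %>%
  summarize(Mean.Sepal.Length = mean(Sepal.Length),
            TotalNumber = n())

# Group sizes sum to the overall sample size
sum(by_species$TotalNumber) == overall$TotalNumber

# Weighting each group mean by its group size recovers the overall mean
all.equal(
  weighted.mean(by_species$Mean.Sepal.Length, by_species$TotalNumber),
  overall$Mean.Sepal.Length
)
```

The same relationship holds in the output above: 9 + 19 = 28, and (9 × 6.33 + 19 × 6.69) / 28 ≈ 6.57, which agrees with the 6.575 reported for iris5 up to rounding of the group means.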
In these exercises, you have successively modified different versions of the data frame: iris1, iris2, iris3, iris4, iris5, and iris6. At each stage, the output data frame from one operation serves as the input for the next. A more efficient way to do this is to use the pipe operator %>% (defined in the magrittr package and re-exported by dplyr and tidyr). See if you can rework all of your previous statements (except for iris5) into an extended piping operation that uses iris as the input and generates irisFinal as the output.
irisFinal <- iris %>%
  filter(Species == c("virginica", "versicolor"),
         Sepal.Length > 6,
         Sepal.Width > 2.5) %>%
  select(Species, Sepal.Length, Sepal.Width) %>%
  arrange(desc(Sepal.Length)) %>%
  # iris4 step; the summary below does not use Sepal.Area, so the output is unchanged
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) %>%
  group_by(Species) %>%
  summarize(Mean.Sepal.Length = mean(Sepal.Length),
            Mean.Sepal.Width = mean(Sepal.Width),
            TotalNumber = n())
print(irisFinal)
## # A tibble: 2 × 4
## Species Mean.Sepal.Length Mean.Sepal.Width TotalNumber
## <fct> <dbl> <dbl> <int>
## 1 versicolor 6.33 2.97 9
## 2 virginica 6.69 3.02 19
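A pipeline like this is another way of writing nested function calls: x %>% f(y) evaluates f(x, y). The sketch below compares the two forms on a shorter two-step example, assuming dplyr is installed; nested and piped are illustrative names, and the Sepal.Length > 7 threshold is chosen only for the demonstration.

```r
library(dplyr)

# Nested form: read from the inside out
nested <- arrange(filter(iris, Sepal.Length > 7), desc(Sepal.Length))

# Piped form: read top to bottom; each result becomes the
# first argument of the next call
piped <- iris %>%
  filter(Sepal.Length > 7) %>%
  arrange(desc(Sepal.Length))

# Both forms should produce the same data frame
identical(nested, piped)
```

The piped form avoids the intermediate objects (iris1, iris2, ...) without forcing the reader to unpack nested parentheses.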
Create a ‘longer’ data frame using the original iris data set with three columns named “Species”, “Measure”, “Value”. The column “Species” will retain the species names of the data set. The column “Measure” will include whether the value corresponds to Sepal.Length, Sepal.Width, Petal.Length, or Petal.Width and the column “Value” will include the numerical values of those measurements.
# add a row identifier so each flower can still be told apart after pivoting
iris_ind <- iris %>%
  mutate(individual = row_number())
longer <- iris_ind %>%
  pivot_longer(cols = Sepal.Length:Petal.Width,
               names_to = "Measure",
               values_to = "Value")
glimpse(longer)
## Rows: 600
## Columns: 4
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, set…
## $ individual <int> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5,…
## $ Measure <chr> "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width…
## $ Value <dbl> 5.1, 3.5, 1.4, 0.2, 4.9, 3.0, 1.4, 0.2, 4.7, 3.2, 1.3, 0.2,…
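The result above has four columns because of the added individual identifier. As a sketch of the trade-off, assuming dplyr and tidyr are installed (longer3 and wide_again are illustrative names): pivoting without the identifier gives exactly the three requested columns, while keeping it lets pivot_wider() rebuild one row per flower.

```r
library(dplyr)
library(tidyr)

# Without an identifier: exactly Species, Measure, Value
longer3 <- iris %>%
  pivot_longer(cols = -Species,
               names_to = "Measure",
               values_to = "Value")
dim(longer3)
names(longer3)

# With an identifier: the long table can be pivoted back to
# one row per flower, because each (individual, Measure) pair is unique
wide_again <- iris %>%
  mutate(individual = row_number()) %>%
  pivot_longer(cols = Sepal.Length:Petal.Width,
               names_to = "Measure",
               values_to = "Value") %>%
  pivot_wider(names_from = Measure, values_from = Value)
dim(wide_again)
```

Here longer3 has 600 rows and 3 columns, and wide_again returns to 150 rows (with 6 columns: Species, individual, and the four measurements).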