# A tibble: 4 × 3
      x     y     z
  <int> <int> <int>
1     1     5     9
2     2     6    10
3     3     7    11
4     4     8    12
Lab 3: Data wrangling
Not graded, just practice
Materials from lab
1 Tidy
1.1 Tidyverse
What is the relationship between tidyverse and readr?
In the tidyverse, what does “tidy data” refer to?
What is the purpose of the purrr package?
What is the primary purpose of the readr package?
Which of the following returned this message?
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
1.2 purrr
Suppose we have the tibble printed at the top of this page (4 rows; integer columns x, y, and z), stored in the variable df.
What will map(df, mean) return?
Suppose we wanted to coerce each column in the previous tibble to the data type double with one line of code. Fill in the two arguments to map that would accomplish this:
- map(____, ____)
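One way to check answers in this section is to rebuild the tibble and experiment. The sketch below assumes the tibble and purrr packages are installed. It recreates df from the printed values and, so as not to give away the fill-in answer, uses as.character as a stand-in for the function passed to the second map() call.

```r
library(tibble)
library(purrr)

# Rebuild the 4 x 3 tibble printed in this lab
df <- tibble(x = 1:4, y = 5:8, z = 9:12)

# map() calls the function once per column; inspect the structure of the result
means <- map(df, mean)
str(means)

# Same pattern with a different function: the first argument is the data,
# the second is the function to apply (as.character is only a stand-in here)
chars <- map(df, as.character)
str(chars)
```

Swapping in a different function for the second argument and calling str() on the result is a quick way to see how each column changes.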
1.3 Tibbles
Suppose we run the following code block and create 3 tibbles:
library(tidyverse)  # provides tibble() and tribble()

# create tibble
tib <- tibble(x = 1:2, y = c("a", "b"))

# create tibble
x <- tribble(
  ~x, ~y,
  2, 3,
  4, 5
)

# create tibble
tibby <- tibble(
  age = c(1, 2, 3, 5),
  name = c("dory", "hazel", "graham", "joan"),
  alt_name = c("dolores", NA, NA, "joanie")
)
What will is.data.frame(tib) return?
What will typeof(tib) return?
What will is_tibble(x) return?
Which of the following would convert a dataframe called df to a tibble? (Note that df is not defined above; consider any arbitrary dataframe.)
What will tibby$a return?
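The questions above can be explored interactively. The sketch below assumes the tibble package is installed; base_df is a hypothetical base R data frame introduced only for illustration, standing in for "any arbitrary dataframe".

```r
library(tibble)

tib <- tibble(x = 1:2, y = c("a", "b"))

# Hypothetical base R data frame, used only for comparison
base_df <- data.frame(x = 1:2, y = c("a", "b"))

# Inspect the class vector and the underlying storage type of each object
class(tib)
class(base_df)
typeof(tib)
typeof(base_df)

# as_tibble() is one function that converts an existing data frame
converted <- as_tibble(base_df)
is_tibble(base_df)
is_tibble(converted)
```

Comparing class() before and after conversion shows what as_tibble() adds on top of a plain data frame.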
2 Import
The questions below refer to this dataset borrowed from R4DS and available at the url https://pos.it/r4ds-students-csv.
- What does the csv in read_csv() stand for? Fill in the blank.
  - ____ separated values
Suppose we attempt to import the csv file given above with the code below. What will be the result?
data <- read_csv(
  "https://pos.it/r4ds-students-csv",
  col_types = list(AGE = col_double())
)
Suppose we import the dataset given above and name it data. What will is.na(data[3,3]) return?
Suppose we import the dataset given above and name it data. Which of the following would return the first column?
True or false, assuming the same dataset, the following code would rename the Student ID column to student_id?
data %>% rename(student_id = `Student ID`)
True or false, we can use a read_*() function from readr to import a Google Sheet.
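To experiment with col_types without downloading anything, readr (version 2.0 or later) can parse literal text wrapped in I(). The sketch below is a minimal illustration under that assumption. The inline CSV is hypothetical and is not the students file from the URL above; it simply includes one AGE value that is not a number.

```r
library(readr)

# Hypothetical inline CSV (NOT the real students data), with one non-numeric AGE
csv_text <- I("id,AGE\n1,4\n2,five\n3,6\n")

# Force AGE to be parsed as double; other columns are still guessed
parsed <- read_csv(
  csv_text,
  col_types = list(AGE = col_double()),
  show_col_types = FALSE
)

parsed

# problems() lists any values that could not be parsed as requested
problems(parsed)
```

Running the same call on the real URL and then calling problems() on the result is one way to answer the import question above.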
3 Transform
Which of the following dplyr functions returns a data frame?
Which of the following dplyr functions take a number as their first argument?
True or false, the following code blocks are equivalent.
# option 1
ratings %>%
  select(Word, Frequency) %>%
  glimpse()
# option 2
glimpse(select(ratings, Word, Frequency))
True or false, the following code options are equivalent.
# option 1
ratings %>%
  select(Word:Class) %>%
  mutate(Length/Frequency, .after = Class)
# option 2
ratings %>%
  select(Word:Class) %>%
  mutate(Length/Frequency)
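To see what .after does in general, it can help to try it on a small table. The sketch below assumes dplyr and tibble are installed. The ratings tibble here is a hypothetical stand-in with made-up values, not the lab's dataset, and .after points at Word rather than Class so the placement is easy to spot.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `ratings` (made-up values, illustration only)
ratings <- tibble(
  Word      = c("ant", "apple", "bat"),
  Length    = c(3, 5, 3),
  Frequency = c(5.3, 7.1, 4.2),
  Class     = c("animal", "plant", "animal")
)

# By default, mutate() appends the new column at the end
appended <- ratings %>% mutate(Length / Frequency)

# .after places the new column immediately after the named column
moved <- ratings %>% mutate(Length / Frequency, .after = Word)

names(appended)
names(moved)
```

Comparing the two names() outputs, and then repeating with .after = Class, shows when the argument changes the result and when it does not.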
Recall that there are two possible values in the Class variable in the ratings dataset: “animal” or “plant”. How many rows would be in the data frame returned by the following code block?
ratings %>%
  group_by(Class) %>%
  summarise(n = n())
Given the code block in the previous question, what will n() do?
True or false, the following code blocks will return the same dataframe.
# code block 1
ratings %>% select(complexity = Complex)
# code block 2
ratings %>% rename(complexity = Complex)
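A quick way to compare select() and rename() is to check the column names each one returns. The sketch below assumes dplyr and tibble are installed. The ratings tibble is a hypothetical stand-in with made-up values; only the column name Complex is taken from the question.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `ratings` (made-up values, illustration only)
ratings <- tibble(
  Word    = c("ant", "apple", "bat"),
  Complex = c("simplex", "simplex", "complex"),
  Class   = c("animal", "plant", "animal")
)

# Both verbs accept new_name = old_name
selected <- ratings %>% select(complexity = Complex)
renamed  <- ratings %>% rename(complexity = Complex)

# Compare which columns survive in each result
names(selected)
names(renamed)
```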
Which of the following code blocks will return a dataframe including only the rows in ratings for which the Class value is “animal”?
# code block a
ratings %>% filter(Class = "animal")
# code block b
ratings %>% filter(Class == "animal")
By default, the arrange() function arranges the rows in ascending order. Which of the following code blocks would arrange the Frequency variable in descending order?
# code block a
ratings %>% arrange(Frequency, order = "descending")
# code block b
ratings %>% arrange(Frequency, order = "reverse")
# code block c
ratings %>% arrange(desc(Frequency))
Which of the following code blocks could be used to return the mean frequency by class?
# code block a
ratings %>%
  group_by(Class) %>%
  summarise(
    mean = mean(Frequency)
  )
# code block b
ratings %>%
  summarise(
    mean = mean(Frequency),
    .by = c(Class)
  )
# code block c
ratings %>%
  mean(Frequency) %>%
  group_by(Class)
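Grouped summaries are easiest to reason about on a table small enough to check by hand. The sketch below assumes dplyr 1.1.0 or later (for the .by argument) and tibble. The ratings tibble is a hypothetical stand-in with made-up values, and the summary uses max() rather than mean() so the quiz answer is left to the reader.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `ratings` (made-up values, illustration only)
ratings <- tibble(
  Word      = c("ant", "apple", "bat", "beet", "cat"),
  Frequency = c(5.3, 7.1, 4.2, 3.9, 6.8),
  Class     = c("animal", "plant", "animal", "plant", "animal")
)

# Persistent grouping: group_by() followed by summarise()
grouped <- ratings %>%
  group_by(Class) %>%
  summarise(top = max(Frequency))

# Per-operation grouping: the .by argument of summarise()
by_arg <- ratings %>%
  summarise(top = max(Frequency), .by = Class)

grouped
by_arg
```

Each distractor in the questions above can be pasted in against this stand-in table to see whether it runs and what it returns.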