Lab 3: Visualize and wrangle data

Not graded, just practice

Author: Katie Schuler

Published: September 14, 2023

Today's lab uses the ratings data set from the languageR package. The output of glimpse() on the data frame is provided for your reference.
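Here is a minimal setup sketch for following along. It assumes the languageR and dplyr packages are installed; the lab may load them another way, for example through the tidyverse. The data() call simply makes sure ratings is available in the session.

```r
# Assumes languageR and dplyr are installed, e.g.
# install.packages(c("languageR", "dplyr"))
library(languageR)  # ships the ratings data set
library(dplyr)      # provides glimpse() and re-exports the %>% pipe

# Make the ratings data frame available in the session
data("ratings", package = "languageR")

# Number of rows and columns; should agree with the glimpse() output
dim(ratings)
```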

ratings %>% glimpse()
Rows: 81
Columns: 14
$ Word             <fct> almond, ant, apple, apricot, asparagus, avocado, badg…
$ Frequency        <dbl> 4.204693, 5.347108, 6.304449, 3.828641, 3.663562, 3.4…
$ FamilySize       <dbl> 0.0000000, 1.3862944, 1.0986123, 0.0000000, 0.0000000…
$ SynsetCount      <dbl> 1.0986123, 1.0986123, 1.0986123, 1.3862944, 1.0986123…
$ Length           <int> 6, 3, 5, 7, 9, 7, 6, 6, 3, 6, 3, 8, 10, 9, 8, 5, 9, 5…
$ Class            <fct> plant, animal, plant, plant, plant, plant, animal, pl…
$ FreqSingular     <int> 24, 69, 315, 26, 19, 24, 53, 74, 155, 37, 118, 15, 26…
$ FreqPlural       <int> 42, 140, 231, 19, 19, 6, 78, 77, 103, 14, 180, 19, 31…
$ DerivEntropy     <dbl> 0.0000, 0.5620, 0.4960, 0.0000, 0.0000, 0.0000, 0.634…
$ Complex          <fct> simplex, simplex, simplex, simplex, simplex, simplex,…
$ rInfl            <dbl> -0.54232429, -0.70026465, 0.30900484, 0.30010459, 0.0…
$ meanWeightRating <dbl> 1.4860, 3.3489, 2.1948, 1.3216, 1.4424, 1.3256, 3.047…
$ meanSizeRating   <dbl> 1.8912, 3.6275, 2.4730, 1.7597, 1.8660, 1.7737, 3.369…
$ meanFamiliarity  <dbl> 3.72, 3.60, 5.84, 4.40, 3.68, 4.12, 2.12, 5.68, 3.20,…

1 More visualization

Given code blocks a, b, and c, and the plot below:

# CODE BLOCK a ---------------------------#
ggplot(
    data = ratings, 
    mapping = aes(x = Frequency, y = meanFamiliarity)
    ) +
    geom_point(color = "blue")
# CODE BLOCK b ---------------------------#
ggplot(
    data = ratings, 
    mapping = aes(x = Frequency, y = meanFamiliarity, color = "blue")
    ) 
# CODE BLOCK c ---------------------------#
ggplot(
    data = ratings, 
    mapping = aes(x = Frequency, y = meanFamiliarity)
    ) +
    geom_point()

Which of the code blocks above generate plot A above?

In plot A above, is the color aesthetic mapped, set, or both?
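As a reminder of the terminology: an aesthetic is mapped when it is tied to a variable inside aes(). It is set when a fixed value is passed to the geom outside aes(). The sketch below contrasts the two on the ratings data and assumes languageR and ggplot2 are installed. The names p_mapped and p_set are only illustrative.

```r
library(languageR)  # ratings data set
library(ggplot2)

data("ratings", package = "languageR")

# Mapped: color is linked to the Class variable inside aes(), so
# ggplot2 picks one color per level of Class and adds a legend
p_mapped <- ggplot(
    data = ratings,
    mapping = aes(x = Frequency, y = meanFamiliarity, color = Class)
    ) +
    geom_point()

# Set: a fixed color is passed to the geom outside aes(), so every
# point is drawn in the same color and no legend is created
p_set <- ggplot(
    data = ratings,
    mapping = aes(x = Frequency, y = meanFamiliarity)
    ) +
    geom_point(color = "darkgreen")

print(p_mapped)
print(p_set)
```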

In plot A above, which of the following aesthetics should we set to make the points more transparent?

In plot A above, which of the following would change the x axis label to “FQ”?

In plot B above, which geom(s) are used to represent the data?

True or false: the blue line in plot B above is mapped to the Class aesthetic.

In plot B above, which of the following variables is mapped to the x aesthetic?

True or false: in plot B above, the default statistical transformation of the geom responsible for the red dots is “identity”.

Suppose we run the code below. Which of the following plots will be returned?

ggplot(
    data = ratings, 
    mapping = aes(x = Frequency, y = meanFamiliarity, color = Class)
    ) +
    geom_point() +
    geom_smooth(method = "lm", color = "red") 

Suppose we run the following code block, which plot will be returned?

ggplot(
    data = ratings, 
    mapping = aes(x = Class, fill = Complex)
    ) +
    geom_bar() 
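By default, geom_bar() uses stat = "count". Each bar, or each bar segment when fill is mapped, therefore has a height equal to the number of rows in that group. The sketch below assumes languageR and dplyr are installed. It tabulates the same numbers with dplyr::count(), which can help when comparing candidate plots. The name bar_heights is illustrative.

```r
library(languageR)  # ratings data set
library(dplyr)      # count() and the %>% pipe

data("ratings", package = "languageR")

# One row per Class x Complex combination present in the data,
# with n = number of words in that group (the bar segment height)
bar_heights <- ratings %>% count(Class, Complex)

print(bar_heights)
```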

To generate the facets in the plot below, which of the following lines of code must be included?

To adjust the size of the font to 20pt in the complete theme theme_minimal(), what argument should we include?

What would happen if we added scale_fill_manual(values = c("green", "orange")) to the following plot?

2 Data wrangling

Which of the following dplyr functions returns a data frame?

Which of the following dplyr functions takes a number as its first argument?

True or false: the following code blocks are equivalent.

# option 1
ratings %>% select(Word, Frequency) %>% glimpse()

# option 2
glimpse(select(ratings, Word, Frequency))

True or false: the following code options are equivalent.

# option 1
ratings %>% 
    select(Word:Class) %>% 
    mutate(Length/Frequency, .after = Class)

# option 2
ratings %>% 
    select(Word:Class) %>% 
    mutate(Length/Frequency)

Recall that there are two possible values in the Class variable in the ratings dataset: “animal” or “plant”. How many rows would be in the data frame returned by the following code block?

ratings %>% group_by(Class) %>% summarise(n = n())

Given the code block in the previous question, what will n() do?

True or false: the following code blocks will return the same data frame.

# code block 1
ratings %>% select(complexity = Complex) 


# code block 2
ratings %>% rename(complexity = Complex)

Which of the following code blocks will return a data frame containing only the rows of ratings whose Class value is “animal”?

# code block a
ratings %>% filter(Class = "animal")

# code block b
ratings %>% filter(Class == "animal")

By default, arrange() sorts rows in ascending order. Which of the following code blocks would sort the rows by Frequency in descending order?

# code block a
ratings %>% arrange(Frequency, order = "descending")

# code block b
ratings %>% arrange(Frequency, order = "reverse")

# code block c
ratings %>% arrange(desc(Frequency))
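A quick way to see the default ascending order described above, assuming languageR and dplyr are installed (by_freq is an illustrative name):

```r
library(languageR)  # ratings data set
library(dplyr)      # arrange(), select() and the %>% pipe

data("ratings", package = "languageR")

# No ordering helper: arrange() puts the smallest Frequency first
by_freq <- ratings %>%
    arrange(Frequency) %>%
    select(Word, Frequency)

# First few rows, lowest-frequency words at the top
head(by_freq)
```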

Which of the following code blocks could be used to return the mean frequency by class?

# code block a
ratings %>% group_by(Class) %>% summarise( mean = mean(Frequency) )

# code block b
ratings %>% summarise( 
    mean = mean(Frequency), .by = c(Class) )

# code block c
ratings %>% mean(Frequency) %>% group_by(Class)