Exam 1 study guide
Additional study materials
- Practice exam 1
- Provided reference sheets (you will have a copy of these on the exam)
- Practice exam 1 solutions
The first exam will test the learning objectives below, grouped by topic area. For each topic area, you should be able to do everything on the list that follows. You can think of this as a studying checklist!
- R Basics: general
- Assign an object to a valid variable name, list all variables in the environment and remove them
- Use packages and differentiate between installing and loading
- Get help with a function or package from R
- Return information about an object, including its structure, data type, length, and attributes
- Explain what functions and control flow are; differentiate between types of control flow
- R Basics: vectors, operations, and subsetting
- Distinguish between an atomic vector and a list
- Create atomic vectors and determine their data types
- Differentiate between implicit and explicit coercion and coerce an object to another type
- Use arithmetic, comparison, and logical operators on vectors
- Explain how more complex data structures are built from atomic vectors and create them
- Distinguish between `NA` and `NULL`
- Subset vectors and higher-dimensional objects with the `[`, `[[`, and `$` operators
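As a quick self-check for this topic area, here is a minimal sketch in base R of coercion, `NA` versus `NULL`, and the three subsetting operators. The object names (`mixed`, `student`) are made up for illustration and do not come from the course materials.

```r
# Atomic vectors hold a single type; lists can mix types.
mixed <- c(1, "a", TRUE)       # implicit coercion: everything becomes character
typeof(mixed)                  # "character"
ints <- as.integer(c("1", "2"))  # explicit coercion with an as.*() function

# NA is a missing value that still occupies a slot in a vector;
# NULL is the absence of an object and has length 0.
length(NA)                     # 1
length(NULL)                   # 0
with_na   <- c(1, NA, 3)       # length 3
with_null <- c(1, NULL, 3)     # length 2: NULL is dropped

# Subsetting a list (hypothetical example object)
student <- list(name = "Ada", scores = c(90, 85))
as_list   <- student["scores"]    # [  keeps the container: a list of length 1
as_vector <- student[["scores"]]  # [[ extracts the element: a numeric vector
by_name   <- student$scores       # $  is shorthand for [[ with a name
```

Try predicting the type and length of each result before running the line.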
- Data visualization: basics
- Describe how to create a plot with `ggplot2`, including the 3 basic requirements
- Distinguish between mapping and setting aesthetics
- Describe how `ggplot2` maps categorical variables to aesthetics and interpret the 3 common warnings people encounter in this process
- Interpret `ggplot()` calls with explicit or implicit arguments for data and mapping
- Recognize the geoms we discussed in class and select which to use for a given situation
- Differentiate between globally and locally defined mappings and recognize them in a given plot (or code)
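The contrast between mapping and setting, and between global and local mappings, may be easier to see in code. The sketch below assumes the `mpg` dataset that ships with `ggplot2`; the plot names `p_mapped` and `p_set` are made up for illustration.

```r
library(ggplot2)

# Mapping: colour depends on a variable, so it goes inside aes().
# The mapping is defined globally in ggplot(), so every layer inherits it.
# The data and mapping arguments are named explicitly here.
p_mapped <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, colour = class)) +
  geom_point()

# Setting: colour is a fixed value, so it goes outside aes(),
# as an argument to the geom. Here x and y are mapped locally
# in geom_point(), and the data argument is passed implicitly by position.
p_set <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), colour = "blue")
```

Printing `p_mapped` gives one colour per class with a legend; printing `p_set` gives uniformly blue points and no legend.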
- Data visualization: layers
- Use the `position` argument to modify the position of the geoms in `geom_bar()` or `geom_point()`
- Describe `stat="identity"` and describe the default transformations for `geom_bar()`, `geom_histogram()`, and `geom_smooth()`
- Set the smoothing method for `geom_smooth()` and the bins or binwidth for `geom_histogram()`
- Facet a plot with `facet_wrap()` and `facet_grid()`
- Modify axis, legend, and plot labels with `labs()`
- Apply a given theme to a plot and adjust the base font size or family.
- Describe scales and recognize the outcome of adding a scale layer
- Data importing
- Load the `tidyverse`, recognize the included packages, and critique code for redundant loading
- Construct a tidy dataset and critique whether a given dataset is tidy
- Use the map function from the `purrr` package
- Create a tibble and distinguish between a tibble and a data frame
- Use `readr` to read delimited files and determine whether `readr` can read files of a given type
- Use `col_types` to add a column specification and explain how `readr` guesses column types without it
- Solve the 3 most common importing problems we discussed in class
- Data wrangling
- Describe the common structure of `dplyr` functions (aka verbs)
- Combine `dplyr` functions with the pipe operator to solve complex problems
- Manipulate rows with `filter()`, `arrange()`, and `distinct()`
- Manipulate columns with `mutate()`, `select()`, and `rename()`
- Group and summarise data with `group_by()`, `summarise()`, and `ungroup()`
- Evaluate `dplyr` functions that include the common arguments we covered in class
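To practice chaining verbs, here is a small sketch that combines row, column, and grouping verbs in one pipeline. The `scores` data frame and its values are invented for illustration, and the native pipe `|>` assumes R 4.1 or later (the `%>%` pipe would work the same way here).

```r
library(dplyr)

# Hypothetical data, for illustration only
scores <- data.frame(
  student = c("a", "b", "c", "d"),
  section = c("x", "x", "y", "y"),
  score   = c(80, 90, 70, 100)
)

# Each verb takes a data frame as its first argument and returns a data frame,
# which is what lets the pipe chain them together.
section_means <- scores |>
  filter(score >= 75) |>              # rows: keep scores of at least 75
  mutate(pct = score / 100) |>        # columns: add a proportion column
  group_by(section) |>                # group rows by section
  summarise(mean_pct = mean(pct)) |>  # collapse to one row per group
  ungroup() |>                        # harmless here; shown for the pattern
  arrange(desc(mean_pct))             # rows: sort from highest to lowest
```

With a single grouping variable, `summarise()` already returns an ungrouped result, so `ungroup()` only matters when more than one grouping level is in play.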
- Sampling distribution
- Explore a dataset with an appropriate figure (histogram, boxplot, scatterplot) and summary statistics appropriate for the distribution.
- Recognize uniform and Gaussian probability distributions in a plot or equation and use R’s functions `d*()`, `p*()`, and `r*()` to work with these distributions
- Explain the difference between the parameter and the parameter estimate
- Construct the sampling distribution of a parameter estimate with `infer` and quantify the spread of the distribution with a confidence interval.
- Understand the difference between constructing a confidence interval with the standard error method vs. the percentile method.
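The sketch below illustrates the `d*()`, `p*()`, and `r*()` functions and the two confidence interval methods. It deliberately uses base R rather than `infer`, so the bootstrap is written out by hand. The sample, the seed, the 95% level, and the number of resamples are all assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# d* gives a density, p* a cumulative probability, r* random draws
d0 <- dnorm(0, mean = 0, sd = 1)        # Gaussian density at 0
p196 <- pnorm(1.96)                     # P(X <= 1.96) for a standard Gaussian
pu <- punif(0.25, min = 0, max = 1)     # P(X <= 0.25) for a uniform on [0, 1]
x <- rnorm(50, mean = 10, sd = 2)       # 50 random draws: our "sample"

# Bootstrap sampling distribution of the sample mean (the parameter estimate)
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# Standard error method: estimate +/- multiplier * SE,
# where SE is the standard deviation of the bootstrap distribution
se_ci <- mean(x) + c(-1, 1) * qnorm(0.975) * sd(boot_means)

# Percentile method: the middle 95% of the bootstrap distribution
pct_ci <- quantile(boot_means, c(0.025, 0.975))
```

The standard error interval is symmetric around the estimate by construction, while the percentile interval follows the shape of the bootstrap distribution, so the two agree most closely when that distribution is roughly bell-shaped.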
- Hypothesis testing
- Given a set of data, implement the 3-step hypothesis testing framework nonparametrically: (1) pose a null hypothesis, (2) quantify how likely a given pattern of results is under the null, and (3) determine whether to reject the null (conceptually and with the `infer` framework).
- Given a theoretical distribution (e.g. t), implement the 3-step hypothesis testing framework parametrically.
- Given an observed correlation, determine whether it is positive, negative, or indicates no correlation.
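The three steps can be sketched as a permutation test written in base R, followed by a parametric t-test on the same data. This is not the `infer` workflow from class, only an illustration of the same logic. The simulated data, the seed, the 2000 shuffles, and the 0.05 threshold are all assumptions.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical data: an outcome measured in two groups of 20
group <- rep(c("a", "b"), each = 20)
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))

# Step 1: pose the null hypothesis: no difference in group means
obs_diff <- mean(y[group == "b"]) - mean(y[group == "a"])

# Step 2: quantify how likely the observed difference is under the null
# by shuffling the group labels many times
null_diffs <- replicate(2000, {
  shuffled <- sample(group)
  mean(y[shuffled == "b"]) - mean(y[shuffled == "a"])
})
p_perm <- mean(abs(null_diffs) >= abs(obs_diff))  # two-sided p-value

# Step 3: decide whether to reject the null (threshold of 0.05 assumed)
reject_null <- p_perm < 0.05

# Parametric counterpart: step 2 uses the theoretical t distribution
t_result <- t.test(y ~ group)
p_t <- t_result$p.value
```

Only step 2 differs between the two approaches: the permutation test builds the null distribution by simulation, while the t-test takes it from theory.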