Lab 6: Model specification

Not graded, just practice

Author

Katie Schuler

Published

October 10, 2024

Practice your new modeling skills with these practice exam questions! It is best to open a fresh Google Colab notebook and test things out as you go. You can also refer to the study guide to find answers.

1 Types of models

  1. Which of the following best describes the goal of a regression model?

  2. In classification tasks, the output variable (label) is typically:

  3. Which of the following is an example of a regression problem?

  4. What is the primary difference between regression and classification?

  5. Which of the following tasks is a classification problem?

  6. True or false, supervised learning requires labeled data to train the model.

  7. True or false, in unsupervised learning, the model attempts to identify patterns or structures in data without any specific target variable.

2 Model specification

  1. Which of the following is the first step in model specification?

  2. What does model specification involve?

  3. Which of the following is NOT part of model specification?

  4. Which of the following describes a correctly specified model?

  5. True or false, adding interaction terms between predictors is part of the model specification process.

  6. True or false, model specification is the final step in the model-building process.

3 Functional form of linear models

  1. Write the equation that expresses the response variable as a weighted sum of regressors (our favorite).

\(y=\sum_{i=1}^{n}w_ix_i\)

  2. In the linear regression equation \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon\), what do the \(\beta\)’s represent?

  3. Write the linear model equation in matrix notation.

\(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\)

or similar

  4. In matrix notation, what is \(\mathbf{X}\)?

  5. Suppose our SwimRecords data includes the year, sex, record time, swimsuit type, and swim cap type. Which of the following variables is most likely to be irrelevant for predicting swim times?

  6. What is the potential issue of including too many irrelevant variables in your model?
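To check your intuition on the weighted-sum and matrix-notation questions above, here is a minimal base R sketch you could paste into a notebook. It builds a small design matrix and shows that the term-by-term weighted sum and the matrix product give the same values. The predictor values and the weights in `beta` are made up for illustration (they are not from the course data), and the noise term \(\epsilon\) is left out so the two results can be compared exactly.

```r
# Made-up data: 5 observations of 2 predictors (illustrative values only)
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(0, 1, 0, 1, 0)

# Design matrix X: a column of ones for the intercept,
# then one column per predictor (one row per observation)
X <- cbind(intercept = 1, x1 = x1, x2 = x2)

# Hypothetical weights: beta_0 (intercept), beta_1, beta_2
beta <- c(2, 0.5, -1)

# Weighted sum written out term by term
y_sum <- beta[1] * 1 + beta[2] * x1 + beta[3] * x2

# The same computation in matrix notation: y = X beta (noise omitted)
y_mat <- as.vector(X %*% beta)

# Both approaches should agree
print(all.equal(y_sum, y_mat))
```

Note that the intercept fits the weighted-sum form because it is treated as a regressor whose value is always 1.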

4 Finished?

Work together with support from TAs on problem set 3, question 5:

Suppose your roommate is keeping a bunch of plants in your apartment. You notice that the plants exposed to more light seem to be taller, and, as an emerging data scientist, you record these data in a csv file: polynomial_plants.csv. Explore the relationship between light_exposure and plant_height across different plant species by plotting the data using an appropriate geom. Then specify, fit, and plot polynomial models of increasing degree (linear, quadratic, and cubic). Start by specifying and fitting a simple linear model. Next, specify and fit second- and third-degree polynomial models, and visualize each in ggplot. Which model best captures the relationship between light_exposure and plant_height? For each model, make sure you first specify it as a mathematical expression in LaTeX, then use infer to specify and fit the model.
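As a starting point for the LaTeX step, one possible way to write the three specifications is sketched below. It assumes \(y\) stands for plant_height and \(x\) for light_exposure, and it borrows the \(\beta\) notation from the regression equation in section 3; your problem set may prefer the \(w\) notation instead, so treat this as a template rather than the required answer.

```latex
\begin{align}
\text{linear:}    \quad & y = \beta_0 + \beta_1 x + \epsilon \\
\text{quadratic:} \quad & y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon \\
\text{cubic:}     \quad & y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon
\end{align}
```

Each of these is still a linear model in the sense of section 3: the response is a weighted sum of regressors, where the regressors are \(1\), \(x\), \(x^2\), and \(x^3\).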