Lab 9: Model accuracy – DataSci for Lang & Mind

2 Model accuracy in R

Questions 19-20 refer to the following code and output:

model <- lm(y ~ 1 + x, data)
summary(model)


Call:
lm(formula = y ~ 1 + x, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.38959 -0.32626 -0.04605  0.31967  1.65977 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.08280    0.07418   28.08   <2e-16 ***
x            0.47844    0.01242   38.51   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5139 on 198 degrees of freedom
Multiple R-squared:  0.8822,    Adjusted R-squared:  0.8816 
F-statistic:  1483 on 1 and 198 DF,  p-value: < 2.2e-16

What is the \(R^2\) value for the model fit above?

Does the value you entered in 19 reflect \(R^2\) on the population or on the sample?

population sample

Questions 21-23 refer to the following code and output:

# we divide the data into v folds for cross-validation
set.seed(2) 
splits <- vfold_cv(data)

# model secification 
model_spec <- 
  linear_reg() %>%  
  set_engine(engine = "lm")  

# add a workflow
our_workflow <- 
  workflow() %>%
  add_model(model_spec) %>%  
  add_formula(y ~ x) 

# fit models to our folds 
fitted_models <- 
  fit_resamples(
    object = our_workflow, 
    resamples = splits
    ) 

fitted_models %>%
    collect_metrics()

# A tibble: 2 × 6
  .metric .estimator  mean     n std_err .config             
  <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
1 rmse    standard   0.507    10  0.0397 Preprocessor1_Model1
2 rsq     standard   0.890    10  0.0146 Preprocessor1_Model1

In the cross-validation performed above, how many folds were the data split into?

2 5 10

What \(R^2\) do we estimate for the population?

What model has been fit?

y ~ x y ~ x^2 x ~ y vfold_cv

Questions 24-26 refer to the following code and output:

# we bootstrap the data for cross-validation
set.seed(2) 
bootstrap <- bootstraps(data) 

# fit models to our folds 
fitted_models_boot <- 
  fit_resamples(
    object = our_workflow, 
    resamples = bootstrap
    ) 

fitted_models_boot %>%
    collect_metrics()

# A tibble: 2 × 6
  .metric .estimator  mean     n std_err .config             
  <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
1 rmse    standard   0.507    25 0.00946 Preprocessor1_Model1
2 rsq     standard   0.887    25 0.00377 Preprocessor1_Model1

How many bootstrap samples did we generate?

True or false, we fit the same model to the bootstrap data as we did in the cross-validation code.

TRUE FALSE

True or false, the \(R^2\) estimated by bootstrapping is equal to the \(R^2\) estimated by cross-validation.

TRUE FALSE

1 Model accuracy

2 Model accuracy in R