Data Science for Studying Language and the Mind

Katie Schuler

2023-11-07


- Hello, world!
- R basics
- Data importing
- Data visualization
- Data wrangling

- Sampling distribution
- Hypothesis testing
- Model specification
- Model fitting
- Model accuracy
- Model reliability

- Classification
- Feature engineering (preprocessing)
- Inference for regression
- Mixed-effect models

- **Model specification**: what is the form of the model?
- **Model fitting**: given the form, how do you estimate the free parameters?
- **Model accuracy**: given the estimated parameters, how well does the model describe your data?
- **Model reliability**: when you estimate the parameters, how much uncertainty is there on them?

- supervised learning | regression | linear
- model formula: `y ~ x`

- \(y=w_0+w_1x_1\)

How certain can we be about the parameter estimates we obtained?

```
# A tibble: 2 × 2
  term      estimate
  <chr>        <dbl>
1 intercept    1.75
2 x            0.733
```
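Estimates of this shape can be obtained with base R's `lm()`. A minimal sketch, assuming hypothetical simulated data (the numbers will not match the table above exactly):

```r
# Simulate hypothetical data from y = w0 + w1 * x + noise
set.seed(60)
x <- runif(50, min = 0, max = 10)
y <- 1.75 + 0.733 * x + rnorm(50, sd = 1)

# Fit the linear model y ~ x and inspect the parameter estimates
fit <- lm(y ~ x)
coef(fit)  # named vector with the intercept and slope estimates
```

Because the data are a random sample, rerunning the simulation with a different seed yields slightly different estimates, which is exactly the uncertainty at issue here.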

But… why is there uncertainty around the parameter estimates at all?

We are interested in the model parameters that best describe the *population from which the sample was drawn* (not any given sample).

- Due to *sampling error*, we can expect some variability in the model parameters that describe a sample of data.
- We can think of model reliability as the *stability* of the parameters of a fitted model.
- The more data we collect, the more reliable the model parameters will be.

We can obtain confidence intervals around parameter estimates for models in the same way we did for point estimates like the mean: **bootstrapping**

- Draw bootstrap samples from the observed data
- Fit the model of interest to each bootstrapped sample
- Construct the sampling distribution of parameter estimates across bootstraps
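The three steps above can be sketched in base R. This is a minimal sketch with hypothetical simulated data; `lm()` stands in for whatever model is of interest:

```r
# Hypothetical observed sample
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 1.75 + 0.733 * x + rnorm(50, sd = 1)
dat <- data.frame(x = x, y = y)

# Step 1: draw bootstrap samples (resample rows with replacement)
# Step 2: fit the model of interest to each bootstrapped sample
boots <- replicate(1000, {
  resampled <- dat[sample(nrow(dat), replace = TRUE), ]
  coef(lm(y ~ x, data = resampled))
})

# Step 3: the rows of `boots` form the sampling distribution of the
# intercept and slope; a 95% percentile interval for each parameter:
apply(boots, 1, quantile, probs = c(0.025, 0.975))
</imports>
```

Each column of `boots` holds the two coefficients from one bootstrapped fit, so the spread across columns directly shows how stable the parameter estimates are.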

In R, the `infer` package supports this workflow directly: fit the model to bootstrap samples, then get a confidence interval from the resulting distribution of estimates.
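A sketch of that workflow with `infer`, assuming a data frame `dat` with columns `x` and `y` (function names per infer ≥ 1.0, where `fit()` fits a model to each generated sample):

```r
library(infer)

# Hypothetical data, standing in for the observed sample
set.seed(1)
x <- runif(50, min = 0, max = 10)
dat <- data.frame(x = x, y = 1.75 + 0.733 * x + rnorm(50, sd = 1))

# Observed fit: point estimates for each term
obs_fit <- dat |>
  specify(y ~ x) |>
  fit()

# Fit bootstraps: the model refit to each bootstrap sample
boot_dist <- dat |>
  specify(y ~ x) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()

# Get a confidence interval for each term
get_confidence_interval(boot_dist, level = 0.95, point_estimate = obs_fit)
```

The result is a tibble with one row per term (intercept and `x`), each with a lower and upper bound, mirroring the shape of the estimate table shown earlier.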