Data Science for Studying Language and the Mind

Katie Schuler

2023-10-17

`here`

- Hello, world!
- R basics
- Data importing
- Data visualization
- Data wrangling

- Sampling distribution
- Hypothesis testing
- Model specification
- `Model fitting`

- Model accuracy
- Model reliability

- Classification
- Feature engineering (preprocessing)
- Inference for regression
- Mixed-effect models

- **Model specification**: what is the form of the model?
- **Model fitting**: you have the form; how do you guess the free parameters?
- **Model accuracy**: you’ve estimated the parameters; how well does that model describe your data?
- **Model reliability**: when you estimate the parameters, there is some uncertainty in the estimates.

a brief review

- Response, \(y\)
- Explanatory, \(x_n\)
- Functional form, \(y=\beta_0 + \beta_1x_1 + \epsilon\)
- Model terms
  - Intercept
  - Main
  - Interaction
  - Transformation

field | linear model eq |
---|---|
`h.s. algebra` | \(y = ax + b\) |
`machine learning` | \(y = w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n\) |
`statistics` | \(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n + \epsilon\) |
`matrix` | \(y = X\beta + \epsilon\) |
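The matrix form stacks one row per observation into \(X\), with a leading column of 1s so that the intercept multiplies a constant. A minimal sketch in plain Python, using hypothetical values for `x` and `beta` that are chosen only for illustration and do not come from the slides (the error term is left out):

```python
# Hypothetical values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
beta = [0.5, 2.0]  # [beta_0 (intercept), beta_1 (slope)]

# Design matrix X: a column of 1s for the intercept, then the x values.
X = [[1.0, xi] for xi in x]

# Matrix-vector product X * beta, computed one row at a time.
y_hat = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

# The same predictions written as the scalar equation beta_0 + beta_1 * x.
y_scalar = [beta[0] + beta[1] * xi for xi in x]

print(y_hat)              # [2.5, 4.5, 6.5]
print(y_hat == y_scalar)  # True
```

Both forms give the same predictions; the matrix form just lets every observation and every term be written in one expression.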

How would you draw a “best fit” line?

Which line fits best? How can you tell?

We can measure how close the model is to the data

`residuals`

x | y | pred | err | sq_err |
---|---|---|---|---|
1 | 1.2 | 1.3 | -0.1 | 0.01 |
2 | 2.5 | 2.0 | 0.5 | 0.25 |
3 | 2.3 | 2.7 | -0.4 | 0.16 |
4 | 3.1 | 3.4 | -0.3 | 0.09 |
5 | 4.4 | 4.1 | 0.3 | 0.09 |

x | y | pred | err | sq_err |
---|---|---|---|---|
1 | 1.2 | 1.58 | -0.38 | 0.1444 |
2 | 2.5 | 2.62 | -0.12 | 0.0144 |
3 | 2.3 | 3.66 | -1.36 | 1.8496 |
4 | 3.1 | 4.70 | -1.60 | 2.5600 |
5 | 4.4 | 5.74 | -1.34 | 1.7956 |
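To compare the two candidate lines with a single number, one option is to add up the `sq_err` column of each table. The sketch below does that in plain Python. The intercepts and slopes are not stated on the slides; they are inferred from the `pred` columns (0.6 and 0.7 for the first table, 0.54 and 1.04 for the second), and `sum_squared_error` is a helper name introduced here for illustration.

```python
x = [1, 2, 3, 4, 5]
y = [1.2, 2.5, 2.3, 3.1, 4.4]

def sum_squared_error(b0, b1, x, y):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    preds = [b0 + b1 * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, preds)]  # err = y - pred
    return sum(r ** 2 for r in residuals)

# Intercepts and slopes inferred from the pred columns of the two tables.
sse_first = sum_squared_error(0.6, 0.7, x, y)
sse_second = sum_squared_error(0.54, 1.04, x, y)

print(round(sse_first, 4))   # 0.6
print(round(sse_second, 4))  # 6.364
```

The first line has the smaller sum of squared error, so by this measure it is closer to the data.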

We can’t test all `Inf` of the possible values of the free parameters

\(y=b_0+b_1x_1\)
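One way to see the problem is to try a brute-force search anyway. The sketch below, in plain Python, scores every \((b_0, b_1)\) pair on a coarse grid using the data from the residual tables above. The search range (0.0 to 2.0) and step size (0.1) are assumptions made for illustration; any finite grid covers only a tiny fraction of the possible values.

```python
x = [1, 2, 3, 4, 5]
y = [1.2, 2.5, 2.3, 3.1, 4.4]

def sum_squared_error(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Assumed search range: 0.0 to 2.0 in steps of 0.1 for both parameters.
grid = [i / 10 for i in range(21)]

# Score all 21 * 21 = 441 candidate lines and keep the one with the lowest error.
best_sse, best_b0, best_b1 = min(
    (sum_squared_error(b0, b1), b0, b1) for b0 in grid for b1 in grid
)

print(best_b0, best_b1, round(best_sse, 4))  # 0.6 0.7 0.6
```

The search lands on a good line here only because the best values happen to sit on the grid. A finer or wider grid costs more evaluations, and the cost grows quickly as parameters are added.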

For linear models, the squared error is a convex function of the parameters: it has a single minimum

Linear models have a closed-form solution: we can solve for the parameter values directly with linear algebra.

\(1.2 = a \cdot 1 + b\)

\(2.5 = a \cdot 2 + b\)
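Taken on their own, these two equations can be solved exactly. Subtracting the first from the second gives \(2.5 - 1.2 = a(2 - 1)\), so \(a = 1.3\), and substituting back gives \(b = 1.2 - 1.3 = -0.1\). That line passes through the first two data points but not the rest: at \(x = 3\) it predicts \(1.3 \cdot 3 - 0.1 = 3.8\), while the observed value is \(2.3\). With five data points and only two free parameters, there is generally no line that satisfies every equation at once, so we look for the line that comes closest to all of them.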

`ordinary least squares`
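For a single explanatory variable, the least squares solution can be written out by hand. The sketch below assumes the standard normal equations \(X^TXb = X^Ty\), with \(X\) made of a column of 1s and a column of \(x\) values, and solves the resulting 2x2 system in plain Python using the data from the residual tables above. Variable names are chosen here for illustration.

```python
x = [1, 2, 3, 4, 5]
y = [1.2, 2.5, 2.3, 3.1, 4.4]
n = len(x)

# Entries of X^T X and X^T y for the design matrix X = [1, x].
sum_x = sum(x)
sum_xx = sum(xi * xi for xi in x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Solve [[n, sum_x], [sum_x, sum_xx]] * [b0, b1] = [sum_y, sum_xy]
# with the explicit inverse of a 2x2 matrix.
det = n * sum_xx - sum_x ** 2
b0 = (sum_xx * sum_y - sum_x * sum_xy) / det
b1 = (n * sum_xy - sum_x * sum_y) / det

print(round(b0, 4), round(b1, 4))  # 0.6 0.7
```

No search is needed: the parameter values come straight from the data. The result matches the line behind the first residual table, which is the one with the smaller squared error.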