Model Fitting

Data Science for Studying Language and the Mind

Katie Schuler

2023-10-17

You are `here`

Data science with R

Hello, world!
R basics
Data importing
Data visualization
Data wrangling

Stats & Model building

Sampling distribution
Hypothesis testing
Model specification
Model fitting
Model accuracy
Model reliability

More advanced

Classification
Feature engineering (preprocessing)
Inference for regression
Mixed-effect models

Model building overview

Model specification: what is the functional form of the model?
Model fitting: you have the form; how do you estimate the free parameters?
Model accuracy: you've estimated the parameters; how well does the model describe your data?
Model reliability: your parameter estimates carry some uncertainty; how much?

Model specification

a brief review

Types of models

Specification

Response, \(y\)
Explanatory, \(x_n\)
Functional form, \(y=\beta_0 + \beta_1x_1 + \epsilon\)
Model terms
- Intercept
- Main
- Interaction
- Transformation

Linear model functional form

Field	Linear model equation
`h.s. algebra`	\(y=ax+b\)
`machine learning`	\(y = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n\)
`statistics`	\(y = β_0 + β_1x_1 + β_2x_2 + ... + β_nx_n + ε\)
`matrix`	\(y = Xβ + ε\)

Model fitting

flowchart TD
    spec(Model specification) --> fit(Estimate free parameters) 
    fit(Estimate free parameters) --> fitted(Fitted model)

Fitting a linear model

flowchart TD
    spec(Model specification \n y = ax + b) --> fit(Estimate free parameters) 
    fit(Estimate free parameters) --> fitted(Fitted model \n y = 0.7x + 0.6)

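library(ggplot2)

# Scatter plot of the data with the fitted least-squares line (no confidence band)
ggplot(data, aes(x = x, y = y)) +
    geom_point(size = 4, color = "darkred") +
    geom_smooth(method = "lm", formula = 'y ~ x', se = FALSE)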

Fitting by intuition

How would you draw a “best fit” line?

Fitting by intuition

Which line fits best? How can you tell?

Quantifying “goodness” of fit

We can measure how close the model is to the data

residuals

\(SSE=\sum_{i=1}^{n} (d_{i} - m_{i})^2\)

x	y	pred	err	sq_err
1	1.2	1.3	-0.1	0.01
2	2.5	2.0	0.5	0.25
3	2.3	2.7	-0.4	0.16
4	3.1	3.4	-0.3	0.09
5	4.4	4.1	0.3	0.09

x	y	pred	err	sq_err
1	1.2	1.58	-0.38	0.1444
2	2.5	2.62	-0.12	0.0144
3	2.3	3.66	-1.36	1.8496
4	3.1	4.70	-1.60	2.5600
5	4.4	5.74	-1.34	1.7956
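The two tables above can be reproduced with a few lines of R. This is a minimal sketch: `sse()` is a hypothetical helper name, and the second line's intercept (0.54) and slope (1.04) are inferred from that table's `pred` column rather than stated in the slides.

```r
# Data from the tables above
x <- c(1, 2, 3, 4, 5)
y <- c(1.2, 2.5, 2.3, 3.1, 4.4)

# sse() is a hypothetical helper: sum of squared errors for the line y = b0 + b1 * x
sse <- function(b0, b1, x, y) {
  pred <- b0 + b1 * x   # model predictions (m_i)
  err <- y - pred       # residuals (d_i - m_i)
  sum(err^2)
}

sse_a <- sse(0.60, 0.70, x, y)  # first table:  y = 0.6 + 0.7x
sse_b <- sse(0.54, 1.04, x, y)  # second table: y = 0.54 + 1.04x (inferred from pred)

print(c(first = sse_a, second = sse_b))
```

Summing the `sq_err` columns gives 0.60 for the first line and 6.364 for the second, so the first line fits these data better.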

But there are infinitely many possibilities

We can't test every possible combination of free-parameter values: there are infinitely many

\(y=b_0+b_1x_1\)

Free parameters to test

Level = SSE

Error surface
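One way to get a feel for the error surface is to evaluate SSE on a coarse grid of candidate parameter values. The sketch below assumes the five data points from the earlier tables; the grid ranges and the 0.1 step size are arbitrary choices for illustration.

```r
# Data from the earlier tables
x <- c(1, 2, 3, 4, 5)
y <- c(1.2, 2.5, 2.3, 3.1, 4.4)

# Candidate intercepts (b0) and slopes (b1); ranges and step are assumptions
grid <- expand.grid(
  b0 = seq(-1, 2, by = 0.1),
  b1 = seq(0, 2, by = 0.1)
)

# SSE at every grid point: this is the height of the error surface
grid$sse <- mapply(
  function(b0, b1) sum((y - (b0 + b1 * x))^2),
  grid$b0, grid$b1
)

# The grid point with the lowest SSE
best <- grid[which.min(grid$sse), ]
print(best)
```

A grid can only ever check a finite set of values, which is why a search procedure or an exact solution is needed in general.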

Gradient descent, intuition

Gradient descent

Gradient descent linear model

For linear models, the error surface (SSE) is convex: it has a single minimum
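A minimal sketch of gradient descent on SSE for \(y = b_0 + b_1x_1\), using the five data points from the earlier tables. The starting values, learning rate (`rate`), and number of iterations are assumptions chosen so that this small example converges; they are not taken from the slides.

```r
# Data from the earlier tables
x <- c(1, 2, 3, 4, 5)
y <- c(1.2, 2.5, 2.3, 3.1, 4.4)

b0 <- 0       # starting guess for the intercept (assumption)
b1 <- 0       # starting guess for the slope (assumption)
rate <- 0.01  # learning rate (assumption; too large a value would diverge)

for (i in 1:5000) {
  resid <- y - (b0 + b1 * x)

  # Partial derivatives of SSE with respect to b0 and b1
  grad_b0 <- -2 * sum(resid)
  grad_b1 <- -2 * sum(resid * x)

  # Step downhill, against the gradient
  b0 <- b0 - rate * grad_b0
  b1 <- b1 - rate * grad_b1
}

print(c(intercept = b0, slope = b1))
```

Because the surface has only one minimum, the starting point does not change where the procedure ends up, only how long it takes to get there.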

Ordinary least squares

Linear models have a closed-form solution: we can solve directly for the parameter values that minimize SSE using linear algebra.

\(y = ax + b\)

\(1.2 = a \cdot 1 + b\)

\(2.5 = a \cdot 2 + b\)
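Writing one such equation per data point gives the matrix form \(y = Xβ + ε\), and the least-squares estimate solves the normal equations \(X^{\top}Xβ = X^{\top}y\). The sketch below works this out by hand for the five data points from the earlier tables; the variable names are illustrative.

```r
# Data from the earlier tables
x <- c(1, 2, 3, 4, 5)
y <- c(1.2, 2.5, 2.3, 3.1, 4.4)

# Design matrix: a column of 1s for the intercept, then x
X <- cbind(intercept = 1, x = x)

# Solve the normal equations (X'X) beta = X'y for beta
beta <- solve(t(X) %*% X, t(X) %*% y)

print(beta)
```

In practice `lm()` does this work for you, using a numerically more stable decomposition than forming \(X^{\top}X\) directly.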

lm(y ~ 1 + x, data)


Call:
lm(formula = y ~ 1 + x, data = data)

Coefficients:
(Intercept)            x  
        0.6          0.7

library(infer)

# Same model, fit with the infer package's specify() and fit()
data %>%
    specify(y ~ 1 + x) %>%
    fit()

# A tibble: 2 × 2
  term      estimate
  <chr>        <dbl>
1 intercept    0.600
2 x            0.7

ordinary least squares
