Data Science for Studying Language and the Mind
2023-10-17
Model fitting
Model specification | |
---|---|
R syntax (explicit intercept) | rt ~ 1 + experience |
R syntax (implicit intercept) | rt ~ experience |
Equation | \(y=w_0+w_1x_1\) |
Model specification → Estimate free parameters → Fitted model
\(y = 211.271 - 1.695x_1\)
Quantifying our intuition with sum of squared error
experience | rt | prediction | error | squared_error |
---|---|---|---|---|
49 | 124 | 128.216 | 4.216 | 17.774656 |
69 | 95 | 94.316 | -0.684 | 0.467856 |
89 | 71 | 60.416 | -10.584 | 112.021056 |
99 | 45 | 43.466 | -1.534 | 2.353156 |
109 | 18 | 26.516 | 8.516 | 72.522256 |
\(SSE=\sum_{i=1}^{n} (d_{i} - m_{i})^2 = 205.139\)
Given some data, we can compute the sum of squared error for any candidate values of \(w_0\) and \(w_1\); for the predictions tabulated above it is 205.139.
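The SSE calculation can be reproduced in a few lines of R. This is a minimal sketch in base R: it assumes the five observations from the table are stored in a data frame called `data` with columns `experience` and `rt`, and it uses the rounded coefficients of the fitted model shown earlier.

```r
# The five observations from the table
data <- data.frame(
  experience = c(49, 69, 89, 99, 109),
  rt = c(124, 95, 71, 45, 18)
)

# Model predictions from the fitted model y = 211.271 - 1.695x
prediction <- 211.271 - 1.695 * data$experience

# Error as tabulated (prediction minus observed rt); the sign vanishes once squared
error <- prediction - data$rt
squared_error <- error^2

# Sum of squared error
sse <- sum(squared_error)
round(sse, 3)  # 205.139
```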
A search problem: we have a parameter space, a cost function, and our job is to search through the space to find the point that minimizes the cost function.
Define our cost function: the sum of squared error of the model \(y = w_0 + w_1x_1\), viewed as a function of the parameters \(w_0\) and \(w_1\).
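One way to sketch this search in R is to write the cost function by hand and pass it to a general-purpose optimizer. The function name `sse`, the starting point `c(0, 0)`, and the choice of base R's `optim()` with the BFGS method are all assumptions made for illustration; the course may use a different optimizer, and iterative search results depend on the starting point and method.

```r
# The five observations used throughout
data <- data.frame(
  experience = c(49, 69, 89, 99, 109),
  rt = c(124, 95, 71, 45, 18)
)

# Cost function: SSE of the model y = w0 + w1 * x, where w = c(w0, w1)
sse <- function(w, data) {
  prediction <- w[1] + w[2] * data$experience
  sum((data$rt - prediction)^2)
}

# Cost at the (rounded) fitted parameter values
cost_at_fit <- sse(c(211.271, -1.695), data)

# Search the parameter space from an arbitrary starting point
fit <- optim(par = c(0, 0), fn = sse, data = data, method = "BFGS")
fit$par    # estimated w0 and w1
fit$value  # SSE at the estimate
```

Any point the optimizer returns can be judged by its cost: the lower `fit$value`, the better the fit.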
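We have a system of equations, one per observation:
\(124 = w_0 + 49w_1 + \epsilon_1\)
\(95 = w_0 + 69w_1 + \epsilon_2\)
\(71 = w_0 + 89w_1 + \epsilon_3\)
\(45 = w_0 + 99w_1 + \epsilon_4\)
\(18 = w_0 + 109w_1 + \epsilon_5\)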
We can express them as a matrix: \(Y = Xw + \epsilon\)
And solve with linear algebra: \(w = (X^TX)^{-1}X^TY\)
We need to construct X and Y (must be matrices):
```r
library(dplyr)
# Y holds the response; X holds a column of 1s (intercept) plus the predictor
(response_matrix <- data %>% select(rt) %>% as.matrix())
(explanatory_matrix <- data %>% mutate(int = 1) %>% select(int, experience) %>% as.matrix())
```

```
      rt
[1,] 124
[2,]  95
[3,]  71
[4,]  45
[5,]  18
     int experience
[1,]   1         49
[2,]   1         69
[3,]   1         89
[4,]   1         99
[5,]   1        109
```
Then we can use our function to generate the OLS solution:
```
Call:
lm(formula = rt ~ experience, data = data)

Coefficients:
(Intercept)   experience  
    211.271       -1.695  
```
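The closed-form solution \(w = (X^TX)^{-1}X^TY\) can also be computed directly. The sketch below uses base R only; the helper name `ols_matrix` is hypothetical and may differ from the function these slides refer to, and the data frame is rebuilt from the five observations so the example runs on its own.

```r
# The five observations used throughout
data <- data.frame(
  experience = c(49, 69, 89, 99, 109),
  rt = c(124, 95, 71, 45, 18)
)

# Y: response as a one-column matrix; X: column of 1s (intercept) plus the predictor
Y <- as.matrix(data["rt"])
X <- cbind(int = 1, experience = data$experience)

# Hypothetical helper implementing w = (X'X)^{-1} X'Y
ols_matrix <- function(X, Y) {
  solve(t(X) %*% X) %*% t(X) %*% Y
}

w <- ols_matrix(X, Y)
round(w, 3)  # int = 211.271, experience = -1.695

# Cross-check against lm(), which should agree up to floating-point error
lm_coef <- coef(lm(rt ~ experience, data = data))
```

Explicitly inverting \(X^TX\) mirrors the formula but is not how `lm()` works internally; `lm()` uses a QR decomposition, which is numerically more stable.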