Practice Quiz 2

Not graded, just practice

Author

Katie Schuler

Published

October 6, 2023

The data

This quiz refers to data simulated from Johnson & Newport (1989), who studied the English language proficiency of 46 native Korean or Chinese speakers who arrived in the US between the ages of 3 and 39. The researchers were interested in the critical period for language acquisition and wanted to know whether the participants’ age of arrival in the United States played a role in their English language proficiency.

The simulated data are stored in the tibble johnson_newport_1989. Here is a glimpse() at the tibble for your reference:

glimpse(johnson_newport_1989)

Rows: 69
Columns: 4
$ score     <dbl> 270.8899, 270.2497, 267.1322, 268.3546, 263.7737, 263.8069, …
$ age       <dbl> 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, …
$ ageGroup  <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", …
$ langGroup <chr> "native", "native", "native", "native", "native", "native", …

1 Sampling distribution

Johnson and Newport (1989) reported the mean and standard deviation of participants’ scores on the English proficiency test, grouped by an ageGroup variable, which divides age into 5 groups. Below we computed these descriptive statistics on our simulated data. Then, we used infer to generate the sampling distribution for the 3-7 year old age group, visualize the distribution, and shade the confidence interval.

# A. compute descriptive statistics by group 
johnson_newport_1989 %>% 
    group_by(ageGroup) %>%
    summarise(n = n(), mean = mean(score), sd = sd(score))

# A tibble: 5 × 4
  ageGroup     n  mean    sd
  <chr>    <int> <dbl> <dbl>
1 0           23  269.  2.44
2 11-15        8  232. 11.2 
3 17-39       23  220. 23.9 
4 3-7          7  271.  3.56
5 8-10         8  261.  7.14

# B. generate the sampling distribution for 3-7 group
samp_distribution <- johnson_newport_1989 %>%
    filter(ageGroup == "3-7") %>%
    specify(response = score) %>%
    generate(reps = 1000, type = "bootstrap") %>% 
    calculate(stat = "mean")

# C. get confidence interval 
ci <- samp_distribution %>%
    get_confidence_interval(______________)

# D. visualize sampling distribution and confidence interval 
samp_distribution %>%
    visualize() +
    shade_ci(endpoints = ci)

True or false, the descriptive statistics reported above are parametric.

TRUE FALSE

The sampling distribution of the mean looks approximately Gaussian. The probability density function for the Gaussian distribution is given by which of the following equations?
1. \(\frac{\sum_{i=1}^{n} x_{i}}{n}\)
2. \(\sqrt{\frac{\sum_{i=1}^n (x_i - \overline{x})^2}{n-1}}\)
3. \(\frac{1}{max-min}\)
4. \(\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)\)

1 2 3 4

Fill in the blanks in the sentence below to describe what happens on each repeat in code B above, in which we constructed the sampling distribution.

Draw ______ data points ______ replacement, compute the ______

Which of the following could have been used to fill in the blank in code block C above? Choose all that apply.

type = 'percentile', level = 0.68
type = 'percentile', level = 271
type = 'se', point_estimate = 271
type = 'se', point_estimate = 0.68
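
For checking your answers to this section, here is a minimal base R sketch of the idea behind code blocks B and C. It is not the infer implementation. The `scores` values are made up to stand in for the seven participants in the 3-7 group, and the helper `boot_mean` is a hypothetical name introduced only for this illustration. The two intervals correspond only roughly to what the 'percentile' and 'se' types compute.

```r
# Hypothetical scores standing in for the 3-7 group (n = 7).
# These values are made up for illustration; they are not the quiz data.
set.seed(1)
scores <- c(268.2, 274.9, 270.1, 266.5, 275.3, 271.8, 269.7)

# One bootstrap replicate: resample n values with replacement, take the mean.
boot_mean <- function(x) {
    mean(sample(x, size = length(x), replace = TRUE))
}

# Repeat 1000 times to approximate the sampling distribution of the mean.
boot_means <- replicate(1000, boot_mean(scores))

# Percentile interval: the middle 68% of the bootstrap means.
ci_percentile <- quantile(boot_means, probs = c(0.16, 0.84))

# Standard-error interval: point estimate plus or minus one bootstrap
# standard error (about 68% coverage if the distribution is Gaussian).
point_estimate <- mean(scores)
ci_se <- point_estimate + c(-1, 1) * sd(boot_means)

print(ci_percentile)
print(ci_se)
```

Note that the percentile interval needs only a confidence level, while the standard-error interval also needs a point estimate to center on.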

2 Hypothesis testing

Johnson and Newport (1989) began with the following question:

“The primary question of this study involved examining the relationship between age of learning English as a second language and performance on the test of English grammar…The first comparison involved determining whether there was any difference between the age 3-7 group and the native group in their performance in English.” ~ p78

We replicated this below using our simulated data and the 3-step hypothesis testing framework we learned in lecture:

# A. visualize difference with a boxplot
johnson_newport_1989 %>%
    filter(ageGroup %in% c("0", "3-7")) %>%
    ggplot(aes(y = score, x = ageGroup)) +
    geom_boxplot()

# B. compute observed difference in means
diff_means <- johnson_newport_1989 %>%
    filter(ageGroup %in% c("0", "3-7")) %>%
    specify(response = score, explanatory = ageGroup) %>%
    calculate(stat = "diff in means", order = c("0", "3-7"))

# C. construct the null distribution with infer
null_distribution <- johnson_newport_1989 %>%
    filter(ageGroup %in% c("0", "3-7")) %>%
    specify(response = score, explanatory = ageGroup) %>%
    hypothesize(null = "independence") %>%
    generate(reps = 1000, type = "permute") %>%
    calculate(stat = "diff in means", order = c("0", "3-7"))

# D. visualize the null and shade p-value
null_distribution %>%
    visualize() +
    shade_p_value(obs_stat = diff_means, direction = "both")

Step 1 is to pose the null hypothesis. True or false, the null hypothesis here is that the observed difference in means is due to nothing but chance.

TRUE FALSE

Step 2 is to ask, if the null hypothesis is true, how likely is our observed pattern of results? We quantify this likelihood with:

diff in means
p-value
correlation
response

Step 3 is to decide whether to reject the null hypothesis. Johnson and Newport concluded that the two groups were not significantly different from each other, suggesting that participants who arrived in the US by age 7 achieved native proficiency. This implies that they ______
Given our simulated null distribution, do you agree with their decision? Explain why based on the simulation.

Answer

"I agree with their decision. Under the null hypothesis, the observed
 difference in means is just under -2. Given the shaded p-value, we 
 can see that observing a difference at least this extreme is fairly 
 likely under the null hypothesis. So we should fail to reject the 
 null hypothesis."
