Update 09-hypothesis-testing.Rmd
ismayc authored Feb 19, 2025
1 parent 8dea389 commit a367ec7
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions 09-hypothesis-testing.Rmd
@@ -742,7 +742,7 @@ prop_metal_popular <- round(prop_metal_popular, 3)
prop_deephouse_popular <- round(prop_deephouse_popular, 3)
```

-So in this one sample of a hypothetical universe of no difference in genre popularity, $`r n_metal_popular`/26 = `r prop_metal_popular` = `r prop_metal_popular*100`\%$ of metal songs were popular. On the other hand, $`r n_deephouse_popular`/26 = `r prop_deephouse_popular` = `r prop_deephouse_popular*100`\%$ of deep house songs were popular. Let's next compare these two values. It appears that metal tracks were popular at a rate that was $`r prop_metal_popular ` - `r prop_deephouse_popular ` = `r diff_prop` = `r diff_prop*100`\%$ different than deep house songs.
+So in this one sample of a hypothetical universe of no difference in genre popularity, $`r n_metal_popular`/26 = `r prop_metal_popular` = `r prop_metal_popular*100`\%$ of metal songs were popular. On the other hand, $`r n_deephouse_popular`/26 = `r prop_deephouse_popular` = `r prop_deephouse_popular*100`\%$ of deep house songs were popular. Let's next compare these two values. It appears that metal tracks were popular at a rate that was $`r prop_metal_popular ` - `r prop_deephouse_popular ` = `r diff_prop` = `r diff_prop*100`\$ percentage points different than deep house songs.

Observe how this difference in rates is not the same as the difference in rates of `r observed_test_statistic` = `r observed_test_statistic*100`% we originally observed. This is once again due to *sampling variation*. How can we better understand the effect of this sampling variation? By repeating this shuffling several times!

@@ -788,9 +788,9 @@ sampling_scenarios |>

So, based on our sample of $n_m = 1000$ metal tracks and $n_f = 1000$ deep house tracks, the *point estimate* for $p_{m} - p_{d}$ is the *difference in sample proportions*

-$$\widehat{p}_{m} -\widehat{p}_{f} = `r p_metal_popular` - `r p_deephouse_popular` = `r observed_test_statistic` = `r observed_test_statistic*100`\%$$.
+$$\widehat{p}_{m} -\widehat{p}_{f} = `r p_metal_popular` - `r p_deephouse_popular` = `r observed_test_statistic`$$.

-This difference in favor of metal songs of `r observed_test_statistic` is greater than 0, suggesting metal songs are more popular than deep house songs.
+This difference in favor of metal songs of `r observed_test_statistic` (`r observed_test_statistic*100` percentage points) is greater than 0, suggesting metal songs are more popular than deep house songs.

However, the question we ask ourselves was "is this difference meaningfully greater than 0?". In other words, is that difference indicative of true popularity, or can we just attribute it to *sampling variation*? Hypothesis testing allows us to make such distinctions.
