# Model Performance Difference

## One-Sample Proportion Test for Machine Learning Research

### Scenario:
Suppose you are evaluating the performance of a new machine learning classifier that predicts whether patients have a particular disease. Previous research using similar classifiers has shown an AUC (Area Under the Curve) of 0.90. You want to see if your new classifier performs significantly better or worse than this standard. To do so, you will use a one-sample proportion test to compare your classifier's AUC against the hypothesized value of 0.90.

### Setting Up the One-Sample Proportion Test
In this context:

- **Null Hypothesis ($H_0$)**: The AUC of the new classifier is equal to 0.90.
$$
H_0: p = 0.90
$$

- **Alternative Hypothesis ($H_a$)**: The AUC of the new classifier is not equal to 0.90 (it could be higher or lower).
$$
H_a: p \neq 0.90
$$

- **Significance Level ($\alpha$)**: 0.05 (5% chance of a Type I error, i.e. rejecting $H_0$ when it is true).

- **Power ($1 - \beta$)**: 0.80 (80% chance of correctly rejecting $H_0$ when it is false).

- **Effect Size ($\Delta$)**: You want to detect a difference of at least 0.05, meaning that you want to see if the new AUC is 0.95 or greater, or 0.85 or lower.

- **Observed Proportion ($p$)**: This is the proportion you will calculate based on the model's performance on a test set (e.g., by using a ROC curve to compute the AUC).

### Conducting the One-Sample Proportion Test
After determining the sample size, you can conduct the experiment and calculate the observed AUC of your classifier on the test set.

For example:

- **Test Set Size**: 150 cases
- **Observed AUC**: 0.92

You can then use the one-sample proportion test to check if this observed AUC of 0.92 is significantly different from 0.90.

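To make this concrete, the observed AUC can be treated as if it were a simple proportion of correct outcomes: 0.92 on 150 cases corresponds to 138 "successes" out of 150. This mapping is an illustrative assumption (an AUC is not literally a count of successes), but under it, a minimal sketch of the test with base R's `prop.test()` looks like this:

```{r}
# Illustrative assumption: treat AUC 0.92 on 150 cases as 138/150 "successes"
res <- prop.test(x = 138, n = 150, p = 0.90,
                 alternative = "two.sided", correct = FALSE)
res$p.value  # about 0.41, so no evidence here of a difference from 0.90
```

With only 150 cases, an observed 0.92 is well within sampling variability of 0.90, which is exactly why the sample-size calculation below matters.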
### Performing the Test in R (`pwr`)

Here's how you could perform this test in R using the `pwr.p.test()` function:

```{r}
library(pwr)
```

```{r}
# Define parameters
p0 <- 0.90     # Null hypothesis proportion
pa <- 0.95     # Alternative hypothesis proportion
power <- 0.80  # Desired power
alpha <- 0.05  # Significance level

# Calculate the required sample size; ES.h() converts the two
# proportions into Cohen's h (an arcsine-transformed effect size)
sample_size_result <- pwr.p.test(h = ES.h(p0, pa),
                                 sig.level = alpha,
                                 power = power,
                                 alternative = "two.sided")

# Print the result
print(sample_size_result)
```

**Explanation of the parameters** in `pwr.p.test()`:

- `h = ES.h(p0, pa)`: The effect size for proportion tests, calculated using Cohen's h formula.
- `sig.level`: The significance level (alpha).
- `power`: The desired power of the test.
- `alternative`: Specifies whether the test is "two.sided", "greater", or "less".
- `n`: The calculated sample size required to detect the specified effect size with the given power and significance level (this is a one-sample test, so there is only one group).
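The `ES.h()` value and the resulting `n` can be verified by hand. Cohen's h is $h = 2\arcsin\sqrt{p_a} - 2\arcsin\sqrt{p_0}$, and the two-sided normal-approximation sample size is $n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{h}\right)^2$. A base-R check:

```{r}
# Reproduce Cohen's h and the required sample size by hand
p0 <- 0.90; pa <- 0.95
h <- abs(2 * asin(sqrt(pa)) - 2 * asin(sqrt(p0)))  # Cohen's h
z_alpha <- qnorm(1 - 0.05 / 2)  # two-sided critical value
z_beta  <- qnorm(0.80)          # power term
n <- ((z_alpha + z_beta) / h)^2
c(h = h, n = n)  # h ~ 0.192, n ~ 211.9, matching pwr.p.test()
```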

### Use Case Summary

| Parameter | Value |
|-------------------------------|------------|
| Hypothesized AUC ($p_0$) | 0.90 |
| Observed AUC | 0.92 |
| Significance Level ($\alpha$) | 0.05 |
| Power ($1 - \beta$) | 0.80 |
| Test Set Size | 150 |
| Null Hypothesis ($H_0$) | AUC = 0.90 |
| Alternative Hypothesis ($H_a$) | AUC ≠ 0.90 |

If the one-sample proportion test shows a statistically significant result, you can confidently say that the AUC of the new classifier is different from 0.90 and assess whether the new model performs better or worse than the previous standard.
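One caveat worth noting: the required sample size computed above (about 212) exceeds the 150-case test set, so the study as described is somewhat underpowered for detecting a 0.05 difference. The power actually achieved with n = 150 can be sketched with the standard arcsine normal-approximation formula (base R only, so this does not depend on `pwr` being installed):

```{r}
# Power achieved with n = 150 under the arcsine approximation
h <- abs(2 * asin(sqrt(0.95)) - 2 * asin(sqrt(0.90)))  # Cohen's h
n <- 150
z_alpha <- qnorm(1 - 0.05 / 2)
power_150 <- pnorm(sqrt(n) * h - z_alpha) +
             pnorm(-sqrt(n) * h - z_alpha)  # second tail is negligible here
power_150  # roughly 0.65, below the 0.80 target
```

In practice this means a non-significant result on the 150-case test set is weak evidence: the test would miss a true 0.05 difference about a third of the time.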