
Commit

line break
jasonfan1997 committed Nov 1, 2024
1 parent 12adf62 commit 9d564fa
Showing 1 changed file with 2 additions and 1 deletion.
paper/paper.md (3 changes: 2 additions & 1 deletion)
@@ -48,6 +48,7 @@ bibliography: paper.bib

# Statement of need
Classification is one of the most common tasks in machine learning. Classification models are often evaluated with a proper scoring rule, a scoring function that assigns the best score when the predicted probabilities match the true probabilities, such as cross-entropy or mean squared error [@gneiting2007strictly]. Discrimination performance (resolution), measured by metrics such as AUC or sensitivity/specificity (Se/Sp), is also used to evaluate models. However, the reliability, or calibration, of a model is often overlooked.

@DIAMOND199285 showed that the resolution performance of a model does not indicate its reliability. @Brocker_decompose later showed that any proper scoring rule can be decomposed into resolution and reliability components. Thus, even a model with high resolution (high AUC) may not be reliable or well calibrated. In many high-risk machine learning applications, such as medical diagnosis, the reliability of the model is of paramount importance.

We define calibration as the agreement between the predicted probability and the true posterior probability of a class of interest, $P(D=1|\hat{p}=p) = p$; for example, among all cases assigned a predicted probability of 0.7, the empirical frequency of the class of interest should be 0.7. This has been defined as moderate calibration by @Calster_weak_cal.
@@ -58,7 +59,7 @@ In the `calzone` package, we provide a set of functions and classes for calibrat

## Reliability Diagram

- The reliability diagram (also referred to as a calibration plot) is a graphical representation of the calibration of a classification model [@Brocker_reldia; @steyerberg2010assessing]. It groups the predicted probabilities into bins and plots the mean predicted probability against the empirical frequency in each bin. The reliability diagram can be used to assess the calibration of the model and to identify systematic errors in the predictions. In addition, `calzone` offers the option to plot the confidence interval of the empirical frequency in each bin. The confidence intervals are calculated using Wilson's score interval [@wilson_interval]. We provide example data in the `example_data` folder, simulated using a beta-binomial distribution [@beta-binomial]. The predicted probabilities are sampled from a beta distribution and the true labels are assigned using a Bernoulli trial with the sampled probabilities. Users can generate simulated data using the `fake_binary_data_generator` class in the `utils` module.
+ The reliability diagram (also referred to as a calibration plot) is a graphical representation of the calibration of a classification model [@Brocker_reldia; @steyerberg2010assessing]. It groups the predicted probabilities into bins and plots the mean predicted probability against the empirical frequency in each bin. The reliability diagram can be used to assess the calibration of the model and to identify systematic errors in the predictions. In addition, `calzone` offers the option to plot the confidence interval of the empirical frequency in each bin. The confidence intervals are calculated using Wilson's score interval [@wilson_interval]. We provide example data in the `example_data` folder, simulated using a beta-binomial distribution [@beta-binomial]. The predicted probabilities are sampled from a beta distribution and the true labels are assigned by performing Bernoulli trials with the sampled probabilities. Users can generate simulated data using the `fake_binary_data_generator` class in the `utils` module.
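
As a rough illustration of the simulation scheme just described, here is a minimal NumPy sketch; it does not use the `fake_binary_data_generator` API, whose signature is not shown in this diff, and the beta shape parameters are illustrative rather than `calzone` defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the described scheme: draw predicted probabilities from a
# beta distribution, then assign each true label by a Bernoulli trial
# with that probability. alpha and beta are assumed, illustrative values.
n_samples = 1000
alpha, beta = 2.0, 5.0
probs = rng.beta(alpha, beta, size=n_samples)
labels = rng.binomial(1, probs)  # one Bernoulli trial per sample
```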

```python
from calzone.utils import reliability_diagram
```
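
The hunk cuts the example off at the import. As a self-contained sketch of the computation the paragraph describes (equal-width binning, per-bin empirical frequencies, and Wilson score intervals), written in plain NumPy rather than against `calzone`'s actual `reliability_diagram` signature, which is not visible here:

```python
import numpy as np

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes out of n trials (standard formula)."""
    p_hat = k / n
    center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

def binned_reliability(y_true, y_prob, num_bins=10):
    """Group predictions into equal-width bins; for each non-empty bin return
    (mean predicted probability, empirical frequency, CI low, CI high, count)."""
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, num_bins - 1)
    rows = []
    for b in range(num_bins):
        mask = idx == b
        n = mask.sum()
        if n == 0:
            continue
        k = y_true[mask].sum()
        lo, hi = wilson_interval(k, n)
        rows.append((y_prob[mask].mean(), k / n, lo, hi, n))
    return rows

# e.g., with probs/labels from the simulation sketch above:
# rows = binned_reliability(np.asarray(labels), np.asarray(probs), num_bins=10)
```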
