---
title: "SeqSGPV Package"
output: github_document
bibliography: refs.bib
csl: apa.csl
editor_options:
markdown:
wrap: 72
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment=NA, message=FALSE, warning=FALSE, fig.width = 8, fig.height = 8, fig.align='center', eval=TRUE)
```
```{r, cache=FALSE, echo=FALSE, results='hide'}
library(SeqSGPV)
```
# Purpose
The SeqSGPV package is used to design a study with sequential monitoring
of scientifically meaningful hypotheses using the second generation
p-value (SGPV).
It supports the paper "Sequential monitoring using the Second
Generation P-Value with Type I error controlled by monitoring
frequency", which shows how to:

1.  Specify scientifically meaningful hypotheses using a constrained
    Region of Equivalence (ROE) [@Freedman:1984wz]. The constrained set
    of hypotheses is called the Pre-Specified Regions Indicating
    Scientific Merit (PRISM).
2.  Sequentially monitor the SGPV (SeqSGPV) for scientifically
    meaningful hypotheses.
3.  Control error rates using monitoring frequency mechanisms,
    including an affirmation step.
# Why PRISM
Monitoring until establishing statistical significance does not provide
information on whether an effect is scientifically meaningful.
Establishing scientific relevance requires pre-specifying which effects
are scientifically meaningful and an end-study inference that evaluates
these effects.
For intervention studies, @Freedman:1984wz categorizes effects as: universally acceptable for adopting the intervention; universally unacceptable compared to standard of care; or scientifically ambiguous as to whether the intervention should be adopted over the standard of care. The latter set of scientifically ambiguous effects forms the ROE.
Strategies for specifying ROE include setting the point null as a ROE boundary [@Hobbs:2008ce; Section 2.2], setting the ROE away from the point null [@Freedman:1984wz; Figure 1], and surrounding the point null [@Kruschke:2013jy]. @Kruschke:2013jy calls the latter strategy a Region of Practical Equivalence (ROPE); see footnote[^1] for clarification between the ROPE and ROE.
[^1]: @Kruschke:2013jy uses 'equivalence' to refer to effects indifferent to the point null, whereas @Freedman:1984wz uses 'equivalence' to refer to effects for which there is ambiguity about whether an effect is clearly superior to standard of care. Hence, ROE is the broader term: a ROPE encompasses the point null and could be considered a ROE, whereas a ROE carries no such constraint.
The PRISM constrains the ROE to be set away from the point null (similar in spirit to [@Freedman:1984wz; Figure 1]) but with a more explicit constraint. The PRISM divides the parameter space into three exhaustive, non-empty, and mutually exclusive regions:
1. The ROPE as defined by @Kruschke:2013jy to include effects practically equivalent to the point null.
2. The ROE as defined by [@Freedman:1984wz; Figure 1] to include scientifically ambiguous effects.
3. The Region of Meaningful Effects (ROME) to include effects that are scientifically meaningful.
For a 1-sided hypothesis, the ROPE is replaced by a Region of Worse or Practically Equivalent effects (ROWPE).
![Pre-Specified Regions Indicating Scientific Merit (PRISM) for one- and two-sided hypotheses. The PRISM always includes an indifference zone that surrounds the point null hypothesis (i.e.
ROPE/ROWPE).](images/AMwithSGPV_Figsv09_1.jpg)
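
As a toy illustration only (the boundary values and helper function below are hypothetical and not part of the SeqSGPV API), a 1-sided PRISM can be thought of as two boundaries that partition the effect scale into the three regions above:

```r
# Hypothetical 1-sided PRISM boundaries (illustrative values only):
# effects <= 0.05 are worse or practically equivalent (ROWPE),
# effects >= 0.20 are scientifically meaningful (ROME),
# and effects in between are scientifically ambiguous (ROE).
classify_prism <- function(effect, rowpe_upper = 0.05, rome_lower = 0.20) {
  ifelse(effect <= rowpe_upper, "ROWPE",
         ifelse(effect >= rome_lower, "ROME", "ROE"))
}

classify_prism(c(-0.10, 0.10, 0.30))
#> [1] "ROWPE" "ROE"   "ROME"
```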
In the context of interval monitoring, error rates and sample size are
impacted by the ROE specification.
Like ROPE monitoring, PRISM monitoring reduces the risk of Type I error, yet it resolves the issue of indefinite monitoring at the ROPE boundaries.
Compared to null-bound ROE monitoring, PRISM monitoring reduces the risk of Type I error for the same monitoring frequency, allows for earlier monitoring, and yields a smaller average sample size to achieve the same Type I error.
# Why SGPV
The SGPV [@Blume:SGPV] is an evidence-based metric that measures the
overlap between an inferential interval and a set of scientifically
meaningful hypotheses. Described as 'method agnostic'
[@stewart2019second], the SGPV may be calculated for any inferential
interval (e.g. Bayesian, frequentist, likelihood). Let the interval be
$`I = [a, b]`$, where $`a`$ and $`b`$ are real numbers with $`a < b`$,
and let its length be $`|I| = b - a`$. For a composite interval
hypothesis $`H`$, the length of the overlap between the interval and
$`H`$ is $`\left| I \cap H \right|`$. The SGPV, denoted $`p_H`$, is
then calculated as
```math
p_{H} = \frac{| I \cap H |}{|I|} \times \max \left\{\frac{| I |}{2 | H |}, 1 \right\}.
```
The adjustment, $`\max\left\{\frac{|I|}{2|H|},1\right\}`$, is a small sample size correction: when the inferential interval is at least twice the length of $`H`$, it sets $`p_H`$ to half the fraction of $`H`$ that is overlapped.
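
As a minimal sketch (the `sgpv` helper below is illustrative and not the package's internal implementation), the formula can be computed directly from the interval and hypothesis bounds:

```r
# SGPV for an interval I = [a, b] and a bounded interval hypothesis H = [h1, h2],
# following the formula above (assumes finite, non-degenerate bounds).
sgpv <- function(a, b, h1, h2) {
  overlap <- max(0, min(b, h2) - max(a, h1))  # |I intersect H|
  len_I   <- b - a                            # |I|
  len_H   <- h2 - h1                          # |H|
  (overlap / len_I) * max(len_I / (2 * len_H), 1)
}

# Overlap of the interval (0.02, 0.30) with a hypothesis region of (0.05, 0.20)
sgpv(a = 0.02, b = 0.30, h1 = 0.05, h2 = 0.20)
#> [1] 0.5357143
```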
In "Sequential monitoring using the Second Generation P-Value with Type
I error controlled by monitoring frequency", foundational likelihood, frequentist, and Bayesian metrics are compared and contrasted in terms of minimal assumptions, handling of composite hypotheses, and conclusions that can be drawn. The SGPV makes no further assumptions beyond those that may be inherited by the inferential interval. It does not require a likelihood, prior, study design, or error rates.
See the examples below for interpreting possible end-of-study
conclusions using the SGPV.
# Why change monitoring frequency
Controlling the design-based Type I error is recommended for trials seeking regulatory approval [@us2010guidance]. In SeqSGPV, error rates can be controlled through PRISM specification and/or monitoring frequency.[^2]
[^2]: @jennison1989interim provide error rate control for intervals that adjust for frequency properties using group sequential methods. For intervals which do not adjust for frequency properties, Type I error can be controlled through a combination of the PRISM's ROPE and monitoring frequency mechanisms of wait time (W), frequency of evaluation (S for steps between evaluations), maximum sample size (N), and affirmation steps (A).
Since scientific relevance (i.e., the PRISM) is considered fixed, monitoring frequency is a targetable means for controlling error rates. Monitoring frequency mechanisms include a wait time before stopping rules are first evaluated, the frequency of evaluations, a maximum sample size, and an affirmation rule. An affirmation rule is used in dose-escalation trials, which stop once a number of patients have been consecutively enrolled at the recommended maximum tolerated dose. We use it here to further control error rates and frequency properties after setting a practical wait time, monitoring frequency, and maximum sample size.
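
For intuition only (these are not the arguments of any SeqSGPV function; W, S, N, and A follow the notation of the footnote above), the wait time, step size, and maximum sample size imply a schedule of analyses such as:

```r
# Sample sizes at which stopping rules would be evaluated, given a wait time
# until the first evaluation (W), steps between evaluations (S), and a maximum
# sample size (N). An affirmation rule (A) would additionally require the
# stopping rule to hold at A consecutive subsequent evaluations before stopping.
monitoring_schedule <- function(W, S, N) seq(from = W, to = N, by = S)

monitoring_schedule(W = 20, S = 5, N = 50)
#> [1] 20 25 30 35 40 45 50
```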
**Synergy between 1-sided PRISM and monitoring frequency**: On their
own, both the PRISM and the monitoring frequency strategy help reduce the risk of Type I error. Used together, a 1-sided PRISM and monitoring frequency can notably reduce the average sample size needed to achieve a given Type I error. When outcomes are delayed, the risk of reversing a decision on the null hypothesis decreases more when monitoring a 1-sided PRISM than when monitoring a 1-sided null-bound ROE. Additional strategies, such as posterior predictive probabilities, could be considered to further inform decisions under delayed outcomes.
# Package overview
Trial designs are evaluated through simulation with the SeqSGPV function
until desirable operating characteristics, such as error rates, sample
size, bias, and coverage, are achieved. The SeqSGPV function can also be
used to assess departures from modelling assumptions.
Outcomes may be generated from any r[dist] distribution function (e.g.
rnorm, rbinom), a user-supplied data generation function, or
pre-existing data. Study designs with Bernoulli and normally distributed
outcomes have been evaluated most extensively, and extra care should be
taken when designing a study with outcomes from other distribution
families.
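
For example, a user-supplied data generation function could be as simple as a wrapper around an r[dist] call (the function names below are illustrative; see the package documentation for the exact interface SeqSGPV expects):

```r
# Sketch of user-supplied data generation functions; n is the number of
# observations to draw.

# Bernoulli outcomes with a response probability of 0.3.
genBernoulli <- function(n) rbinom(n = n, size = 1, prob = 0.3)

# Normally distributed outcomes with mean 0.5 and standard deviation 1.
genNormal <- function(n) rnorm(n = n, mean = 0.5, sd = 1)
```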
The user provides a function for obtaining the interval of interest.
Some functions have been built in for common interval estimates:
binomial credible and confidence intervals using binom::binom.confint,
Wald confidence intervals using the lm function for normal outcomes, and
Wald confidence intervals using the glm function with a binomial link
for Bernoulli outcomes.
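
The sketches below show the kinds of intervals referenced above (the wrappers are illustrative, not the package's built-in functions, and `arm` is assumed to be a 0/1 treatment indicator):

```r
# Wald-style confidence interval for the treatment effect with normal outcomes.
waldNormalCI <- function(y, arm, level = 0.95) {
  confint(lm(y ~ arm), parm = "arm", level = level)
}

# Wald confidence interval for the log odds ratio with Bernoulli outcomes.
waldBernoulliCI <- function(y, arm, level = 0.95) {
  fit <- glm(y ~ arm, family = binomial)
  est <- coef(fit)["arm"]
  se  <- sqrt(vcov(fit)["arm", "arm"])
  est + c(-1, 1) * qnorm(1 - (1 - level) / 2) * se
}

# Binomial confidence interval for a single-arm response probability.
binomCI <- function(x, n, level = 0.95) {
  binom::binom.confint(x = x, n = n, conf.level = level, methods = "wilson")
}
```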
Depending on the computing environment, simulations may be time
consuming when obtaining many (tens of thousands of) replicates, and
more so for Bernoulli outcomes. The user may consider starting with a
small number of replicates (200 to 2000) to get a sense of the design's
operating characteristics. Sample size estimates for a single-look trial
may also inform design parameters.
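
For example, a fixed (single-look) design sample size from base R's power calculations can help anchor choices such as the wait time and maximum sample size:

```r
# Per-arm sample size for a fixed two-arm design detecting a standardized mean
# difference of 0.5 with 80% power and one-sided alpha of 0.025.
power.t.test(delta = 0.5, sd = 1, sig.level = 0.025, power = 0.80,
             alternative = "one.sided")

# Per-arm sample size for a fixed two-arm design comparing response
# probabilities of 0.2 versus 0.4 with 80% power and two-sided alpha of 0.05.
power.prop.test(p1 = 0.2, p2 = 0.4, sig.level = 0.05, power = 0.80)
```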
# Study design examples
Study designs and interpretations of a single trial are provided below for two-arm trials with Bernoulli or normally distributed outcomes.
- [two arm trial, continuous
outcomes](examples/two-arm-continuous/README.md)
- [two arm trial, bernoulli
outcomes](examples/two-arm-bernoulli/README.md)
# References