README.Rmd

---
output: github_document
---


```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

# dabestr

[![minimal R version](https://img.shields.io/badge/R%3E%3D-3.5.0-6666ff.svg)](https://cran.r-project.org/) [![CRAN Status Badge](https://www.r-pkg.org/badges/version-last-release/dabestr?color=orange)](https://cran.r-project.org/package=dabestr) [![Travis CI build status](https://img.shields.io/travis/com/ACCLAB/dabestr/master.svg)](https://travis-ci.com/ACCLAB/dabestr/) [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

dabestr is a package for **D**ata **A**nalysis using **B**ootstrap-Coupled **EST**imation.

## About

[Estimation statistics](https://en.wikipedia.org/wiki/Estimation_statistics "Estimation Stats on Wikipedia") is a [simple framework](https://thenewstatistics.com/itns/ "Introduction to the New Statistics") that avoids the [pitfalls](https://www.nature.com/articles/nmeth.3288 "The fickle P value generates irreproducible results, Halsey et al 2015") of significance testing. It uses familiar statistical concepts: means, mean differences, and error bars. More importantly, it focuses on the effect size of one's experiment/intervention, as opposed to a false dichotomy engendered by _P_ values.

An estimation plot has two key features.

1. It **presents all datapoints** as a swarmplot, which orders each point to display the underlying distribution.

2. It presents the **effect size** as a **bootstrap 95% confidence interval** on a **separate but aligned axes**.

```{r create.dummy.data, warning=FALSE, message=FALSE, echo=FALSE}

library(dplyr)
library(dabestr)


set.seed(54321)

N = 40
c1 <- rnorm(N, mean = 100, sd = 25)
c2 <- rnorm(N, mean = 100, sd = 50)
g1 <- rnorm(N, mean = 120, sd = 25)
g2 <- rnorm(N, mean = 80, sd = 50)
g3 <- rnorm(N, mean = 100, sd = 12)
g4 <- rnorm(N, mean = 100, sd = 50)
gender <- c(rep('Male', N/2), rep('Female', N/2))
id <- 1: N


wide.data <- 
  tibble::tibble(
    Control1 = c1, Control2 = c2,
    Group1 = g1, Group2 = g2, Group3 = g3, Group4 = g4,
    Gender = gender, ID = id)


my.data   <- 
  wide.data %>%
  tidyr::gather(key = Group, value = Measurement, -ID, -Gender)

```


```{r gardner.altman.showpieces, warning=FALSE, message=FALSE, echo=FALSE, fig.width = 6, fig.height = 3}

custom.theme <- 
  ggplot2::theme_classic() +
  ggplot2::theme(text=ggplot2::element_text(family = "Work Sans"))


two.group.unpaired <- 
  my.data %>%
  dabest(Group, Measurement, 
         # The idx below passes "Control" as the control group, 
         # and "Group1" as the test group. The mean difference
         # will be computed as mean(Group1) - mean(Control1).
         idx = c("Control1", "Group1"), 
         paired = FALSE)

two.group.paired <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = c("Control1", "Group2"), 
         paired = TRUE, id.col = ID)


plot(two.group.unpaired, theme = custom.theme, color.column = Gender)

plot(two.group.paired, theme = custom.theme, color.column = Gender)

``` 


```{r cumming.showpieces, warning=FALSE, message=FALSE, echo=FALSE, fig.width = 9, fig.height = 5}

multi.two.group.unpaired <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = list(c("Control1", "Group1"), 
                    c("Control2", "Group2"),
                    c("Group3", "Group4")),
         paired = FALSE
  )

plot(multi.two.group.unpaired, theme = custom.theme,
     color.column = Gender)


multi.two.group.paired <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = list(c("Control1", "Group1"), 
                    c("Control2", "Group2"),
                    c("Group3", "Group4")),
         paired = TRUE, id.col = ID
  )

plot(multi.two.group.paired, theme = custom.theme,
     color.column = Gender)


shared.control <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = c("Control1", "Group1", "Group2", "Group3", "Group4"),
         paired = FALSE
  )

plot(shared.control, theme = custom.theme,color.column = Gender)

```


## Installation

Your version of R must be 3.5.0 or higher.

```{r demo.install, eval=FALSE}
install.packages("dabestr")

# To install the latest development version on Github,
# use the line below.
devtools::install_github("ACCLAB/dabestr")
```

## Usage

```{r usage}
library(dabestr)

# Performing unpaired (two independent groups) analysis.
unpaired_mean_diff <- dabest(iris, Species, Petal.Width,
                             idx = c("setosa", "versicolor", "virginica"),
                             paired = FALSE)

# Display the results in a user-friendly format.
unpaired_mean_diff

# Produce a Cumming estimation plot.
plot(unpaired_mean_diff)
```

## How to Cite

[**Moving beyond P values: Everyday data analysis with estimation plots**](https://doi.org/10.1101/377978 "Our BioRxiv preprint")

Joses Ho, Tayfun Tumkaya, Sameer Aryal, Hyungwon Choi, Adam Claridge-Chang

## dabest In Other Languages

dabestr is also available in [Python](https://github.com/ACCLAB/DABEST-python "DABEST-Python on Github") and [Matlab](https://github.com/ACCLAB/DABEST-Matlab "DABEST-Matlab on Github").

## Bugs

Please open a [new issue](https://github.com/ACCLAB/dabestr/issues/new). Include a reproducible example (aka [reprex](https://www.tidyverse.org/help/)) so anyone can copy-paste your code and move quickly towards helping you out!

## Contributing

All contributions are welcome. Please fork this Github repo and open a pull request.