Update documentation

yongtamu · Jun 18, 2021 · e105d0c
1 parent 1a51291
Showing 19 changed files with 81 additions and 61 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,8 +1,8 @@
 inst/
-Meta
 .Rproj.user
 .Rhistory
 .RData
 *.el
 notes.org
 doc
+/Meta/
diff --git a/R/FRD_lp.R b/R/FRD_lp.R
@@ -128,10 +128,10 @@ FRDHonest <- function(formula, data, subset, weights, cutoff=0, M,
 #'     class \code{"RDBW"} is a list containing the following components:
 #'
 #'     \describe{
-#'     \item{\code{hp}}{bandwidth for observations above cutoff}
+#'     \item{\code{hp}}{bandwidth for observations weakly above cutoff}
 #'
-#'     \item{\code{hm}}{bandwidth for observations below cutoff, equal to
-#'     \code{hp} unless \code{bw.equal==FALSE}}
+#'     \item{\code{hm}}{bandwidth for observations strictly below cutoff, equal
+#'     to \code{hp} unless \code{bw.equal==FALSE}}
 #'
 #'     \item{\code{sigma2m}, \code{sigma2p}}{estimate of conditional variance
 #'     just above and just below cutoff, \eqn{\sigma^2_+(0)} and

diff --git a/R/RD_lp.R b/R/RD_lp.R
@@ -135,9 +135,9 @@ RDHonest <- function(formula, data, subset, weights, cutoff=0, M,
 #'     class \code{"RDBW"} is a list containing the following components:
 #'
 #'     \describe{
-#'     \item{\code{hp}}{bandwidth for observations above cutoff}
+#'     \item{\code{hp}}{bandwidth for observations weakly above cutoff}
 #'
-#'     \item{\code{hm}}{bandwidth for observations below cutoff, equal to
+#'     \item{\code{hm}}{bandwidth for observations strictly below cutoff, equal to
 #'     \code{hp} unless \code{bw.equal==FALSE}}
 #'
 #'     \item{\code{sigma2m}, \code{sigma2p}}{estimate of conditional variance

diff --git a/doc/RDHonest.R b/doc/RDHonest.R
@@ -78,7 +78,7 @@ RDHonest(voteshare ~ margin, data=lee08, kern="uniform", M=M, sclass="H", opt.cr
 
 
 ## -----------------------------------------------------------------------------
-## Add variance estimate to the lee data so that the RDSmoothnessBound
+## Add variance estimate to the Lee (2008) data so that the RDSmoothnessBound
 ## function doesn't have to compute them each time
 dl <- NPRPrelimVar.fit(dl, se.initial="nn")
 

diff --git a/doc/RDHonest.Rmd b/doc/RDHonest.Rmd
@@ -47,10 +47,10 @@ In the sharp regression discontinuity model, we observe units $i=1,\dotsc,n$,
 with the outcome $y_i$ for the $i$th unit given by $$ y_i = f(x_i) + u_i, $$
 where $f(x_i)$ is the expectation of $y_i$ conditional on the running variable
 $x_i$ and $u_i$ is the regression error. A unit is treated if and only if the
-running variable $x_{i}$ lies above a known cutoff $c_{0}$. The parameter of
-interest is given by the jump of $f$ at the cutoff, $$ \beta=\lim_{x\downarrow
-c_{0}}f(x)-\lim_{x\uparrow c_{0}}f(x).$$ Let $\sigma^2(x_i)$ denote the
-conditional variance of $u_i$.
+running variable $x_{i}$ lies weakly above a known cutoff $c_{0}$, i.e. $x_{i}\geq c_{0}$. The
+parameter of interest is given by the jump of $f$ at the cutoff, $$
+\beta=\lim_{x\downarrow c_{0}}f(x)-\lim_{x\uparrow c_{0}}f(x).$$ Let
+$\sigma^2(x_i)$ denote the conditional variance of $u_i$.
 
 In the @lee08 dataset, the running variable corresponds to the margin of victory of
 a Democratic candidate in a US House election, and the treatment corresponds to
@@ -63,7 +63,8 @@ occurred in 1947. The running variable is the year in which the individual turne
 14, with the cutoff equal to 1947 so that the "treatment" is being subject to a
 higher minimum school-leaving age. The outcome is log earnings in 1998.
 
-Some of the functions in the package require the data to be transformed into a custom `RDData` format. This can be accomplished with the `RDData` function:
+Some of the functions in the package require the data to be transformed into a
+custom `RDData` format. This can be accomplished with the `RDData` function:
 
 ```{r}
 library("RDHonest")
@@ -241,6 +242,10 @@ variable is discrete, with $G$ support points: their construction makes no
 assumptions on the nature of the running variable (see Section 5.1 in @KoRo16
 for more detailed discussion).
 
+Note that units that lie exactly at the cutoff are considered treated, since
+treatment is defined by the running variable satisfying
+$x_i\geq c_0$.
+
 As an example, consider the @oreopoulos06 data, in which the running variable is age in years:
 ```{r}
 ## Replicate Table 2, column (10)
@@ -393,7 +398,7 @@ The package also implements lower-bound estimates for the smoothness constant
 $M$ for the Taylor and Hölder smoothness class, as described in the supplements to @KoRo16 and @ArKo16optimal
 
 ```{r}
-## Add variance estimate to the lee data so that the RDSmoothnessBound
+## Add variance estimate to the Lee (2008) data so that the RDSmoothnessBound
 ## function doesn't have to compute them each time
 dl <- NPRPrelimVar.fit(dl, se.initial="nn")
 
@@ -443,8 +448,9 @@ different, but the worst-case bias and the point estimate are identical.
 
 ## Model
 
-In a fuzzy RD design, the treatment $d_{i}$ is not entirely determined by
-whether the running variable $x_{i}$ exceeds a cutoff. Instead, the cutoff
+In a fuzzy RD design, units are assigned to treatment if their running variable
+$x_{i}$ weakly exceeds a cutoff $c_{0}$, i.e. $x_i\geq c_{0}$. However, the actual treatment
+$d_{i}$ does not perfectly comply with the treatment assignment. Instead, the cutoff
 induces a jump in the treatment probability. The resulting reduced-form and
 first-stage regressions are given by
 \begin{align*}
@@ -454,8 +460,12 @@ See Section 3.3 in @ArKo16honest for a more detailed description.
 
 In the @battistin09 dataset, the treatment variable is an indicator for
 retirement, and the running variable is number of years since being eligible to
-retire. The cutoff is $0$. (individuals exactly at the cutoff are dropped).
-Similarly to the `RDData` function, the `FRDData` function transforms the data into an appropriate format:
+retire. The cutoff is $0$. Individuals exactly at the cutoff are dropped from
+the dataset. If there were individuals exactly at the cutoff, they would be
+assumed to be assigned to the treatment group.
+
+Similarly to the `RDData` function, the `FRDData` function transforms the data
+into an appropriate format:
 
 ```{r}
 ## Assumes first column in the data frame corresponds to outcome,

diff --git a/doc/RDHonest.pdf b/doc/RDHonest.pdf
diff --git a/doc/lpkernels.pdf b/doc/lpkernels.pdf
diff --git a/doc/manual.pdf b/doc/manual.pdf
diff --git a/man-roxygen/RDBW.R b/man-roxygen/RDBW.R
@@ -1,6 +1,6 @@
 #' @param h bandwidth, a scalar parameter. For fuzzy or sharp RD, it can be a
 #'     named vector of length two with names \code{"p"} and \code{"m"}, in which
-#'     case the bandwidth \code{h["m"]} is used for observations below the
-#'     cutoff, and the bandwidth \code{h["p"]} is used for observations above
-#'     the cutoff. If not supplied, optimal bandwidth is computed according to
-#'     criterion given by \code{opt.criterion}.
+#'     case the bandwidth \code{h["m"]} is used for observations strictly below
+#'     the cutoff, and the bandwidth \code{h["p"]} is used for observations
+#'     weakly above the cutoff. If not supplied, optimal bandwidth is computed
+#'     according to criterion given by \code{opt.criterion}.
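
As a rough illustration of the convention documented in this hunk, the following base-R sketch assigns `h["p"]` to observations weakly above the cutoff and `h["m"]` to those strictly below. It is not the package's implementation: `side_bandwidth` is a hypothetical helper and the bandwidth values are made up.

```r
## Hypothetical helper, not part of RDHonest: returns the bandwidth that
## applies to each observation under the "weakly above / strictly below" rule.
side_bandwidth <- function(x, cutoff = 0, h = c(p = 1, m = 1)) {
    stopifnot(all(c("p", "m") %in% names(h)))
    ## x >= cutoff: treated side, uses h["p"]; x < cutoff: control side, h["m"]
    unname(ifelse(x >= cutoff, h["p"], h["m"]))
}

## Made-up running variable; the unit exactly at the cutoff gets h["p"]
x <- c(-2, -0.5, 0, 0.5, 3)
bw <- side_bandwidth(x, cutoff = 0, h = c(p = 10, m = 5))
print(data.frame(x = x, bandwidth = bw))
```

The only point of the sketch is the boundary case: an observation with `x == cutoff` falls on the `"p"` side.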
diff --git a/man/FRDHonest.Rd b/man/FRDHonest.Rd
diff --git a/man/FRDOptBW.Rd b/man/FRDOptBW.Rd
diff --git a/man/LPPHonest.Rd b/man/LPPHonest.Rd
diff --git a/man/NPRHonest.fit.Rd b/man/NPRHonest.fit.Rd
diff --git a/man/NPRreg.fit.Rd b/man/NPRreg.fit.Rd
diff --git a/man/RDHonest.Rd b/man/RDHonest.Rd
diff --git a/man/RDHonestBME.Rd b/man/RDHonestBME.Rd
diff --git a/man/RDOptBW.Rd b/man/RDOptBW.Rd
diff --git a/tests/testthat/test_rd.R b/tests/testthat/test_rd.R
@@ -104,8 +104,8 @@ test_that("Honest inference in Lee and LM data",  {
     expect_equal(r$maxbias, ff(r$hp, "uniform", "supplied.var")$maxbias)
 
     r <- es("triangular", "nn")
-    expect_equal(r$hm, 22.80882408)
-    expect_equal(unname(r$estimate+r$hl), 0.05476609)
+    expect_lt(abs(r$hm- 22.80882408), 5e-7)
+    expect_lt(abs(unname(r$estimate+r$hl)- 0.05476609), 1e-7)
     ## End replication
 
     ## Replicate 1511.06028v2
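
The tolerance checks in this hunk compare a computed value with a reference value up to a small absolute error. A minimal base-R sketch of that pattern follows; `within_tol` is a hypothetical helper and the inputs are made up for illustration. It also shows why the difference should be wrapped in `abs()`: a one-sided comparison accepts any value far below the reference.

```r
## Hypothetical helper: TRUE if `value` is within `tol` of `target`
within_tol <- function(value, target, tol) {
    abs(value - target) < tol
}

ok_close <- within_tol(22.80882410, 22.80882408, 5e-7)  # small deviation
ok_far   <- within_tol(0, 0.05476609, 1e-7)             # far below the target

## One-sided comparison without abs(): a value far below the target
## still satisfies the inequality, so the check is too permissive
one_sided <- (0 - 0.05476609) < 1e-7

print(c(ok_close = ok_close, ok_far = ok_far, one_sided = one_sided))
```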

diff --git a/vignettes/RDHonest.Rmd b/vignettes/RDHonest.Rmd
@@ -47,10 +47,10 @@ In the sharp regression discontinuity model, we observe units $i=1,\dotsc,n$,
 with the outcome $y_i$ for the $i$th unit given by $$ y_i = f(x_i) + u_i, $$
 where $f(x_i)$ is the expectation of $y_i$ conditional on the running variable
 $x_i$ and $u_i$ is the regression error. A unit is treated if and only if the
-running variable $x_{i}$ lies above a known cutoff $c_{0}$. The parameter of
-interest is given by the jump of $f$ at the cutoff, $$ \beta=\lim_{x\downarrow
-c_{0}}f(x)-\lim_{x\uparrow c_{0}}f(x).$$ Let $\sigma^2(x_i)$ denote the
-conditional variance of $u_i$.
+running variable $x_{i}$ lies weakly above a known cutoff $c_{0}$, i.e. $x_{i}\geq c_{0}$. The
+parameter of interest is given by the jump of $f$ at the cutoff, $$
+\beta=\lim_{x\downarrow c_{0}}f(x)-\lim_{x\uparrow c_{0}}f(x).$$ Let
+$\sigma^2(x_i)$ denote the conditional variance of $u_i$.
 
 In the @lee08 dataset, the running variable corresponds to the margin of victory of
 a Democratic candidate in a US House election, and the treatment corresponds to
@@ -63,7 +63,8 @@ occurred in 1947. The running variable is the year in which the individual turne
 14, with the cutoff equal to 1947 so that the "treatment" is being subject to a
 higher minimum school-leaving age. The outcome is log earnings in 1998.
 
-Some of the functions in the package require the data to be transformed into a custom `RDData` format. This can be accomplished with the `RDData` function:
+Some of the functions in the package require the data to be transformed into a
+custom `RDData` format. This can be accomplished with the `RDData` function:
 
 ```{r}
 library("RDHonest")
@@ -241,6 +242,10 @@ variable is discrete, with $G$ support points: their construction makes no
 assumptions on the nature of the running variable (see Section 5.1 in @KoRo16
 for more detailed discussion).
 
+Note that units that lie exactly at the cutoff are considered treated, since
+treatment is defined by the running variable satisfying
+$x_i\geq c_0$.
+
 As an example, consider the @oreopoulos06 data, in which the running variable is age in years:
 ```{r}
 ## Replicate Table 2, column (10)
@@ -443,8 +448,9 @@ different, but the worst-case bias and the point estimate are identical.
 
 ## Model
 
-In a fuzzy RD design, the treatment $d_{i}$ is not entirely determined by
-whether the running variable $x_{i}$ exceeds a cutoff. Instead, the cutoff
+In a fuzzy RD design, units are assigned to treatment if their running variable
+$x_{i}$ weakly exceeds a cutoff $c_{0}$, i.e. $x_i\geq c_{0}$. However, the actual treatment
+$d_{i}$ does not perfectly comply with the treatment assignment. Instead, the cutoff
 induces a jump in the treatment probability. The resulting reduced-form and
 first-stage regressions are given by
 \begin{align*}
@@ -454,8 +460,12 @@ See Section 3.3 in @ArKo16honest for a more detailed description.
 
 In the @battistin09 dataset, the treatment variable is an indicator for
 retirement, and the running variable is number of years since being eligible to
-retire. The cutoff is $0$. (individuals exactly at the cutoff are dropped).
-Similarly to the `RDData` function, the `FRDData` function transforms the data into an appropriate format:
+retire. The cutoff is $0$. Individuals exactly at the cutoff are dropped from
+the dataset. If there were individuals exactly at the cutoff, they would be
+assumed to be assigned to the treatment group.
+
+Similarly to the `RDData` function, the `FRDData` function transforms the data
+into an appropriate format:
 
 ```{r}
 ## Assumes first column in the data frame corresponds to outcome,