R CMD check now works. Still get 3 warnings. Got rid of data.table. U…

…se just data.frame. updated a whole bunch of stuff.
hyejung0 · Mar 16, 2023 · aa336f5 · aa336f5
1 parent 982765e
commit aa336f5
Show file tree

Hide file tree

Showing 93 changed files with 3,809 additions and 349 deletions.
diff --git a/boots.lmer.Rcheck/00_pkg_src/boots.lmer/DESCRIPTION b/boots.lmer.Rcheck/00_pkg_src/boots.lmer/DESCRIPTION
@@ -12,8 +12,8 @@ Roxygen: list(markdown = TRUE)
 RoxygenNote: 7.2.3
 Suggests: knitr, rmarkdown
 VignetteBuilder: knitr
-Depends: data.table, lme4, rlang, parallel, stringr
+Depends: R (>= 3.5.0), data.table
 NeedsCompilation: no
-Packaged: 2023-03-14 22:14:56 UTC; u1317537
+Packaged: 2023-03-16 01:59:32 UTC; u1317537
 Author: Hyejung Lee [aut, cre] (<https://orcid.org/0000-0002-5293-4134>)
 Maintainer: Hyejung Lee <[email protected]>
diff --git a/boots.lmer.Rcheck/00_pkg_src/boots.lmer/R/boots.lmer.R b/boots.lmer.Rcheck/00_pkg_src/boots.lmer/R/boots.lmer.R
@@ -6,50 +6,62 @@
 #' @param y character string. Name of column in each data.table of boots.samples.list, to be used as the outcome in LMER.
 #' @param X a vector of character string.  Names of the of covariates to fit in LMER, as they appear on each dataset of boots.samples.list.
 #' @param use.formula a vector of class formula. This is to be used in lmer fit. If not provided, then a basic additive fixed effects model with random effect of (1|no.repeat.sub.id) will be fit.
+#' @param num_workers integer. A number of cores to use for parallel computing.
 #' @examples
 #' example.subject<-c("Sarah","John","Beth","Anna","Sarah","Sarah","Chris","Blake","John","Anna")
 #' example.dat<-data.frame("Y"=rnorm(n=length(example.subject)),
-#'                        "X"=rpois(n=length(example.subject), lambda = 3),
-#'                        "subjects"=example.subject)
+#'                         "X1"=rpois(n=length(example.subject), lambda = 3),
+#'                         "X2"=rnorm(n=length(example.subject)),
+#'                         "X3"=rbeta(n=length(example.subject), shape1 = 3, shape2 = 0.5),
+#'                         "subjects"=example.subject)
 #' output<-boots.samples(dat=example.dat,sub.id = "subjects",B=4) #create 4 bootstrap samples
 #' lmer.out<-boots.lmer(y="Y", X=c("X1","X2","X3"), boots.samples.list = output)
 #' @export
-boots.lmer<-function(y,X,boots.samples.list,use.formula=NULL){
-library(data.table)
+boots.lmer<-function(y,X,boots.samples.list,use.formula=NULL, num_workers=NULL){
+
+
 
   lmer.fit<-function(y,X,dat,use.formula=NULL){#a function that fits one LMER
 
 
-    if(!is.null(use.formula)){#If we have a specific formula we want to use for lmer, then we use it.
-      dat[,lme4::lmer(use.formula,data=dat, REML=FALSE)]
-    }else{ #if not, then we will fit a basic function with random effect with common intercept, etc.
-      #Make sure we use backtick quotes in case the names have spaces and special characters
+    if(is.null(use.formula)){#if we are not provided with specific formula, then create our own
       X.names<-lapply(X, as.name)
-      #If we have more than one covariate, concatenate them with +
       if(length(X)>1){
         X.names<-paste(X.names, collapse=" + ")
         X.formula<-as.formula(paste0("~ ", paste(X.names, collapse=" + ")))
       }
-      model_formula1<-formula(paste0("`",rlang::quo_name(y),"`", "~", rlang::quo_name(X.names),"+(1|no.repeat.sub.id)"))
-      dat[,lme4::lmer(model_formula1,data=dat, REML=FALSE)]
+
+      if(length(unique(dat$no.repeat.sub.id))==nrow(dat)){ #If no subjects are repeating, no need to fit LMER, use LM.
+        use.formula<-formula(paste0("`",rlang::quo_name(y),"`",rlang::quo_name(X.formula)))
+        lm(use.formula,data=dat)
+      }else{
+        use.formula<-formula(paste0("`",rlang::quo_name(y),"`",rlang::quo_name(X.formula),"+(1|no.repeat.sub.id)"))
+        lme4::lmer(use.formula,data=dat, REML=FALSE)
+      }
+    }else{ #If we are provided with formula....
+      lme4::lmer(use.formula,data=dat, REML=FALSE)
     }
 
   }
 
+  if(is.null(num_workers)){
+    #CRAN limits the number of cores available to packages to 2, for performance reasons. There was a thread in the mailing list, I believe, but I can't find it right now.
+    chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
 
-  #CRAN limits the number of cores available to packages to 2, for performance reasons. There was a thread in the mailing list, I believe, but I can't find it right now.
-  chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
+    if (nzchar(chk) && chk == "TRUE") {
+      # use 2 cores in CRAN/Travis/AppVeyor
+      num_workers <- 2L
+    } else {
+      # use all cores in devtools::test()
+      num_workers <- parallel::detectCores()
+    }
 
-  if (nzchar(chk) && chk == "TRUE") {
-    # use 2 cores in CRAN/Travis/AppVeyor
-    num_workers <- 2L
-  } else {
-    # use all cores in devtools::test()
-    num_workers <- parallel::detectCores()
   }
 
 
+
   parallel::mclapply(boots.samples.list,function(boots.dat){
+    #Convert to data table to save time
     lmer.fit(y=y,X=X,dat=boots.dat)}
     ,mc.cores = num_workers)
 

diff --git a/boots.lmer.Rcheck/00_pkg_src/boots.lmer/R/boots.samples.R b/boots.lmer.Rcheck/00_pkg_src/boots.lmer/R/boots.samples.R
@@ -41,18 +41,26 @@ boots.samples<-function(dat, sub.id,B){
   output<-
     apply(no.repeat.id,2,function(x){
       temp<-
-        sapply(x,function(x_i){ #for each subject
+        lapply(x,function(x_i){ #for each subject
           index<-which(sub.id==stringr::word(x_i,1,sep = "\\__")) #index of original data where the subject id appears
           cbind(index, "no.repeat.id"=rep(x_i,length(index)))}) #get all observation for this particular subject.
 
-      #bind the index and no.repeat.id. as rows
-      temp.dat<-do.call(rbind, lapply(temp, data.table::data.table))
-      complete.dat<-data.table::as.data.table(dat[temp.dat$index,])
-      # complete.dat[,no.repeat.sub.id:=(temp.dat$no.repeat.id)] # add the new subject ID column, with no repetition
-      complete.dat[,"no.repeat.sub.id"]<-temp.dat$no.repeat.id
+
+      temp.dat<-do.call(rbind, lapply(temp, data.frame))
+      complete.dat<-dat[temp.dat$index,]
+      complete.dat[,"no.repeat.sub.id"]<-temp.dat[,"no.repeat.id"]
       return(complete.dat)
-      }
+
+      # #bind the index and no.repeat.id. as rows
+      # temp.dat<-do.call(rbind, lapply(temp, data.table::data.table))
+      # complete.dat<-data.table::as.data.table(dat[temp.dat$index,])
+      # # complete.dat[,no.repeat.sub.id:=(temp.dat$no.repeat.id)] # add the new subject ID column, with no repetition
+      # # complete.dat$no.repeat.sub.id<-temp.dat$no.repeat.id
+      # complete.dat[,"no.repeat.sub.id"]<-temp.dat$no.repeat.id
+      # return(complete.dat)
+    }
     )
 
   output
 }
+
diff --git a/boots.lmer.Rcheck/00_pkg_src/boots.lmer/inst/doc/create_bootstrap_samples.R b/boots.lmer.Rcheck/00_pkg_src/boots.lmer/inst/doc/create_bootstrap_samples.R
@@ -5,8 +5,6 @@ knitr::opts_chunk$set(
 )
 
 
-options(rmarkdown.html_vignette.check_title = FALSE)
-
 ## ----setup--------------------------------------------------------------------
 library(boots.lmer)
 
@@ -26,11 +24,3 @@ output<-boots.samples(dat = example.dat, sub.id = "subjects",B=4)
 output
 # knitr::kable(output,caption = "Table 2. Four boostrap samples of example.dat data sets. The sampling unit is subject, hence the number of rows of each data set is expected to be different from one another.",row.names = FALSE)
 
-## -----------------------------------------------------------------------------
-
-output[[1]]
-# knitr::kable(output[[1]],caption = "Table 2.A. The first bootstrap sample",row.names = FALSE)
-
-## -----------------------------------------------------------------------------
-# example.dat[output[[1]]$index,]
-
diff --git a/boots.lmer.Rcheck/00_pkg_src/boots.lmer/inst/doc/create_bootstrap_samples.Rmd b/boots.lmer.Rcheck/00_pkg_src/boots.lmer/inst/doc/create_bootstrap_samples.Rmd
@@ -3,8 +3,10 @@ title: "create_bootstrap_samples"
 output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{create_bootstrap_samples}
-  %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
+  %\VignetteEngine{knitr::rmarkdown}
+editor_options: 
+  chunk_output_type: console
 ---
 
 ```{r, include = FALSE}
@@ -13,8 +15,6 @@ knitr::opts_chunk$set(
   comment = "#>"
 )
 
-
-options(rmarkdown.html_vignette.check_title = FALSE)
 ```
 
 
@@ -67,20 +67,5 @@ output
 # knitr::kable(output,caption = "Table 2. Four boostrap samples of example.dat data sets. The sampling unit is subject, hence the number of rows of each data set is expected to be different from one another.",row.names = FALSE)
 ```
 
-The output is a list of 4, each list being a data frame. Each data frame contains two columns: **index** and **no.repeat.id.** We will explain each column using the first bootstrap sample:
-
-```{r}
-
-output[[1]]
-# knitr::kable(output[[1]],caption = "Table 2.A. The first bootstrap sample",row.names = FALSE)
-```
-
-In the first sample, total 6 subjects are chosen, but the number of rows is different from the original data set. The original data set has 10 rows, but the first bootstrap sample has 11 rows. This is because again, our sample of unit is subject, where there are different number of observations for each subject.   
-
-The **index** column shows the row number of the original data set for each row of the bootstrap sample. For example, the first two rows belong to John, with index 2 and 9. We can get the bootstrap data using these index. 
-
-```{r}
-# example.dat[output[[1]]$index,]
-```
-
-The **index** can be used to select the right observation from the original data set. However, if we are to use the subjects as a random effect in our linear mixed effects model, the subjects has to be distinguished each time it is chosen. For example, in our first bootstrap sample, John is chosen 3 times. If we use `John` as is in the random effects, R will not be able to distinguish these 3 different `John`s. Therefore, we need to name the 3 `John`s differently. If we look at **no.repeat.id** column in Table 2.A, the different names are assigned with `__#` format, where `#` is the number of times the subject was sampled. Hence, we see that we have `John__1`, `John__2`, and `John__3` to take into account `John` as 3 different subjects.
+The output is a list of 4, each list being a bootstrap sample data frame. Each data frame also contains a new column `no.repeat.sub.id`. This column is the same as `subject` column, except it has trailing `__#` where `#` shows the number of times the subject was included in the bootstrap sampling. 
+For example, in the first bootstrap sampling, `John__1`, `John__2`, and `John__3` appear. Therefore, John was collected 3 times, and we can use `no.repeat.sub.id` for random effects to correctly take into account that there are three different Johns.