Skip to content

Commit

Permalink
R CMD check now works. Still get 3 warnings. Got rid of data.table. U…
Browse files Browse the repository at this point in the history
…se just data.frame. updated a whole bunch of stuff.
  • Loading branch information
hyejung0 committed Mar 16, 2023
1 parent 982765e commit aa336f5
Show file tree
Hide file tree
Showing 93 changed files with 3,809 additions and 349 deletions.
4 changes: 2 additions & 2 deletions boots.lmer.Rcheck/00_pkg_src/boots.lmer/DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@ Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
Depends: data.table, lme4, rlang, parallel, stringr
Depends: R (>= 3.5.0), data.table
NeedsCompilation: no
Packaged: 2023-03-14 22:14:56 UTC; u1317537
Packaged: 2023-03-16 01:59:32 UTC; u1317537
Author: Hyejung Lee [aut, cre] (<https://orcid.org/0000-0002-5293-4134>)
Maintainer: Hyejung Lee <[email protected]>
50 changes: 31 additions & 19 deletions boots.lmer.Rcheck/00_pkg_src/boots.lmer/R/boots.lmer.R
Original file line number Diff line number Diff line change
Expand Up @@ -6,50 +6,62 @@
#' @param y character string. Name of column in each data.table of boots.samples.list, to be used as the outcome in LMER.
#' @param X a vector of character string. Names of the of covariates to fit in LMER, as they appear on each dataset of boots.samples.list.
#' @param use.formula a vector of class formula. This is to be used in lmer fit. If not provided, then a basic additive fixed effects model with random effect of (1|no.repeat.sub.id) will be fit.
#' @param num_workers integer. A number of cores to use for parallel computing.
#' @examples
#' example.subject<-c("Sarah","John","Beth","Anna","Sarah","Sarah","Chris","Blake","John","Anna")
#' example.dat<-data.frame("Y"=rnorm(n=length(example.subject)),
#' "X"=rpois(n=length(example.subject), lambda = 3),
#' "subjects"=example.subject)
#' "X1"=rpois(n=length(example.subject), lambda = 3),
#' "X2"=rnorm(n=length(example.subject)),
#' "X3"=rbeta(n=length(example.subject), shape1 = 3, shape2 = 0.5),
#' "subjects"=example.subject)
#' output<-boots.samples(dat=example.dat,sub.id = "subjects",B=4) #create 4 bootstrap samples
#' lmer.out<-boots.lmer(y="Y", X=c("X1","X2","X3"), boots.samples.list = output)
#' @export
boots.lmer<-function(y,X,boots.samples.list,use.formula=NULL){
library(data.table)
boots.lmer<-function(y,X,boots.samples.list,use.formula=NULL, num_workers=NULL){



lmer.fit<-function(y,X,dat,use.formula=NULL){#a function that fits one LMER


if(!is.null(use.formula)){#If we have a specific formula we want to use for lmer, then we use it.
dat[,lme4::lmer(use.formula,data=dat, REML=FALSE)]
}else{ #if not, then we will fit a basic function with random effect with common intercept, etc.
#Make sure we use backtick quotes in case the names have spaces and special characters
if(is.null(use.formula)){#if we are not provided with specific formula, then create our own
X.names<-lapply(X, as.name)
#If we have more than one covariate, concatenate them with +
if(length(X)>1){
X.names<-paste(X.names, collapse=" + ")
X.formula<-as.formula(paste0("~ ", paste(X.names, collapse=" + ")))
}
model_formula1<-formula(paste0("`",rlang::quo_name(y),"`", "~", rlang::quo_name(X.names),"+(1|no.repeat.sub.id)"))
dat[,lme4::lmer(model_formula1,data=dat, REML=FALSE)]

if(length(unique(dat$no.repeat.sub.id))==nrow(dat)){ #If no subjects are repeating, no need to fit LMER, use LM.
use.formula<-formula(paste0("`",rlang::quo_name(y),"`",rlang::quo_name(X.formula)))
lm(use.formula,data=dat)
}else{
use.formula<-formula(paste0("`",rlang::quo_name(y),"`",rlang::quo_name(X.formula),"+(1|no.repeat.sub.id)"))
lme4::lmer(use.formula,data=dat, REML=FALSE)
}
}else{ #If we are provided with formula....
lme4::lmer(use.formula,data=dat, REML=FALSE)
}

}

if(is.null(num_workers)){
#CRAN limits the number of cores available to packages to 2, for performance reasons. There was a thread in the mailing list, I believe, but I can't find it right now.
chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")

#CRAN limits the number of cores available to packages to 2, for performance reasons. There was a thread in the mailing list, I believe, but I can't find it right now.
chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
if (nzchar(chk) && chk == "TRUE") {
# use 2 cores in CRAN/Travis/AppVeyor
num_workers <- 2L
} else {
# use all cores in devtools::test()
num_workers <- parallel::detectCores()
}

if (nzchar(chk) && chk == "TRUE") {
# use 2 cores in CRAN/Travis/AppVeyor
num_workers <- 2L
} else {
# use all cores in devtools::test()
num_workers <- parallel::detectCores()
}



parallel::mclapply(boots.samples.list,function(boots.dat){
#Convert to data table to save time
lmer.fit(y=y,X=X,dat=boots.dat)}
,mc.cores = num_workers)

Expand Down
22 changes: 15 additions & 7 deletions boots.lmer.Rcheck/00_pkg_src/boots.lmer/R/boots.samples.R
Original file line number Diff line number Diff line change
Expand Up @@ -41,18 +41,26 @@ boots.samples<-function(dat, sub.id,B){
output<-
apply(no.repeat.id,2,function(x){
temp<-
sapply(x,function(x_i){ #for each subject
lapply(x,function(x_i){ #for each subject
index<-which(sub.id==stringr::word(x_i,1,sep = "\\__")) #index of original data where the subject id appears
cbind(index, "no.repeat.id"=rep(x_i,length(index)))}) #get all observation for this particular subject.

#bind the index and no.repeat.id. as rows
temp.dat<-do.call(rbind, lapply(temp, data.table::data.table))
complete.dat<-data.table::as.data.table(dat[temp.dat$index,])
# complete.dat[,no.repeat.sub.id:=(temp.dat$no.repeat.id)] # add the new subject ID column, with no repetition
complete.dat[,"no.repeat.sub.id"]<-temp.dat$no.repeat.id

temp.dat<-do.call(rbind, lapply(temp, data.frame))
complete.dat<-dat[temp.dat$index,]
complete.dat[,"no.repeat.sub.id"]<-temp.dat[,"no.repeat.id"]
return(complete.dat)
}

# #bind the index and no.repeat.id. as rows
# temp.dat<-do.call(rbind, lapply(temp, data.table::data.table))
# complete.dat<-data.table::as.data.table(dat[temp.dat$index,])
# # complete.dat[,no.repeat.sub.id:=(temp.dat$no.repeat.id)] # add the new subject ID column, with no repetition
# # complete.dat$no.repeat.sub.id<-temp.dat$no.repeat.id
# complete.dat[,"no.repeat.sub.id"]<-temp.dat$no.repeat.id
# return(complete.dat)
}
)

output
}

Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,6 @@ knitr::opts_chunk$set(
)


options(rmarkdown.html_vignette.check_title = FALSE)

## ----setup--------------------------------------------------------------------
library(boots.lmer)

Expand All @@ -26,11 +24,3 @@ output<-boots.samples(dat = example.dat, sub.id = "subjects",B=4)
output
# knitr::kable(output,caption = "Table 2. Four boostrap samples of example.dat data sets. The sampling unit is subject, hence the number of rows of each data set is expected to be different from one another.",row.names = FALSE)

## -----------------------------------------------------------------------------

output[[1]]
# knitr::kable(output[[1]],caption = "Table 2.A. The first bootstrap sample",row.names = FALSE)

## -----------------------------------------------------------------------------
# example.dat[output[[1]]$index,]

Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,10 @@ title: "create_bootstrap_samples"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{create_bootstrap_samples}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---

```{r, include = FALSE}
Expand All @@ -13,8 +15,6 @@ knitr::opts_chunk$set(
comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```


Expand Down Expand Up @@ -67,20 +67,5 @@ output
# knitr::kable(output,caption = "Table 2. Four boostrap samples of example.dat data sets. The sampling unit is subject, hence the number of rows of each data set is expected to be different from one another.",row.names = FALSE)
```

The output is a list of 4, each list being a data frame. Each data frame contains two columns: **index** and **no.repeat.id.** We will explain each column using the first bootstrap sample:

```{r}
output[[1]]
# knitr::kable(output[[1]],caption = "Table 2.A. The first bootstrap sample",row.names = FALSE)
```

In the first sample, total 6 subjects are chosen, but the number of rows is different from the original data set. The original data set has 10 rows, but the first bootstrap sample has 11 rows. This is because again, our sample of unit is subject, where there are different number of observations for each subject.

The **index** column shows the row number of the original data set for each row of the bootstrap sample. For example, the first two rows belong to John, with index 2 and 9. We can get the bootstrap data using these index.

```{r}
# example.dat[output[[1]]$index,]
```

The **index** can be used to select the right observation from the original data set. However, if we are to use the subjects as a random effect in our linear mixed effects model, the subjects has to be distinguished each time it is chosen. For example, in our first bootstrap sample, John is chosen 3 times. If we use `John` as is in the random effects, R will not be able to distinguish these 3 different `John`s. Therefore, we need to name the 3 `John`s differently. If we look at **no.repeat.id** column in Table 2.A, the different names are assigned with `__#` format, where `#` is the number of times the subject was sampled. Hence, we see that we have `John__1`, `John__2`, and `John__3` to take into account `John` as 3 different subjects.
The output is a list of 4, each list being a bootstrap sample data frame. Each data frame also contains a new column `no.repeat.sub.id`. This column is the same as `subject` column, except it has trailing `__#` where `#` shows the number of times the subject was included in the bootstrap sampling.
For example, in the first bootstrap sampling, `John__1`, `John__2`, and `John__3` appear. Therefore, John was collected 3 times, and we can use `no.repeat.sub.id` for random effects to correctly take into account that there are three different Johns.
Loading

0 comments on commit aa336f5

Please sign in to comment.