Commit

finish updates of grch38 and grch37 data frams
timothy-barry committed Apr 30, 2024
1 parent 4320b34 commit 84142ba
Showing 10 changed files with 31 additions and 20 deletions.
1 change: 0 additions & 1 deletion DESCRIPTION
@@ -27,7 +27,6 @@ Imports:
 cowplot,
 crayon,
 data.table,
-R.utils,
 dplyr,
 ggplot2,
 Matrix,
9 changes: 7 additions & 2 deletions R/data.R
@@ -1,8 +1,13 @@
-#' Gene position data frame
+#' Gene position data frames
 #'
-#' `gene_position_data_frame_grch38` maps each gene to the chromosome on which it is located and the position of its transcription start site on that chromosome. The data frame was constructed from the GRCh38 reference genome that has shipped with CellRanger since 2020.
+#' `gene_position_data_frame_grch38` and `gene_position_data_frame_grch37` contain the coordinate and transcription start site position of each gene relative to reference genome GRCh38 and GRCh37, respectively. Both `gene_position_data_frame_grch38` and `gene_position_data_frame_grch37` were constructed from reference genomes available on the 10x Genomics website. The GRCh38 reference genome has been used by 10x Cell Ranger since 2020.
 #'
 #' @usage data(gene_position_data_frame_grch38)
 #' @examples
 #' head(gene_position_data_frame_grch38)
+#' head(gene_position_data_frame_grch37)
 "gene_position_data_frame_grch38"
+
+#' @rdname gene_position_data_frame_grch38
+#' @usage data(gene_position_data_frame_grch37)
+"gene_position_data_frame_grch37"
2 changes: 1 addition & 1 deletion R/pair_constructor_functs.R
@@ -1,6 +1,6 @@
 #' Construct *cis* pairs
 #'
-#' `construct_cis_pairs()` is a helper function to facilitate construction the *cis* pairs. `construct_cis_pairs()` returns the set of target-response pairs for which the target and response are located on the same chromosome and in close physical proximity to one another. `construct_cis_pairs()` is a useful pair constructor function for screens that aim to map noncoding regulatory elements (e.g., enhancers or noncoding GWAS variants) to target genes in *cis*. `construct_cis_pairs()` assumes that the columns `chr`, `start`, and `stop` are present in the `grna_target_data_frame`, giving the chromosome, start position, and end position, respectively, of the region that each gRNA targets. `construct_cis_pairs()` takes several arguments: `sceptre_object` (required) `distance_threshold` (optional), `positive_control_pairs` (optional), and `response_position_data_frame` (optional). By default, `construct_cis_pairs()` pairs each gRNA target to the set of responses on the same chromosome as that target and within `distance_threshold` bases of that target. (The default value of `distance_threshold` is 500,000 bases, or half a megabase.) The `positive_control_pairs` data frame optionally can be passed to `construct_cis_pairs()`, in which case the positive control targets (i.e., the entries within the `grna_target` column of `positive_control_pairs`) are excluded from the *cis* pairs. One may want to exclude these from the discovery analysis if these targets are intended for positive control purposes only. See \href{https://timothy-barry.github.io/sceptre-book/set-analysis-parameters.html#sec-set-analysis-parameters_construct_cis_pairs}{Section 2.2.2 of the manual} for more detailed information about this function.
+#' `construct_cis_pairs()` is a helper function to facilitate construction the *cis* pairs. `construct_cis_pairs()` returns the set of target-response pairs for which the target and response are located on the same chromosome and in close physical proximity to one another. `construct_cis_pairs()` is a useful pair constructor function for screens that aim to map noncoding regulatory elements (e.g., enhancers or noncoding GWAS variants) to target genes in *cis*. `construct_cis_pairs()` assumes that the columns `chr`, `start`, and `stop` are present in the `grna_target_data_frame`, giving the chromosome, start position, and end position, respectively, of the region that each gRNA targets. `construct_cis_pairs()` takes several arguments: `sceptre_object` (required), `distance_threshold` (optional), `positive_control_pairs` (optional), and `response_position_data_frame` (optional). By default, `construct_cis_pairs()` pairs each gRNA target to the set of responses on the same chromosome as that target and within `distance_threshold` bases of that target. (The default value of `distance_threshold` is 500,000 bases, or half a megabase.) The `positive_control_pairs` data frame optionally can be passed to `construct_cis_pairs()`, in which case the positive control targets (i.e., the entries within the `grna_target` column of `positive_control_pairs`) are excluded from the *cis* pairs. One may want to exclude these from the discovery analysis if these targets are intended for positive control purposes only. See \href{https://timothy-barry.github.io/sceptre-book/set-analysis-parameters.html#sec-set-analysis-parameters_construct_cis_pairs}{Section 2.2.2 of the manual} for more detailed information about this function.
 #'
 #' @param sceptre_object a `sceptre_object`
 #' @param distance_threshold (optional) target-response pairs located within `distance_threshold` bases of one another and on the same chromosome are included in the *cis* discovery set.
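
The pairing rule described in the `construct_cis_pairs()` documentation above (same chromosome, within `distance_threshold` bases) can be sketched in base R on toy data. This is only an illustration, not the package's implementation: `toy_cis_pairs()` is a hypothetical helper, all coordinates are made up, and measuring distance from the midpoint of each target region to the response position is an assumption of the sketch.

```r
# Toy gRNA target table with the `chr`, `start`, and `stop` columns that
# `construct_cis_pairs()` expects (values are made up for illustration).
grna_target_data_frame <- data.frame(
  grna_target = c("enh_1", "enh_2"),
  chr = c("chr1", "chr2"),
  start = c(1000000, 5000000),
  stop = c(1000500, 5000500)
)

# Toy response position table with `response_id`, `chr`, and `position` columns.
response_position_data_frame <- data.frame(
  response_id = c("gene_a", "gene_b", "gene_c"),
  chr = c("chr1", "chr1", "chr2"),
  position = c(1200000, 3000000, 5100000)
)

# Hypothetical helper (not part of sceptre): pair each target with the
# responses on the same chromosome whose position lies within
# `distance_threshold` bases of the target midpoint (an assumption here).
toy_cis_pairs <- function(targets, responses, distance_threshold = 5e5) {
  targets$midpoint <- (targets$start + targets$stop) / 2
  # Merging on `chr` yields every same-chromosome target-response combination.
  candidates <- merge(targets, responses, by = "chr")
  keep <- abs(candidates$position - candidates$midpoint) <= distance_threshold
  pairs <- candidates[keep, c("grna_target", "response_id")]
  rownames(pairs) <- NULL
  pairs
}

cis_pairs <- toy_cis_pairs(grna_target_data_frame, response_position_data_frame)
print(cis_pairs)
```

In this toy input, `gene_b` sits on the same chromosome as `enh_1` but roughly two megabases away, so it is dropped under the default half-megabase threshold.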
23 changes: 12 additions & 11 deletions data-raw/DATASET_gene_table.R
@@ -2,9 +2,9 @@ library(data.table)
 conflicts_prefer(dplyr::rename)
 conflicts_prefer(dplyr::filter)

-#############
-# hg 38 table
-#############
+###############
+# grch 38 table
+###############
 # CellRanger provides a human reference genome, which can be downloaded via the following command:
 # curl -O https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz
 # The version of the reference is GRCh38. This script extracts the start position, end position,
@@ -31,17 +31,18 @@ gene_table <- cbind(dt_gene_chr[,c("chr", "start", "end", "strand")], gene_ids_a
 dplyr::mutate(chr = factor(chr)) |> dplyr::mutate(position = ifelse(strand == "+", start, end)) |>
 dplyr::select(-start, -end, -strand)
 data.table::setorderv(gene_table, c("chr", "position"))
+gene_table <- gene_table |> dplyr::select(response_id, response_name, chr, position)
 gene_position_data_frame_grch38 <- gene_table
 usethis::use_data(gene_position_data_frame_grch38, internal = FALSE, overwrite = TRUE)

-#############
-# hg 19 table
-#############
+###############
+# grch 37 table
+###############
 rm(list = ls())
-# We obtained the hg37 reference genome from cellranger
+# We obtained the grch 37 reference genome from cellranger
 # wget ftp://ftp.ensembl.org/pub/grch37/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh37.82.gtf.gz
 library(rtracklayer)
-dt <- readGFF("~/research_offsite/external/ref/Homo_sapiens.GRCh37.82.gtf.gz")
+dt <- readGFF("~/research_offsite/external/ref/Homo_sapiens.GRCh37.82.gtf.gz") |> as.data.table()
 # retain only genes
 dt <- dt |> dplyr::filter(type == "gene")
 # keep only those genes on a chromosome
@@ -52,6 +53,6 @@ dt <- dt |>
 dplyr::select(response_id = gene_id,
 response_name = gene_name,
 chr, position)
-gene_position_data_frame_grch19 <- dt
-gene_position_data_frame_grch19$chr <- factor(gene_position_data_frame_grch19$chr)
-usethis::use_data(gene_position_data_frame_grch19, internal = FALSE, overwrite = TRUE)
+gene_position_data_frame_grch37 <- dt
+gene_position_data_frame_grch37$chr <- factor(gene_position_data_frame_grch37$chr)
+usethis::use_data(gene_position_data_frame_grch37, internal = FALSE, overwrite = TRUE)
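
The GRCh38 portion of the script above derives each gene's `position` with `ifelse(strand == "+", start, end)`. A minimal base-R sketch of that rule on a toy table follows; the gene names and coordinates are made up, and the column names simply mirror those visible in the diff.

```r
# Toy gene annotation table (made-up coordinates) with the columns the
# script works with: `chr`, `start`, `end`, and `strand`.
genes <- data.frame(
  response_id = c("gene_a", "gene_b"),
  chr = c("chr1", "chr1"),
  start = c(100, 500),
  end = c(300, 900),
  strand = c("+", "-")
)

# The transcription start site is the first transcribed base: `start` for a
# gene on the "+" strand and `end` for a gene on the "-" strand.
genes$position <- ifelse(genes$strand == "+", genes$start, genes$end)

# Keep only the identifier, chromosome, and TSS position.
tss_table <- genes[, c("response_id", "chr", "position")]
print(tss_table)
```

Taking `end` for minus-strand genes matters because annotation coordinates are reported on the forward strand, so a minus-strand gene is transcribed from its larger coordinate toward its smaller one.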
Binary file removed data/gene_position_data_frame_grch19.rda
Binary file not shown.
Binary file added data/gene_position_data_frame_grch37.rda
Binary file not shown.
Binary file modified data/gene_position_data_frame_grch38.rda
Binary file not shown.
2 changes: 1 addition & 1 deletion man/construct_cis_pairs.Rd

Some generated files are not rendered by default.

12 changes: 9 additions & 3 deletions man/gene_position_data_frame_grch38.Rd

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion vignettes/sceptre.Rmd
@@ -113,7 +113,7 @@ We describe each step of the pipeline in greater detail below.

 ## 1. Import data

-The first step is to import the data. **Data can be imported into `sceptre` from 10X Cell Ranger or Parse outputs, as well as from R matrices.** The simplest way to import the data is to read the output of one or more calls to `cellranger_count` into `sceptre` via the function `import_data_from_cellranger()`. `import_data_from_cellranger()` requires three arguments: `directories`, `grna_target_data_frame`, and `moi`.
+The first step is to import the data. **Data can be imported into `sceptre` from 10x Cell Ranger or Parse outputs, as well as from R matrices.** The simplest way to import the data is to read the output of one or more calls to `cellranger_count` into `sceptre` via the function `import_data_from_cellranger()`. `import_data_from_cellranger()` requires three arguments: `directories`, `grna_target_data_frame`, and `moi`.

 1. `directories` is a character vector specifying the locations of the directories outputted by one or more calls to `cellranger_count`. Below, we set the variable `directories` to the (machine-dependent) location of the example CRISPRi data on disk.

