You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello, I am attempting to integrate a dataset comprised of 22 samples (~1M cells total) and I specifically would like to regress out two covariates (GCResponse and Subject). I have attempted this two ways with separate issues arising from each. I followed the standard Seurat pipeline for QC on the merged object and found that the two covariates in question showed signficant batch effects:
`
Run QC on the Merged Seurat
Combined[["percent.mt"]] <- PercentageFeatureSet(Combined, pattern = "^MT-")
VlnPlot(Combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0)
Combined <- subset(Combined, subset = nFeature_RNA >= 250 & nCount_RNA >= 500 & percent.mt < 15) #Need to adjust these parameters based on violin plot
Combined <- NormalizeData(Combined)
Identify the most variable genes in the merged object
This somewhat mitigated the effects of the 'Subject' covariate, but still resulted in fairly substantial 'GCResponse' effects (especially between the Healthy and High Responder groups, which is unexpected biologically):
I then attempted to instead run harmony integration directly via RunHarmony:
After some tuning of the parameters, I achieved a more uniform UMAP with minimal effects from Subject or GCResponse. My issue here is that the clustering now seems to be heavily influenced by GCResponse instead of the UMAP space.
I am at a loss for how to correct for this, as adjusting the dims and resolution parameters in FindNeighbords and FindClusters, respectively, have not helped. Any advice would be highly appreciated, thank you!
The text was updated successfully, but these errors were encountered:
Hello, I am attempting to integrate a dataset comprised of 22 samples (~1M cells total) and I specifically would like to regress out two covariates (GCResponse and Subject). I have attempted this two ways with separate issues arising from each. I followed the standard Seurat pipeline for QC on the merged object and found that the two covariates in question showed signficant batch effects:
`
Run QC on the Merged Seurat
Combined[["percent.mt"]] <- PercentageFeatureSet(Combined, pattern = "^MT-")
VlnPlot(Combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0)
Combined <- subset(Combined, subset = nFeature_RNA >= 250 & nCount_RNA >= 500 & percent.mt < 15) #Need to adjust these parameters based on violin plot
Combined <- NormalizeData(Combined)
Identify the most variable genes in the merged object
Combined <- FindVariableFeatures(Combined,
selection.method = "vst",
nfeatures = 2000,
verbose = FALSE)
Scale the counts
Combined <- ScaleData(Combined)
Perform PCA
Combined <- RunPCA(Combined, npcs = 50)
Visualize the unintegrated dataset
Combined <- FindNeighbors(Combined, dims = 1:20, reduction = "pca")
Combined <- FindClusters(Combined, resolution = 0.3, cluster.name = "unintegrated_clusters")
Combined <- RunUMAP(Combined, dims = 1:20, reduction = "pca", reduction.name = "umap.unintegrated")
DimPlot(Combined, reduction = "umap.unintegrated", group.by = 'Subject')`
The unintegrated UMAP shows signficant batch effects arising from the 'Subject' covariate (and in turn 'GCResponse'):
I then attempted to use harmony integration via IntegrateLayers:
Integrate the dataset
`library(parallel)
detectCores() #24
num_cores = 30
combined.hy <- IntegrateLayers(
object = Combined, method = HarmonyIntegration,
orig.reduction = "pca", new.reduction = "harmony",
theta = 4, lambda = 0.5,
max.iter.harmony = 20L, max.iter.cluster = 50L,
verbose = TRUE, num.threads = num_cores)
combined.hy <- FindNeighbors(combined.hy, dims = 1:30)
combined.hy <- FindClusters(combined.hy, resolution = 0.3)
combined.hy <- RunUMAP(combined.hy, reduction = "harmony", dims = 1:30, reduction.name = "umap.harmony")
DimPlot(combined.hy, reduction = "umap.harmony", split.by = 'GCResponse', label.size = 2)`
This somewhat mitigated the effects of the 'Subject' covariate, but still resulted in fairly substantial 'GCResponse' effects (especially between the Healthy and High Responder groups, which is unexpected biologically):
I then attempted to instead run harmony integration directly via RunHarmony:
`
Integrate the dataset
combined.Hy <- RunHarmony(Combined,
group.by.vars = c("Subject", "GCResponse"),
reduction = "pca", assay.use = "SCT", reduction.save = "harmony",
dims = 1:30, theta = c(4,4), lambda = 0.5, max_iter = 50)
combined.Hy <- FindNeighbors(combined.Hy, dims = 1:30)
combined.Hy <- FindClusters(combined.Hy, resolution = 0.3)
combined.Hy <- RunUMAP(combined.Hy, reduction = "harmony", dims = 1:30, reduction.name = "umap.harmony")
DimPlot(combined.Hy, reduction = "umap.harmony", split.by = 'GCResponse', label.size = 2) `
After some tuning of the parameters, I achieved a more uniform UMAP with minimal effects from Subject or GCResponse. My issue here is that the clustering now seems to be heavily influenced by GCResponse instead of the UMAP space.
I am at a loss for how to correct for this, as adjusting the dims and resolution parameters in FindNeighbords and FindClusters, respectively, have not helped. Any advice would be highly appreciated, thank you!
The text was updated successfully, but these errors were encountered: