Clustering by variable and not by UMAP after RunHarmony #266

Open
RRingquist opened this issue Jan 27, 2025 · 1 comment

Comments

RRingquist commented Jan 27, 2025

Hello, I am attempting to integrate a dataset of 22 samples (~1M cells total), and I specifically would like to regress out two covariates (GCResponse and Subject). I have attempted this two ways, with separate issues arising from each. I followed the standard Seurat pipeline for QC on the merged object and found that the two covariates in question showed significant batch effects:

```r
# Run QC on the Merged Seurat
Combined[["percent.mt"]] <- PercentageFeatureSet(Combined, pattern = "^MT-")
VlnPlot(Combined, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0)
Combined <- subset(Combined, subset = nFeature_RNA >= 250 & nCount_RNA >= 500 & percent.mt < 15) # Need to adjust these parameters based on violin plot
Combined <- NormalizeData(Combined)

# Identify the most variable genes in the merged object
Combined <- FindVariableFeatures(Combined,
                                 selection.method = "vst",
                                 nfeatures = 2000,
                                 verbose = FALSE)

# Scale the counts
Combined <- ScaleData(Combined)

# Perform PCA
Combined <- RunPCA(Combined, npcs = 50)

# Visualize the unintegrated dataset
Combined <- FindNeighbors(Combined, dims = 1:20, reduction = "pca")
Combined <- FindClusters(Combined, resolution = 0.3, cluster.name = "unintegrated_clusters")
Combined <- RunUMAP(Combined, dims = 1:20, reduction = "pca", reduction.name = "umap.unintegrated")
DimPlot(Combined, reduction = "umap.unintegrated", group.by = 'Subject')
```

The unintegrated UMAP shows significant batch effects arising from the 'Subject' covariate (and in turn 'GCResponse'):

Image

Image

I then attempted to use harmony integration via IntegrateLayers:

```r
# Integrate the dataset
library(parallel)
detectCores() #24
num_cores = 30
combined.hy <- IntegrateLayers(
  object = Combined, method = HarmonyIntegration,
  orig.reduction = "pca", new.reduction = "harmony",
  theta = 4, lambda = 0.5,
  max.iter.harmony = 20L, max.iter.cluster = 50L,
  verbose = TRUE, num.threads = num_cores)

combined.hy <- FindNeighbors(combined.hy, dims = 1:30)
combined.hy <- FindClusters(combined.hy, resolution = 0.3)
combined.hy <- RunUMAP(combined.hy, reduction = "harmony", dims = 1:30, reduction.name = "umap.harmony")
DimPlot(combined.hy, reduction = "umap.harmony", split.by = 'GCResponse', label.size = 2)
```

This somewhat mitigated the effects of the 'Subject' covariate, but still resulted in fairly substantial 'GCResponse' effects (especially between the Healthy and High Responder groups, which is unexpected biologically):

Image

Image

Image

I then attempted to instead run harmony integration directly via RunHarmony:

```r
# Integrate the dataset
combined.Hy <- RunHarmony(Combined,
                          group.by.vars = c("Subject", "GCResponse"),
                          reduction = "pca", assay.use = "SCT", reduction.save = "harmony",
                          dims = 1:30, theta = c(4,4), lambda = 0.5, max_iter = 50)

combined.Hy <- FindNeighbors(combined.Hy, dims = 1:30)
combined.Hy <- FindClusters(combined.Hy, resolution = 0.3)
combined.Hy <- RunUMAP(combined.Hy, reduction = "harmony", dims = 1:30, reduction.name = "umap.harmony")
DimPlot(combined.Hy, reduction = "umap.harmony", split.by = 'GCResponse', label.size = 2)
```

After some tuning of the parameters, I achieved a more uniform UMAP with minimal effects from Subject or GCResponse. My issue here is that the clustering now seems to be heavily influenced by GCResponse instead of the UMAP space.

Image

Image

I am at a loss for how to correct this, as adjusting the dims and resolution parameters in FindNeighbors and FindClusters, respectively, has not helped. Any advice would be highly appreciated, thank you!

Collaborator

pati-ni commented Jan 27, 2025

Could you specify the reduction layer for the neighbors?

combined.Hy <- FindNeighbors(combined.Hy, dims = 1:30)
combined.Hy <- FindClusters(combined.Hy, resolution = 0.3)

Also, IntegrateLayers is a leaky abstraction that does not deal properly with several covariates. Use RunHarmony() instead.
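For illustration, here is a minimal sketch of what specifying the reduction could look like. It assumes the Harmony embedding was saved under the name "harmony" (as with `reduction.save = "harmony"` in the post above). Seurat's `FindNeighbors()` uses `reduction = "pca"` unless told otherwise, so in the code as posted the clusters are likely computed on the uncorrected PCA while the UMAP is drawn from the Harmony embedding. The helper name `cluster_on_harmony` is hypothetical and is not part of Seurat or harmony.

```r
# Sketch only: build the neighbor graph, clusters, and UMAP from the same
# (Harmony-corrected) embedding. `cluster_on_harmony` is a hypothetical helper.
cluster_on_harmony <- function(obj, reduction = "harmony", dims = 1:30, resolution = 0.3) {
  # FindNeighbors() defaults to reduction = "pca"; name the corrected embedding explicitly.
  obj <- Seurat::FindNeighbors(obj, reduction = reduction, dims = dims)
  # Clusters are derived from the graph built above, so they now follow the Harmony space.
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  # Draw the UMAP from the same reduction so the plot and the clusters agree.
  obj <- Seurat::RunUMAP(obj, reduction = reduction, dims = dims,
                         reduction.name = "umap.harmony")
  obj
}
```

With Seurat installed, this would be called as `combined.Hy <- cluster_on_harmony(combined.Hy)` after `RunHarmony()`. The `dims` and `resolution` defaults simply mirror the values used in the post.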
