
elisawarner/BIOINF590_Code


BIOINF590_Code

Authors:

  • Stephanie, University of Michigan
  • Elisa Warner, University of Michigan

Final Project for BIOINF 590

The goal is a toolkit for analyzing sparse single-cell data. It is designed for datasets that cover several patients, each with many cells of a given cell type. The toolkit computes several Euclidean and non-Euclidean distance measures that can be used to compare patients based on their single-cell data. For best results, reduce the number of genes being compared so that n > p (the number of cells n exceeds the number of gene expression values p).
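The README does not spell out why n > p matters. One likely reason, given the covariance-based comparisons described for Elisa_Similarity_Score.ipynb, is that a sample covariance matrix built from n cells has rank at most n - 1, so it is singular whenever n ≤ p and its log-determinant is undefined. The sketch below illustrates this with simulated data; the function name `cov_rank` and the matrix sizes are hypothetical and do not come from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def cov_rank(n_cells, n_genes):
    """Rank of the gene-by-gene sample covariance of a simulated cells x genes matrix."""
    cells = rng.normal(size=(n_cells, n_genes))  # simulated data, not real expression values
    cov = np.cov(cells, rowvar=False)            # rows are cells, columns are genes
    return int(np.linalg.matrix_rank(cov))

# Fewer cells than genes: the covariance matrix is rank deficient (singular).
rank_few_cells = cov_rank(n_cells=5, n_genes=20)

# Many more cells than genes: the covariance matrix is full rank for generic data.
rank_many_cells = cov_rank(n_cells=200, n_genes=20)

print(rank_few_cells, rank_many_cells)
```

When the first case applies, reducing the gene set or regularizing the covariance matrix are the usual ways to keep the determinant-based measures well defined.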

  • Stephanie_DE_correlation_gene_selection.R : This R script uses Seurat differential expression (DE) and a 0.9 correlation threshold to narrow down the set of genes. It produces C datasets, one for each of the C cell types. Each dataset contains many patients with many cells. Each row of a dataset represents one cell from one patient and holds its expression values over k genes.
  • jblogdet.ipynb : This notebook provides a function that calculates the Jensen-Bregman Log Determinant Divergence for sparse matrices. Additional regularization of your sparse matrix may be necessary.
  • Elisa_Similarity_Score.ipynb : Analysis of the matrices (cells vs. gene expression). First, we conduct a 2-D PCA, then a 3-D PCA. Finally, we calculate distance measures between patients. The first is based on the ℓ₂-norm: we average each gene's expression values over a patient's cells to get one mean expression vector per patient, then calculate the ℓ₂ distance between each pair of patient vectors to see how far apart they are. Next, we try a covariance method, where we create a k × k covariance matrix for each patient within each cell type. Patients within a cell type are then compared by the distances or similarities between their covariance matrices. We try three measures: 1) Jensen-Bregman Log Determinant Divergence, 2) Affine-Invariant Riemannian Metric, 3) Log-Euclidean Riemannian Metric.
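
The patient comparisons described above can be sketched in a few lines of NumPy. This is a minimal illustration using the standard textbook definitions of each measure, not the code from the notebooks; all function names, the `eps` regularization value, and the simulated patients are assumptions made for the example.

```python
import numpy as np

def mean_expression_distances(patients):
    """Pairwise L2 distances between per-patient mean expression vectors."""
    means = np.stack([cells.mean(axis=0) for cells in patients])
    diffs = means[:, None, :] - means[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

def regularized_cov(cells, eps=1e-6):
    """k x k covariance of a cells x genes matrix, plus eps * I (eps is an assumed value)."""
    cov = np.cov(cells, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])

def _apply_to_eigenvalues(mat, fn):
    """Apply fn to the eigenvalues of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T

def jbld(a, b):
    """Jensen-Bregman LogDet divergence: log det((A+B)/2) - 0.5 * log det(AB)."""
    _, logdet_mid = np.linalg.slogdet((a + b) / 2.0)
    _, logdet_a = np.linalg.slogdet(a)
    _, logdet_b = np.linalg.slogdet(b)
    return logdet_mid - 0.5 * (logdet_a + logdet_b)

def airm(a, b):
    """Affine-invariant Riemannian metric: ||log(A^-1/2 B A^-1/2)||_F."""
    a_inv_sqrt = _apply_to_eigenvalues(a, lambda v: 1.0 / np.sqrt(v))
    middle = a_inv_sqrt @ b @ a_inv_sqrt
    middle = (middle + middle.T) / 2.0  # remove round-off asymmetry
    vals = np.linalg.eigvalsh(middle)
    return float(np.sqrt(np.sum(np.log(vals) ** 2)))

def lerm(a, b):
    """Log-Euclidean Riemannian metric: ||log(A) - log(B)||_F."""
    log_a = _apply_to_eigenvalues(a, np.log)
    log_b = _apply_to_eigenvalues(b, np.log)
    return float(np.linalg.norm(log_a - log_b, ord="fro"))

# Simulated stand-in for one cell type: 3 patients, 50 cells each, k = 4 genes.
rng = np.random.default_rng(0)
patients = [rng.normal(loc=shift, size=(50, 4)) for shift in (0.0, 0.5, 2.0)]

l2_matrix = mean_expression_distances(patients)
covs = [regularized_cov(cells) for cells in patients]
jbld_matrix = np.array([[jbld(a, b) for b in covs] for a in covs])

print(np.round(l2_matrix, 3))
print(np.round(jbld_matrix, 3))
```

Replacing `jbld` with `airm` or `lerm` in the last comprehension gives the other two patient-by-patient matrices. All three covariance-based measures require positive-definite inputs, which is why the sketch adds a small multiple of the identity before comparing.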