sincei.multimodalClustering module
- sincei.multimodalClustering.multiModal_clustering(mdata, modalities=None, method='PCA', modal_weights=None, column_key=None, nK=30, nPrinComps=20, clusterResolution=1.0, binarize=False, glmPCAfamily='poisson')
Performs multi-graph clustering on the matched keys (barcodes) of a MuData object and stores the cluster labels in mdata.obs["cluster_multi"]. It also stores the UMAP coordinates of each specified modality in mdata[mod].obsm["X_umap"], where mod is the modality name.
Note: If method is "PCA" or "logPCA", the data matrix of the modality is normalized; for "logPCA" it is additionally log1p-transformed.
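The note above can be illustrated with a small, dependency-free sketch. It assumes that "normalized" means scaling each cell to a common total count, which is a common convention but is not confirmed by this page; the function name `normalize_counts` and the `target_sum` parameter are hypothetical and not part of sincei.

```python
import math


def normalize_counts(counts, target_sum=10_000.0, log=False):
    """Scale each cell (row) to `target_sum`, then optionally apply log1p.

    Assumption: per-cell total-count scaling. The normalization that sincei
    applies for "PCA" and "logPCA" may differ from this sketch.
    """
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with no counts are left as all zeros instead of dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        scaled = [value * scale for value in row]
        if log:
            # log1p(x) = log(1 + x), so zero counts stay at zero.
            scaled = [math.log1p(value) for value in scaled]
        normalized.append(scaled)
    return normalized


# Two cells, two regions: rows are cells, columns are features.
raw = [[1, 1], [2, 6]]
pca_input = normalize_counts(raw, target_sum=4.0)               # "PCA"-like input
logpca_input = normalize_counts(raw, target_sum=4.0, log=True)  # "logPCA"-like input
```

In this sketch the only difference between the two inputs is the final log1p step, which compresses large counts before the dimensionality reduction.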
Parameters
- mdata : MuData
MuData object containing several data modalities
- modalities : list[str]
List of modalities to use for clustering, e.g. ["RNA", "ATAC", "ChIC"]
- method : list[str]
What processing method to use for each modality. Choose between "PCA", "logPCA", "glmPCA", "LSA" or "LDA". Default is "PCA" for all modalities.
- modal_weights : list[float]
Weights for each modality in the clustering process. Default is equal weighting. E.g. for RNA and ChIC, use [2, 1].
- column_key : str, optional
Column name for the barcode. If None, the index of .obs for each modality is used.
- nK : int
Number of nearest neighbours to consider for clustering and UMAP. Choose this number with the total number of cells and the expected number of clusters in mind. A smaller number leads to more fragmented clusters.
- nPrinComps : int or list[int]
Number of principal components (for PCA, logPCA or glmPCA) or number of topics (for LSA and LDA) to use for the model. Use a higher number for samples with more expected heterogeneity. If a list is provided, it must contain one value per modality. Default is 20.
- clusterResolution : float
Resolution parameter for clustering. Values lower than 1.0 result in fewer clusters, while higher values split clusters further. In most cases, the optimum value lies between 0.8 and 1.2. Default is 1.0.
- binarize : bool
Whether to binarize the counts per region before dimensionality reduction (only for LSA/LDA).
- glmPCAfamily : str
The choice of exponential family distribution to use for glmPCA method. Default is "poisson".
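Several of the parameters above are given per modality, so their lengths have to agree with the modalities list. The sketch below shows one way to check this before calling the function. The helper `build_clustering_kwargs` and the constant `VALID_METHODS` are hypothetical and not part of sincei; only the keyword names and the allowed method strings are taken from the parameter list above.

```python
VALID_METHODS = {"PCA", "logPCA", "glmPCA", "LSA", "LDA"}


def build_clustering_kwargs(modalities, method="PCA", modal_weights=None,
                            nPrinComps=20, nK=30, clusterResolution=1.0):
    """Expand scalar settings to one value per modality and validate them.

    Hypothetical helper: it only prepares keyword arguments and does not
    call sincei itself.
    """
    n = len(modalities)
    # A single string or integer is repeated for every modality.
    methods = [method] * n if isinstance(method, str) else list(method)
    comps = [nPrinComps] * n if isinstance(nPrinComps, int) else list(nPrinComps)
    # None stands for equal weighting, as described for modal_weights.
    weights = [1.0] * n if modal_weights is None else [float(w) for w in modal_weights]

    for name, values in (("method", methods), ("nPrinComps", comps),
                         ("modal_weights", weights)):
        if len(values) != n:
            raise ValueError(f"{name} needs {n} values, got {len(values)}")

    unknown = [m for m in methods if m not in VALID_METHODS]
    if unknown:
        raise ValueError(f"unknown method(s): {unknown}")

    return {
        "modalities": list(modalities),
        "method": methods,
        "modal_weights": weights,
        "nPrinComps": comps,
        "nK": nK,
        "clusterResolution": clusterResolution,
    }


# RNA weighted twice as heavily as ChIC, with a different method per modality.
kwargs = build_clustering_kwargs(
    ["RNA", "ChIC"], method=["logPCA", "LSA"], modal_weights=[2, 1]
)
```

Assuming an existing MuData object named mdata, the result could then be passed on as sincei.multimodalClustering.multiModal_clustering(mdata, **kwargs).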
- sincei.multimodalClustering.umap_aligned(mdata, modalities=None, column_key=None, nK=30, distance_metric='euclidean')
Aligns the UMAP embeddings of the selected modalities in a MuData object using the AlignedUMAP class of the UMAP package and stores them in mdata[mod].obsm["X_umap_aligned"], where mod is the modality name. One aligned embedding is stored per modality, since the aligned coordinates of each modality may differ slightly.
Parameters
- mdata : MuData
MuData object containing several data modalities
- modalities : list[str]
List of modalities whose UMAP embeddings should be aligned, e.g. ["RNA", "ATAC", "ChIC"]
- column_key : str, optional
Column name for the barcode. If None, the index of .obs for each modality is used.
- nK : int
Number of nearest neighbours to use for UMAP.
- distance_metric : str
Distance metric to use for UMAP, e.g. "euclidean", "cosine", etc.
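AlignedUMAP links consecutive datasets through a list of dictionaries that map row indices in one dataset to row indices in the next. The sketch below shows how such a mapping could be derived from matched barcodes. The helper `build_relations` is hypothetical, and how sincei matches barcodes internally is an assumption here, not something this page states.

```python
def build_relations(barcodes_per_modality):
    """Map row indices between consecutive modalities via shared barcodes.

    Returns one dict per consecutive pair, {row in left: row in right},
    which is the shape AlignedUMAP expects for its `relations` argument.
    Hypothetical helper; sincei may match barcodes differently.
    """
    relations = []
    pairs = zip(barcodes_per_modality, barcodes_per_modality[1:])
    for left, right in pairs:
        # Look up the row position of every barcode in the right-hand modality.
        right_index = {barcode: j for j, barcode in enumerate(right)}
        relations.append({
            i: right_index[barcode]
            for i, barcode in enumerate(left)
            if barcode in right_index  # barcodes missing on one side are skipped
        })
    return relations


# Barcodes as they might appear in the .obs index (or column_key) of each modality.
rna_barcodes = ["cell1", "cell2", "cell3"]
chic_barcodes = ["cell3", "cell1", "cell4"]
relations = build_relations([rna_barcodes, chic_barcodes])
```

Only cells present in both modalities end up in the mapping, which is why the barcodes need to match across modalities before alignment.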