sincei.VCRfinder module
- sincei.VCRfinder.sparse_band_corr(X, k, chrom=None, verbose=True)
Compute only the first k diagonals of the correlation matrix of X, stored in banded format. Works directly on sparse matrices.
Parameters
- X : scipy.sparse matrix or np.ndarray
Input data matrix of shape (n_samples, n_features).
- k : int
Number of diagonals to compute (bandwidth).
- chrom : str or None, optional
Chromosome name for progress bar description.
- verbose : bool, optional
Whether to display progress bars.
Returns
- band_corr : np.ndarray, shape (2*k+1, n_features)
Banded correlation matrix.
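As a rough illustration of what a banded correlation matrix contains, the sketch below computes the same kind of band with dense NumPy operations. It is not the sincei implementation: the function name `band_corr_dense` is hypothetical, the input is densified instead of being handled as a sparse matrix, and the row layout of the band (row `k + d` holds the correlation between feature `j` and feature `j + d`) is an assumption made for this example.

```python
import numpy as np


def band_corr_dense(X, k):
    """Correlation of each feature with its neighbours up to k columns away.

    Assumed layout (for this sketch only): row k + d, column j holds
    corr(feature j, feature j + d) for offsets d in [-k, k]. Entries that
    would fall outside the matrix are left as NaN.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape

    # Centre each column and scale it to unit length, so that a plain
    # dot product between two columns equals their Pearson correlation.
    Z = X - X.mean(axis=0)
    norm = np.sqrt((Z ** 2).sum(axis=0))
    norm[norm == 0] = np.nan  # constant columns have undefined correlation
    Z = Z / norm

    band = np.full((2 * k + 1, n_features), np.nan)
    band[k] = (Z * Z).sum(axis=0)  # main diagonal (1 for non-constant columns)
    for d in range(1, min(k, n_features - 1) + 1):
        r = (Z[:, :-d] * Z[:, d:]).sum(axis=0)  # corr(j, j + d), j = 0..n-d-1
        band[k + d, : n_features - d] = r       # upper diagonal d
        band[k - d, d:] = r                     # lower diagonal d (symmetric)
    return band
```

Only `k` column-wise products are needed here, so the cost grows with `k * n_features` instead of `n_features ** 2`, which is the point of restricting the computation to a band.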
- sincei.VCRfinder.distance_kernel(sigma, truncate=4.0, radius=None)
Create a square Gaussian distance kernel.
Parameters
- sigma : float
Standard deviation of the Gaussian.
- truncate : float, optional
Truncate the kernel at this many standard deviations. Default is 4.0.
- radius : int, optional
Radius of the kernel. If None, it is set to int(truncate * sigma).
Returns
- kernel : 2D numpy array
The Gaussian distance kernel.
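A minimal sketch of a square Gaussian kernel with the same parameters is given below. The function name `gaussian_kernel_2d` is hypothetical, and both the isotropic form (an outer product of two 1-D Gaussians) and the normalisation to unit sum are assumptions of this example; the exact weighting used by `distance_kernel` may differ.

```python
import numpy as np


def gaussian_kernel_2d(sigma, truncate=4.0, radius=None):
    """Square Gaussian kernel of shape (2 * radius + 1, 2 * radius + 1)."""
    if radius is None:
        # Same default rule as documented for distance_kernel.
        radius = int(truncate * sigma)

    # 1-D Gaussian evaluated at integer offsets from the kernel centre.
    offsets = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (offsets / sigma) ** 2)

    # Outer product gives an isotropic 2-D Gaussian (assumption of this sketch).
    kernel = np.outer(g, g)
    return kernel / kernel.sum()  # normalise so the weights sum to 1
```

With `sigma=2` and the default `truncate=4.0`, the radius is `int(4.0 * 2) = 8`, so the kernel is 17 by 17.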
- sincei.VCRfinder.VCRfinder(adata, binsize, max_region, n_kernels=20, penalties=[1], region=None, verbose=False, n_threads=1)
Detects variable chromatin regions (VCRs) from an AnnData object containing genomic signal data in equally sized bins (see scCountReads).
First, a bin-to-bin correlation matrix is computed for each chromosome.
Then, the correlation matrix is turned into a score map by convolving a number of square Gaussian kernels along its main diagonal. Each kernel has its own sigma, with the largest kernel sized by max_region. Each kernel produces a 1-D score track with one value per bin, and the tracks are stacked into a matrix in which each row corresponds to a kernel scale and each column to a bin.
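The score-map step can be pictured with the dense sketch below. It is illustrative only: `diagonal_score_map` is a hypothetical name, the sigmas are supplied by the caller in units of bins (the rule VCRfinder uses to derive them is not reproduced here), the kernels are normalised to unit sum, and bins near the chromosome ends are handled by zero padding. All of these are assumptions of the example.

```python
import numpy as np


def diagonal_score_map(corr, sigmas, truncate=4.0):
    """Slide square Gaussian kernels along the main diagonal of `corr`.

    Returns an array of shape (len(sigmas), n_bins): one row per kernel
    scale, one column per bin.
    """
    corr = np.asarray(corr, dtype=float)
    n_bins = corr.shape[0]
    scores = np.zeros((len(sigmas), n_bins))

    for row, sigma in enumerate(sigmas):
        radius = int(truncate * sigma)
        offsets = np.arange(-radius, radius + 1)
        g = np.exp(-0.5 * (offsets / sigma) ** 2)
        kernel = np.outer(g, g)
        kernel /= kernel.sum()

        # Zero-pad so that windows centred near the edges stay in bounds.
        padded = np.pad(corr, radius)
        size = 2 * radius + 1
        for i in range(n_bins):
            # Window centred on the diagonal element (i, i) of `corr`.
            window = padded[i : i + size, i : i + size]
            scores[row, i] = (window * kernel).sum()
    return scores
```

On a block-diagonal correlation matrix, the score is high inside a block and drops at block boundaries, which is the pattern a change-point method can then segment.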
Finally, the PELT change-point detection algorithm is applied to the score map to identify regions with distinct correlation patterns. This step depends on a penalty parameter that controls the number of detected regions.
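To show how the penalty controls the number of regions, here is a compact, pure-NumPy sketch of PELT. It assumes a least-squares (L2) segment cost, which may not be the cost model VCRfinder uses, and `pelt_l2` is a hypothetical name. The signal is expected with one row per bin, so a score map with kernels in rows would be passed transposed.

```python
import numpy as np


def pelt_l2(signal, penalty):
    """Return the end index of every segment found by PELT with an L2 cost."""
    x = np.asarray(signal, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]

    # Cumulative sums let the cost of any segment be evaluated in O(1).
    csum = np.vstack([np.zeros(x.shape[1]), np.cumsum(x, axis=0)])
    csum2 = np.concatenate([[0.0], np.cumsum((x ** 2).sum(axis=1))])

    def cost(s, t):
        # Sum of squared deviations from the segment mean over rows s..t-1.
        seg_sum = csum[t] - csum[s]
        return csum2[t] - csum2[s] - (seg_sum @ seg_sum) / (t - s)

    best = np.full(n + 1, np.inf)   # best[t]: optimal penalised cost of x[:t]
    best[0] = -penalty
    last = np.zeros(n + 1, dtype=int)  # last[t]: start of the final segment
    candidates = [0]

    for t in range(1, n + 1):
        totals = [best[s] + cost(s, t) + penalty for s in candidates]
        j = int(np.argmin(totals))
        best[t] = totals[j]
        last[t] = candidates[j]
        # Pruning: drop starts that can never be optimal for later t.
        candidates = [s for s, v in zip(candidates, totals)
                      if v - penalty <= best[t]]
        candidates.append(t)

    # Backtrack from the end of the signal to recover segment ends.
    ends = []
    t = n
    while t > 0:
        ends.append(t)
        t = last[t]
    return ends[::-1]
```

Each additional segment adds `penalty` to the objective, so a larger penalty yields fewer, longer regions, and a smaller one yields more, shorter regions.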
The function returns a pandas DataFrame containing the detected variable chromatin regions at each penalty. The DataFrame has columns: 'penalty', 'chrom', 'start', 'end'.
Parameters
- adata : anndata.AnnData
Input anndata object with binned chromatin data. adata.var must contain 'chrom', 'start', and 'end' columns.
- binsize : int
Size of the bins in base pairs.
- max_region : int
Size of the largest kernel in base pairs.
- n_kernels : int, optional
Number of Gaussian kernels to use for convolution. Default is 20.
- penalties : list of float, optional
List of penalty values for the change-point detection algorithm. Default is [1].
- region : str, optional
Genomic region to limit the analysis to (e.g., 'chr1:100000:200000'). Default is None.
- verbose : bool, optional
Print progress messages and warnings. Default is False.
- n_threads : int, optional
Number of threads to use for parallel processing. Default is 1.
Returns
- output : pd.DataFrame
Output DataFrame with detected variable chromatin regions at each penalty.
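A call might look like the sketch below. The wrapper `find_vcrs` and the penalty values are hypothetical and chosen only for illustration; the keyword arguments follow the signature documented above, and `adata` is assumed to be an existing AnnData object whose `var` table has 'chrom', 'start', and 'end' columns and whose bins are `binsize` base pairs wide.

```python
def find_vcrs(adata, binsize, max_region, penalties=(1, 5, 10)):
    """Hypothetical convenience wrapper around sincei.VCRfinder.VCRfinder."""
    # Imported lazily so this sketch can be defined without sincei installed.
    from sincei.VCRfinder import VCRfinder

    return VCRfinder(
        adata,
        binsize=binsize,            # must match the bin width used for counting
        max_region=max_region,      # size of the largest kernel, in base pairs
        n_kernels=20,
        penalties=list(penalties),  # illustrative values, not recommendations
        region=None,                # analyse all bins in adata
        verbose=True,
        n_threads=1,
    )
```

Because the returned DataFrame carries a 'penalty' column, the regions for a single penalty can be selected with ordinary pandas filtering, for example `vcrs[vcrs["penalty"] == 5]`, and compared across penalties before choosing one.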