sincei.VCRfinder module

sincei.VCRfinder.sparse_band_corr(X, k, chrom=None, verbose=True)

Compute only the first k diagonals of the correlation matrix of X, stored in banded format. Works directly on sparse matrices.

Parameters

X (scipy.sparse matrix or np.ndarray): Input data matrix of shape (n_samples, n_features).
k (int): Number of diagonals to compute (bandwidth).
chrom (str or None, optional): Chromosome name for progress bar description.
verbose (bool, optional): Whether to display progress bars.

Returns

band_corr (np.ndarray, shape (2*k+1, n_features)): Banded correlation matrix.
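To make the banded output concrete, here is a minimal dense sketch of a band-limited correlation. It is not the sincei implementation: the helper name `band_corr_dense`, the row layout (row `k + d` holds the correlation between feature `j` and feature `j + d` at column `j`), and the NaN padding are all assumptions made for illustration.

```python
import numpy as np

def band_corr_dense(X, k):
    """Correlations between features at most k columns apart (dense sketch).

    Assumed layout, not confirmed by the sincei docs: row k + d stores
    corr(feature j, feature j + d) at column j, NaN where j + d is out of range.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Center and scale each column so correlation becomes a plain dot product.
    Z = X - X.mean(axis=0)
    norms = np.sqrt((Z ** 2).sum(axis=0))
    norms[norms == 0] = np.nan  # constant columns have undefined correlation
    Z = Z / norms
    band = np.full((2 * k + 1, n_features), np.nan)
    for d in range(-k, k + 1):
        lo, hi = max(0, -d), min(n_features, n_features - d)
        if lo >= hi:
            continue  # offset larger than the matrix
        band[k + d, lo:hi] = (Z[:, lo:hi] * Z[:, lo + d:hi + d]).sum(axis=0)
    return band
```

Only `(2*k+1) * n_features` values are stored instead of `n_features ** 2`, which is the point of a banded format when only nearby bins are compared.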

sincei.VCRfinder.distance_kernel(sigma, truncate=4.0, radius=None)

Create a square Gaussian distance kernel.

Parameters

sigma (float): Standard deviation of the Gaussian.
truncate (float, optional): Truncate the kernel at this many standard deviations. Default is 4.0.
radius (int, optional): Radius of the kernel. If None, it is set to int(truncate * sigma).

Returns

kernel (2D numpy array): The Gaussian distance kernel.
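One plausible reading of a square Gaussian kernel is sketched below. The radius rule `int(truncate * sigma)` comes from the parameter description above; the outer-product construction, the normalisation to unit sum, and the helper name `gaussian_square_kernel` are assumptions and may differ from what sincei actually does.

```python
import numpy as np

def gaussian_square_kernel(sigma, truncate=4.0, radius=None):
    """Square 2-D Gaussian kernel of side 2 * radius + 1 (illustrative sketch)."""
    if radius is None:
        radius = int(truncate * sigma)  # rule stated in the parameter docs
    offsets = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel = np.outer(g, g)       # assumption: separable 2-D Gaussian
    return kernel / kernel.sum()  # assumption: weights normalised to sum to 1
```

With `sigma=1.0` and the default `truncate`, the radius is 4 and the kernel is 9 by 9, peaking at the centre.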

sincei.VCRfinder.VCRfinder(adata, binsize, max_region, n_kernels=20, penalties=[1], region=None, verbose=False, n_threads=1)

Detects variable chromatin regions (VCRs) from an AnnData object containing genomic signal data in equally sized bins (see scCountReads).

First, a bin-to-bin correlation matrix is computed for each chromosome.

Then, the correlation matrix is turned into a score map by convolving a number of square Gaussian kernels along its main diagonal. Each kernel has its own sigma, and the largest kernel spans max_region base pairs. Each kernel yields a 1-D score track with one value per bin; the tracks are stacked into a matrix in which each row corresponds to a kernel scale and each column to a bin.
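The score-map step can be sketched on a dense correlation matrix as follows. This is an illustration under assumptions, not the library code: the helper names, the NaN handling at the chromosome edges, and the example sigmas are hypothetical, and sincei operates on a banded matrix rather than a dense one.

```python
import numpy as np

def square_gaussian(sigma, truncate=4.0):
    """Normalised square Gaussian kernel (assumed form)."""
    radius = int(truncate * sigma)
    offsets = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def diagonal_scores(corr, kernel):
    """Slide the kernel along the main diagonal; one weighted sum per bin."""
    n = corr.shape[0]
    r = kernel.shape[0] // 2
    scores = np.full(n, np.nan)  # bins too close to an edge stay NaN
    for i in range(r, n - r):
        window = corr[i - r:i + r + 1, i - r:i + r + 1]
        scores[i] = (window * kernel).sum()
    return scores

def score_map(corr, sigmas):
    """Stack one score track per kernel: rows are scales, columns are bins."""
    return np.vstack([diagonal_scores(corr, square_gaussian(s)) for s in sigmas])
```

Calling `score_map(corr, [0.5, 1.0])` on a 10 by 10 matrix gives an array of shape `(2, 10)`; wider kernels leave more undefined bins at each end.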

Finally, the PELT change-point detection algorithm is applied to the score map to identify regions with distinct correlation patterns. This step depends on a penalty parameter that controls the number of detected regions.
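To show how the penalty trades off fit against the number of segments, here is a compact PELT sketch with a least-squares (L2) segment cost. The cost function and the helper name `pelt_l2` are assumptions; the source does not say which cost model or which library sincei uses. The input is expected with one row per bin, so a score map laid out as kernels by bins would be transposed first.

```python
import numpy as np

def pelt_l2(signal, penalty):
    """PELT with an L2 cost; returns segment end indices (exclusive), last is n."""
    y = np.asarray(signal, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    n = y.shape[0]
    # Prefix sums let us evaluate any segment cost in constant time.
    s1 = np.vstack([np.zeros((1, y.shape[1])), np.cumsum(y, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((y ** 2).sum(axis=1))])

    def cost(s, t):
        # Sum of squared deviations from the segment mean over y[s:t].
        seg_sum = s1[t] - s1[s]
        return (s2[t] - s2[s]) - (seg_sum @ seg_sum) / (t - s)

    best = np.full(n + 1, np.inf)
    best[0] = -penalty
    last_cp = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for t in range(1, n + 1):
        totals = [best[s] + cost(s, t) + penalty for s in candidates]
        j = int(np.argmin(totals))
        best[t] = totals[j]
        last_cp[t] = candidates[j]
        # Pruning: drop start points that can no longer be optimal.
        candidates = [s for s, tot in zip(candidates, totals)
                      if tot - penalty <= best[t]]
        candidates.append(t)
    # Backtrack from the end to recover the segment boundaries.
    ends, t = [], n
    while t > 0:
        ends.append(t)
        t = int(last_cp[t])
    return ends[::-1]
```

A larger penalty makes each extra change point more expensive, so fewer and longer regions are returned; this is why the function accepts a list of penalties.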

The function returns a pandas DataFrame containing the detected variable chromatin regions at each penalty. The DataFrame has columns: 'penalty', 'chrom', 'start', 'end'.
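As a rough illustration of how bin-level segment boundaries map onto the output columns, the sketch below turns segment end indices into rows with 'penalty', 'chrom', 'start', and 'end'. It assumes that bins start at coordinate 0, that coordinates are half-open, and that every segment becomes a row; the helper `segments_to_rows` is hypothetical and sincei may filter or label segments differently.

```python
def segments_to_rows(chrom, segment_ends, binsize, penalty):
    """Convert exclusive segment end indices (in bins) to genomic rows."""
    rows = []
    start_bin = 0
    for end_bin in segment_ends:
        rows.append({
            "penalty": penalty,
            "chrom": chrom,
            "start": start_bin * binsize,  # assumes the first bin starts at 0
            "end": end_bin * binsize,      # half-open end coordinate
        })
        start_bin = end_bin
    return rows
```

A list of such rows can be passed to `pd.DataFrame(rows)` to obtain a table with the same four columns.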

Parameters

adata (anndata.AnnData): Input AnnData object with binned chromatin data. adata.var must contain 'chrom', 'start', and 'end' columns.
binsize (int): Size of the bins in base pairs.
max_region (int): Size of the largest kernel in base pairs.
n_kernels (int, optional): Number of Gaussian kernels to use for convolution. Default is 20.
penalties (list of float, optional): List of penalty values for the change-point detection algorithm. Default is [1].
region (str, optional): Genomic region to limit the analysis to (e.g., 'chr1:100000:200000'). Default is None.
verbose (bool, optional): Print progress messages and warnings. Default is False.
n_threads (int, optional): Number of threads to use for parallel processing. Default is 1.
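The region string in the example above uses colons as separators. A minimal sketch of parsing that format and selecting overlapping bins is shown here; both helpers are hypothetical, and the overlap rule (any bin that intersects the region is kept) is an assumption rather than documented sincei behaviour.

```python
def parse_region(region):
    """Split 'chrom:start:end' into (chrom, start, end)."""
    # rsplit keeps any colons that are part of the chromosome name.
    chrom, start, end = region.rsplit(":", 2)
    return chrom, int(start), int(end)

def bins_in_region(bins, region):
    """Keep (chrom, start, end) bins that overlap the region (assumed rule)."""
    chrom, start, end = parse_region(region)
    return [b for b in bins if b[0] == chrom and b[1] < end and b[2] > start]
```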

Returns

output (pd.DataFrame): Output DataFrame with detected variable chromatin regions at each penalty.
