scFindVCRs

scFindVCRs calls variable chromatin regions (VCRs) from binned chromatin data. It takes a .h5ad file containing single-cell genomic signal in bins, and outputs BED files with genome segmentations for different sensitivities.

First, a bin-to-bin correlation matrix is computed for each chromosome.
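
As a rough illustration of this step, the sketch below computes a Pearson correlation between bins from a toy cells-by-bins matrix. The function name `bin_correlation`, the choice of Pearson correlation, and the random toy data are assumptions made for the example, not details taken from the tool.

```python
import numpy as np

def bin_correlation(signal):
    """Pearson correlation between bins (columns) of a cells x bins matrix."""
    signal = np.asarray(signal, dtype=float)
    centered = signal - signal.mean(axis=0)
    norms = np.sqrt((centered ** 2).sum(axis=0))
    norms[norms == 0] = 1.0  # constant bins get zero correlation instead of NaN
    unit = centered / norms
    return unit.T @ unit

# Toy data (hypothetical): 50 cells x 8 bins on one chromosome
rng = np.random.default_rng(0)
toy_signal = rng.poisson(2.0, size=(50, 8))
corr = bin_correlation(toy_signal)  # shape (8, 8), symmetric
```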

Then, the correlation matrix is turned into a score map by convolving a number of square Gaussian kernels along its main diagonal. The sigma of each kernel is derived from the maximum region size to consider. Each kernel yields a 1-D score profile with one value per bin, and the profiles are stacked into a matrix in which each row corresponds to a kernel scale and each column to a bin.
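
A minimal sketch of such a score map is given below. The linear spacing of sigmas up to a quarter of the maximum region size, the truncation of each kernel at two sigmas, and the zero padding at chromosome ends are all assumptions chosen for illustration; the names `gaussian_kernel` and `score_map` are hypothetical.

```python
import numpy as np

def gaussian_kernel(half_width, sigma):
    """Square 2-D Gaussian window of side 2 * half_width + 1, normalised to sum to 1."""
    offsets = np.arange(-half_width, half_width + 1)
    g = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def score_map(corr, max_region_bins, n_kernels):
    """Slide each kernel along the main diagonal of corr; returns (n_kernels, n_bins)."""
    n_bins = corr.shape[0]
    # Assumption: sigmas spaced linearly from 1 bin to max_region_bins / 4
    sigmas = np.linspace(1.0, max(1.0, max_region_bins / 4), n_kernels)
    scores = np.zeros((n_kernels, n_bins))
    for k, sigma in enumerate(sigmas):
        half = int(np.ceil(2 * sigma))  # assumption: truncate at 2 sigma
        kernel = gaussian_kernel(half, sigma)
        padded = np.pad(corr, half, mode="constant")  # zeros beyond the edges
        for i in range(n_bins):
            # Window centred on diagonal element (i, i) of the original matrix
            window = padded[i:i + 2 * half + 1, i:i + 2 * half + 1]
            scores[k, i] = (window * kernel).sum()
    return scores

# Toy correlation matrix: two perfectly correlated blocks of 20 bins each
toy_corr = np.zeros((40, 40))
toy_corr[:20, :20] = 1.0
toy_corr[20:, 20:] = 1.0
scores = score_map(toy_corr, max_region_bins=8, n_kernels=3)
```

In this toy case the score stays at its maximum inside each block and drops near the block boundary, which is the kind of contrast a change-point method can pick up.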

Finally, the PELT change-point detection algorithm is applied to the score map to identify regions with distinct correlation patterns. This step depends on a penalty parameter that controls the number of detected regions.
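
To show how the penalty trades off against the number of segments, here is a small self-contained PELT sketch. It assumes a least-squares (piecewise-constant mean) segment cost, which may differ from the cost used by the tool; `pelt_l2` and the toy signal are hypothetical. The input is expected as bins by features, i.e. the transpose of the score map described above.

```python
import numpy as np

def pelt_l2(signal, penalty):
    """PELT with a least-squares cost. signal: (n_bins, n_features).
    Returns the sorted segment end indices; the last one equals n_bins."""
    signal = np.asarray(signal, dtype=float)
    n = signal.shape[0]
    # Cumulative sums give the cost of any segment in O(n_features)
    csum = np.vstack([np.zeros(signal.shape[1]), np.cumsum(signal, axis=0)])
    csum2 = np.concatenate([[0.0], np.cumsum((signal ** 2).sum(axis=1))])

    def cost(start, end):
        # Sum of squared deviations from the segment mean over signal[start:end]
        seg_sum = csum[end] - csum[start]
        return (csum2[end] - csum2[start]) - (seg_sum @ seg_sum) / (end - start)

    best = np.full(n + 1, np.inf)   # best[t]: optimal penalised cost of signal[:t]
    best[0] = -penalty
    previous = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for end in range(1, n + 1):
        totals = [best[s] + cost(s, end) + penalty for s in candidates]
        j = int(np.argmin(totals))
        best[end] = totals[j]
        previous[end] = candidates[j]
        # Pruning: drop starts that can never be optimal for later end points
        candidates = [s for s, total in zip(candidates, totals)
                      if total - penalty <= best[end]]
        candidates.append(end)

    ends = []
    pos = n
    while pos > 0:  # backtrack through the stored optimal predecessors
        ends.append(pos)
        pos = int(previous[pos])
    return ends[::-1]

# Toy signal: 30 bins, one feature, with a raised block in the middle
toy = np.concatenate([np.zeros(10), np.ones(10), np.zeros(10)])[:, None]
segmentations = {pen: pelt_l2(toy, pen) for pen in (1.0, 1000.0)}
```

With the small penalty the three blocks are recovered, while the large penalty collapses everything into a single segment.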

usage: scFindVCRs -i binned_signal.h5ad -bs 2000 -mr 100000 -nk 20 -pen 0.05 0.1 0.5 -o detected_VCRs.bed

Input/Output options

--input, -i

sincei-generated input file in .h5ad format.

VCR detection options

--binSize, -bs

The size of the bins in the input AnnData object.

--maxRegionSize, -mr

The maximum region size to be considered, in base pairs. Larger regions may increase compute time. Defaults to 100 times the bin size (for example, 200,000 bp for a bin size of 2,000).

--nKernels, -nk

The number of kernels to use for the score map. More kernels generally lead to a better segmentation, but increase the computational cost.

--penalties, -pen

Penalty value(s) for change-point detection. Higher values result in fewer segments. Multiple values can be provided, separated by spaces. Each penalty value produces a separate set of regions, which can be separated in the output BED file by filtering on the "score" column.

--outFile, -o

Name of the output file (BED format) with genome segmentation result. The penalty threshold that defines each segment is saved in the "score" column of the BED file, and the BED file can be filtered based on this column to obtain non-overlapping segments.
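
The filtering described above could look like the following sketch. It assumes the standard BED column order (chrom, start, end, name, score), so that the score is the fifth tab-separated field, and that the stored score equals the penalty value given on the command line; neither is confirmed here. The function `filter_bed_by_score` is hypothetical and the toy rows are invented for illustration.

```python
def filter_bed_by_score(lines, penalty, score_column=4, tol=1e-9):
    """Keep BED records whose score field matches the requested penalty."""
    kept = []
    for line in lines:
        # Skip blank lines and common BED header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if abs(float(fields[score_column]) - penalty) <= tol:
            kept.append(fields)
    return kept

# Invented example rows: two segmentations of the same 10 kb stretch
toy_bed = [
    "chr1\t0\t4000\tregion_1\t0.05\n",
    "chr1\t4000\t10000\tregion_2\t0.05\n",
    "chr1\t0\t10000\tregion_3\t0.5\n",
]
fine_segments = filter_bed_by_score(toy_bed, penalty=0.05)
coarse_segments = filter_bed_by_score(toy_bed, penalty=0.5)
```

To filter a real file, the same function can be applied to an open file handle, for example `filter_bed_by_score(open("detected_VCRs.bed"), 0.1)`.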

--region, -r

Region of the genome to limit the operation to. This is useful for reducing compute time when testing parameters. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000.
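
For readers scripting around the tool, the two accepted forms can be told apart as in the sketch below. This is an illustrative parser written for this page, not the tool's own; the name `parse_region` and the choice to reject any other form are assumptions.

```python
def parse_region(region):
    """Split 'chr' or 'chr:start:end' into (chrom, start, end).
    start and end are None when only a chromosome name is given."""
    parts = region.split(":")
    if len(parts) == 1 and parts[0]:
        return parts[0], None, None
    if len(parts) == 3:
        chrom, start, end = parts[0], int(parts[1]), int(parts[2])
        if not chrom or start < 0 or end <= start:
            raise ValueError(f"invalid region: {region!r}")
        return chrom, start, end
    raise ValueError(f"expected chr or chr:start:end, got {region!r}")

whole_chrom = parse_region("chr10")
sub_region = parse_region("chr10:456700:891000")
```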

--numberOfProcessors, -p

Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: "max")

Other options

--verbose, -v

Set to see processing messages.

-V, --version

Print the program version and exit.