scJSD#

scJSD samples regions of the genome from BAM files and, for each cell, compares the cumulative read coverage over those regions with that of a synthetic cell whose reads are Poisson distributed, using the Jensen-Shannon distance (JSD). Cells with strongly enriched signal show a higher JSD than cells whose signal is homogeneously distributed.
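To make the comparison concrete, here is a simplified sketch of a Jensen-Shannon distance between an observed per-bin coverage distribution and a Poisson distribution with the same mean. It is an illustration only, not sincei's implementation: the function names (`poisson_pmf`, `js_distance`, `coverage_jsd`), the use of a coverage histogram, and the truncation of the Poisson support at the maximum observed count are all assumptions made for this example.

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    """Probability of observing k reads under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))

def coverage_jsd(counts):
    """Compare the histogram of per-bin counts with a Poisson of equal mean."""
    lam = sum(counts) / len(counts)
    kmax = max(counts)
    hist = Counter(counts)
    observed = [hist[k] / len(counts) for k in range(kmax + 1)]
    expected = [poisson_pmf(k, lam) for k in range(kmax + 1)]
    total = sum(expected)  # renormalise the truncated Poisson (assumption)
    expected = [e / total for e in expected]
    return js_distance(observed, expected)

# Hypothetical per-bin read counts for two cells with the same mean coverage.
enriched_cell = [0] * 90 + [50] * 10
homogeneous_cell = [4, 5, 5, 6, 5, 4, 6, 5, 5, 5]
jsd_enriched = coverage_jsd(enriched_cell)
jsd_homogeneous = coverage_jsd(homogeneous_cell)
```

In this toy input, the cell whose reads pile up in a few bins ends up further from the Poisson reference than the cell with evenly spread reads.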

usage: scJSD -b treatment.bam control.bam -bc barcodes.txt -o scJSD_output.tsv

Input/Output options#

--bamfiles, -b

List of indexed BAM files separated by spaces.

--barcodes, -bc

A single-column file containing the cell barcode whitelist, one barcode per line.

--outFile, -o

The file to write results to. For scFilterStats, scFilterBarcodes and scJSD, the output file is a .tsv file. For other tools, the output file is an updated .h5ad file with the result of the requested operation.

BAM processing options#

--cellTag, -ct

Name of the BAM tag from which to extract barcodes. (Default: 'BC')

--groupTag, -gt

For a grouped BAM file, such as one containing Read Group (RG) or Sample (SM) tags, reads can be grouped using the tag provided via --groupTag. NOTE: For such input, please ensure that the --labels argument lists the group labels expected in the BAM files. The --groupTag, together with the --cellTag, is then used to identify unique samples (cells) in the input.
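The idea that a group tag plus a cell tag identifies a unique cell can be sketched as follows. This is a hypothetical illustration working on plain dictionaries of read tags rather than on a BAM file; the helper `count_reads_per_cell` and the choice to skip reads that lack either tag are assumptions, not documented sincei behaviour.

```python
from collections import Counter

def count_reads_per_cell(reads, group_tag="RG", cell_tag="BC"):
    """Count reads per (group, barcode) pair; reads missing a tag are skipped."""
    counts = Counter()
    for tags in reads:
        group = tags.get(group_tag)
        barcode = tags.get(cell_tag)
        if group is None or barcode is None:
            continue  # assumption: untagged reads cannot be assigned to a cell
        counts[(group, barcode)] += 1
    return counts

# Hypothetical tag dictionaries, one per read.
reads = [
    {"RG": "sample1", "BC": "AAAC"},
    {"RG": "sample1", "BC": "AAAC"},
    {"RG": "sample2", "BC": "AAAC"},  # same barcode, different group
    {"RG": "sample2", "BC": "GGTT"},
    {"BC": "GGTT"},                   # no group tag
]
cell_counts = count_reads_per_cell(reads)
```

The same barcode appearing in two groups is treated as two distinct cells, which is the point of combining both tags.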

--numberOfProcessors, -p

Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: "max")
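As a rough illustration of how the accepted values could map to a processor count, consider the sketch below. The helper `resolve_processors` is hypothetical, and capping a numeric request at the number of available processors is an assumption made for this example.

```python
import multiprocessing

def resolve_processors(value, available=None):
    """Translate "max", "max/2" or a number into a processor count."""
    if available is None:
        available = multiprocessing.cpu_count()
    if value == "max":
        return available
    if value == "max/2":
        return max(1, available // 2)
    # Assumption: numeric requests are clamped to the range [1, available].
    return max(1, min(int(value), available))
```

For example, on a machine with 8 processors, `resolve_processors("max/2", 8)` gives 4.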

--labels, -l

User defined labels instead of default labels from file names. Multiple labels have to be separated by a space, e.g. --labels sample1 sample2 sample3.

--smartLabels

Instead of manually specifying labels for the input BAM files, this causes sincei to use the file name after removing the path and extension.
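The label derivation described here amounts to dropping the directory and the extension. A minimal sketch, assuming only the final extension is removed (the helper name `smart_label` is hypothetical):

```python
import os

def smart_label(path):
    """Return the file name without its directory and final extension."""
    return os.path.splitext(os.path.basename(path))[0]

labels = [smart_label(p) for p in ["/data/run1/treatment.bam", "control.bam"]]
```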

--blacklist, --blackListFileName, -bl

A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
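Rejecting whole genomic chunks that overlap a blacklist entry can be sketched as below. This is a simplified, hypothetical illustration using half-open intervals, not sincei's actual code; the function names and the example coordinates are assumptions.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals [start, end) share at least one base."""
    return start_a < end_b and start_b < end_a

def drop_blacklisted_chunks(chunks, blacklist):
    """Keep only chunks that do not overlap any blacklisted region."""
    kept = []
    for chrom, start, end in chunks:
        hit = any(
            chrom == b_chrom and overlaps(start, end, b_start, b_end)
            for b_chrom, b_start, b_end in blacklist
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept

chunks = [("chr1", 0, 10000), ("chr1", 10000, 20000), ("chr1", 20000, 30000)]
blacklist = [("chr1", 9500, 10200)]  # straddles the first two chunks
kept_chunks = drop_blacklisted_chunks(chunks, blacklist)
```

Because the decision is made per chunk rather than per read, a read that starts in a kept chunk and extends into a blacklisted region is not removed by this kind of filter.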

--chrToSkip

List of chromosome names to skip from the analysis. Regions on these chromosomes will be excluded. Useful for skipping mitochondrial, X chromosome, or unplaced contigs. Multiple chromosomes can be specified, e.g. --chrToSkip chrM chrX.

--binSize, -bs

Size of the bins, in bases, to calculate coverage. (Default: 10000)
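For intuition, fixed-size binning of a chromosome might look like the sketch below. The helper `make_bins` is hypothetical, and truncating the last bin at the chromosome end is an assumption.

```python
def make_bins(chrom_length, bin_size=10000):
    """Split [0, chrom_length) into consecutive half-open bins of bin_size bases."""
    return [
        (start, min(start + bin_size, chrom_length))
        for start in range(0, chrom_length, bin_size)
    ]

bins = make_bins(25000)
```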

Read Processing Options#

--minMappingQuality

If set, only reads that have a mapping quality score of at least this are considered.

--samFlagInclude

Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will also be considered for the coverage. This argument can be used more than once in a command. (Default: None)

--samFlagExclude

Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand. This argument can be used more than once in a command. (Default: None)
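SAM flag filtering is a bitwise test. The sketch below follows the usual convention (as with `samtools view -f` and `-F`): all bits of the include flag must be set, and no bit of the exclude flag may be set. The helper `passes_flag_filters` is hypothetical, and whether sincei applies exactly this logic is an assumption.

```python
def passes_flag_filters(flag, include=None, exclude=None):
    """Apply include/exclude bit masks to a read's SAM flag."""
    if include is not None and (flag & include) != include:
        return False  # at least one required bit is missing
    if exclude is not None and (flag & exclude) != 0:
        return False  # at least one forbidden bit is set
    return True

first_mate_forward = 99    # 1 + 2 + 32 + 64: paired, proper pair, mate reverse, first mate
second_mate_reverse = 147  # 1 + 2 + 16 + 128: paired, proper pair, reverse, second mate
```

With these flags, `include=64` keeps only the first mate, and `exclude=16` keeps only the read on the forward strand.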

--minFragmentLength

The minimum fragment length needed for read/pair inclusion. This option is useful in ATAC-seq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)

--maxFragmentLength

The maximum fragment length accepted for read/pair inclusion. (Default: 0)
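The two fragment length bounds can be combined into a single check, sketched below. The helper `fragment_length_ok` is hypothetical, and treating a maximum of 0 as "no upper limit" is an assumption based on the default value, not something this page states.

```python
def fragment_length_ok(length, min_len=0, max_len=0):
    """Check a fragment length against optional lower and upper bounds."""
    if length < min_len:
        return False
    # Assumption: max_len == 0 disables the upper bound.
    if max_len > 0 and length > max_len:
        return False
    return True
```

For instance, `fragment_length_ok(180, min_len=150, max_len=300)` accepts a fragment in a mono-nucleosome-like size range, while a 90-base fragment would be rejected by the same bounds.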

Read Filtering Options#

--duplicateFilter

Possible choices: start_bc, start_bc_umi, start_end_bc, start_end_bc_umi

How to filter duplicates. Different combinations (using start/end/umi) are possible. Read start position and read barcode are always considered. Default (None) considers all reads as passing the filter. Note that in the case of paired-end data, both reads in the fragment are considered (and kept). So if you wish to keep only read1, combine this option with --samFlagInclude.
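One way to picture the four choices is as different keys used to decide whether two reads are duplicates. The sketch below is a hypothetical illustration on plain dictionaries; the field names, the inclusion of the chromosome in the key, and keeping the first read seen for each key are assumptions.

```python
def duplicate_key(read, mode):
    """Build the identity key for a read under a given --duplicateFilter mode."""
    parts = mode.split("_")
    key = [read["chrom"], read["start"], read["barcode"]]  # always considered
    if "end" in parts:
        key.append(read["end"])
    if "umi" in parts:
        key.append(read["umi"])
    return tuple(key)

def filter_duplicates(reads, mode=None):
    """Keep the first read seen for each key; mode=None keeps every read."""
    if mode is None:
        return list(reads)
    seen = set()
    kept = []
    for read in reads:
        key = duplicate_key(read, mode)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept

# Hypothetical reads that differ in UMI, end position or barcode.
reads = [
    {"chrom": "chr1", "start": 100, "end": 250, "barcode": "AAAC", "umi": "TTG"},
    {"chrom": "chr1", "start": 100, "end": 250, "barcode": "AAAC", "umi": "CCA"},
    {"chrom": "chr1", "start": 100, "end": 300, "barcode": "AAAC", "umi": "TTG"},
    {"chrom": "chr1", "start": 100, "end": 250, "barcode": "GGTT", "umi": "TTG"},
]
kept_per_mode = {
    mode: len(filter_duplicates(reads, mode))
    for mode in ["start_bc", "start_bc_umi", "start_end_bc", "start_end_bc_umi"]
}
```

The more fields the key includes, the fewer reads collapse into one, so `start_bc` is the strictest of the four modes on this toy input.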

Optional arguments#

--numberOfSamples, -n

The number of bins that are sampled from the genome, for which the overlapping number of reads is computed. (Default: 100000.0)

--skipZeros

If set, regions with zero overlapping reads in all given BAM files are ignored. This results in fewer sampled regions than specified in --numberOfSamples.
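The effect of skipping all-zero regions can be sketched on a small count table with one row per sampled region and one column per BAM file. The helper `skip_zero_bins` and the example counts are hypothetical.

```python
def skip_zero_bins(count_rows):
    """Drop regions whose read count is zero in every BAM file."""
    return [row for row in count_rows if any(c != 0 for c in row)]

# Four sampled regions, two BAM files.
count_rows = [[0, 0], [3, 0], [0, 0], [1, 2]]
non_zero_rows = skip_zero_bins(count_rows)
```

A region is kept as soon as one file has reads in it, so only regions that are empty everywhere are dropped, leaving fewer rows than were sampled.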

Other options#

--verbose, -v

Set to see processing messages.

-V, --version

Print the program version and exit.