scFilterBarcodes

This tool identifies the barcodes present in a BAM file and reports them as a list. You can optionally filter these barcodes by matching them against a whitelist or by applying a minimum count threshold.

Example usage: scFilterBarcodes -b sample.bam -w whitelist.txt > barcodes_detected.txt

Input/Output options

--bamfile, -b

Indexed BAM file

--whitelist, -w

A single-column file containing the whitelist of barcodes to be used

--outFile, -o

The file to write results to. For scFilterStats, scFilterBarcodes, and scJSD, the output is a plain-text (.txt) file. For other tools, the output is an updated .loom file containing the result of the requested operation.

Counting options

--minHammingDist, -d

Minimum Hamming distance used to match a barcode to the whitelist. Note that increasing the Hamming distance substantially slows down barcode detection.
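To show why a larger distance is slower, here is a minimal Python sketch of whitelist matching by Hamming distance. It is not sincei's implementation: the function names are hypothetical, barcodes are assumed to have equal length, the distance is treated as the largest number of mismatches tolerated (an assumption about how the option is applied), and ambiguous matches are discarded (also an assumption). An exact match is a constant-time set lookup, while any mismatch tolerance forces a scan of the whole whitelist.

```python
from typing import Optional, Set


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(barcode: str, whitelist: Set[str], max_dist: int) -> Optional[str]:
    """Return the whitelist barcode that `barcode` matches, or None.

    Exact matching is a set lookup; tolerating mismatches requires
    comparing against every whitelist entry, which is much slower.
    """
    if barcode in whitelist:
        return barcode
    if max_dist == 0:
        return None
    hits = [
        w for w in whitelist
        if len(w) == len(barcode) and hamming(barcode, w) <= max_dist
    ]
    # Assumption: only an unambiguous (single) match is accepted.
    return hits[0] if len(hits) == 1 else None
```

For example, with the whitelist {"AACCGGTT", "TTGGCCAA"}, the observed barcode "AACCGGTA" is rejected at distance 0 but assigned to "AACCGGTT" at distance 1.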

--minCount, -mc

Minimum number of bins with non-zero counts required to report a barcode. Note that this number ranges from 0 to (genome size / binSize).
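The following Python sketch illustrates how a "number of non-zero bins" could be computed and filtered; it is a hypothetical illustration, not sincei's code. It assumes reads have already been reduced to (barcode, chromosome, position) tuples, places each read in a bin by integer division of its position by the bin size, and keeps barcodes whose count is at least the threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def nonzero_bins_per_barcode(
    reads: Iterable[Tuple[str, str, int]], bin_size: int = 100000
) -> Dict[str, int]:
    """Count the distinct (chrom, bin) pairs hit by each barcode.

    A bin is non-zero for a barcode if at least one of its reads
    falls into it.
    """
    bins: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for barcode, chrom, pos in reads:
        bins[barcode].add((chrom, pos // bin_size))
    return {bc: len(b) for bc, b in bins.items()}


def filter_barcodes(counts: Dict[str, int], min_count: int) -> Dict[str, int]:
    """Keep barcodes with at least `min_count` non-zero bins."""
    return {bc: n for bc, n in counts.items() if n >= min_count}
```

Note that two reads in the same bin count once: a barcode with many reads piled up in one region still has only one non-zero bin there.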

--minMappingQuality, -mq

If set, only reads with a mapping quality (MAPQ) of at least this value are considered.

--rankPlot, -rp

Output file name for a plot of the ranked counts per barcode (similar to a “knee plot”, except that the counts here are the number of non-zero bins per barcode).
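The data behind such a rank plot can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's plotting code: barcodes are sorted by decreasing count (here, non-zero bins), ties are broken by barcode name for reproducibility, and each barcode receives a 1-based rank.

```python
from typing import Dict, List, Tuple


def ranked_counts(counts: Dict[str, int]) -> List[Tuple[int, str, int]]:
    """Return (rank, barcode, count) tuples sorted by decreasing count.

    Plotting count against rank, usually on log-log axes, gives a
    knee-plot-like curve.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, bc, n) for rank, (bc, n) in enumerate(ordered, start=1)]
```

The resulting ranks and counts could then be drawn with, for example, matplotlib's `plt.loglog(ranks, counts)`; the sharp drop ("knee") typically separates real cells from background barcodes.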

BAM processing options

--cellTag, -ct

Name of the BAM tag from which to extract barcodes.
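As a sketch of how the barcode tag and the mapping-quality threshold might be applied per read, the Python function below works with any object exposing the `is_unmapped`, `mapping_quality`, `has_tag()` and `get_tag()` members of pysam's `AlignedSegment`. The function name is hypothetical, and the default tag "CB" is an assumption (a common convention for corrected cell barcodes), not necessarily this tool's default.

```python
from typing import Optional


def read_barcode(read, cell_tag: str = "CB", min_mapq: Optional[int] = None) -> Optional[str]:
    """Return the cell barcode of a pysam-style read, or None if the read is skipped."""
    if read.is_unmapped:
        return None
    # Analogous to --minMappingQuality: keep only reads with MAPQ >= threshold.
    if min_mapq is not None and read.mapping_quality < min_mapq:
        return None
    # Analogous to --cellTag: reads without the tag carry no barcode.
    if not read.has_tag(cell_tag):
        return None
    return str(read.get_tag(cell_tag))
```

With pysam installed, this could be called as `read_barcode(read, "CB", min_mapq=10)` for each `read` yielded by `pysam.AlignmentFile("sample.bam", "rb").fetch()`.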

--groupTag, -gt

For a grouped BAM file, such as one containing Read Group (RG) or Sample (SM) tags, the reads can be grouped using the --groupTag argument. NOTE: For such input, please ensure that the --labels argument lists the expected group labels contained in the BAM files. --groupTag is then used together with --cellTag to identify unique samples (cells) in the input.
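The effect of combining the two tags can be illustrated with a minimal Python sketch: a cell is identified by its barcode alone, or by the pair (group, barcode) when a group tag is used, so the same barcode seen in two read groups becomes two distinct cells. The function name and the "group::barcode" format are hypothetical choices for illustration, not the identifiers sincei writes.

```python
from typing import Optional


def cell_id(barcode: str, group: Optional[str] = None, sep: str = "::") -> str:
    """Build a cell identifier from a barcode and an optional group label.

    Without a group the barcode alone identifies the cell; with one,
    identical barcodes from different groups stay separate.
    """
    return barcode if group is None else f"{group}{sep}{barcode}"
```

For example, barcode "AAC" read from groups "s1" and "s2" yields the two identifiers "s1::AAC" and "s2::AAC".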

--numberOfProcessors, -p

Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors. (Default: 1)
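A minimal Python sketch of how a value like "max/2" could be translated into a worker count follows. The function name is hypothetical; rounding "max/2" down while never going below 1, and capping explicit numbers at the available CPUs, are assumptions rather than documented behavior of this tool.

```python
import os


def parse_processors(value: str) -> int:
    """Translate a --numberOfProcessors value into a worker count."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    if value == "max":
        return available
    if value == "max/2":
        return max(1, available // 2)
    n = int(value)  # raises ValueError for non-numeric input
    if n < 1:
        raise ValueError("number of processors must be at least 1")
    # Assumption: requests above the available CPUs are capped.
    return min(n, available)
```

On an 8-core machine, "max" would give 8 and "max/2" would give 4.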

--blackListFileName, -bl

A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
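The chunk-based exclusion described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: intervals are half-open (start, end) as in BED, the helper names are hypothetical, and a read is assumed to be assigned to the chunk containing its start position. The last line of the usage shows why a read that partially overlaps a blacklisted region can still be counted.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def keep_chunks(chunks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Drop every chunk that overlaps any blacklisted region."""
    return [c for c in chunks if not any(overlaps(c, bl) for bl in blacklist)]


chunks = [("chr1", 0, 100), ("chr1", 100, 200), ("chr1", 200, 300)]
blacklist = [("chr1", 100, 160)]
kept = keep_chunks(chunks, blacklist)  # the chunk chr1:100-200 is rejected
# A read spanning chr1:90-110 starts in the kept chunk chr1:0-100, so it is
# still counted even though it overlaps the blacklisted region.
read_overlaps_blacklist = overlaps(("chr1", 90, 110), blacklist[0])
```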

--binSize, -bs

Size of the bins, in bases, to calculate coverage (Default: 100000)
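The bin size also fixes the upper limit of --minCount. The Python sketch below computes the total number of bins, assuming (an assumption, not confirmed by this page) that each chromosome is tiled separately and that its last bin may be shorter than the bin size. The function name is hypothetical.

```python
from typing import Dict


def total_bins(chrom_sizes: Dict[str, int], bin_size: int = 100000) -> int:
    """Number of bins when each chromosome is tiled separately.

    Integer ceiling division counts a shorter final bin as a full bin.
    """
    return sum((length + bin_size - 1) // bin_size for length in chrom_sizes.values())
```

For a genome of roughly 3.1 Gb and the default bin size of 100000, this gives about 31,000 bins, so a barcode can have at most about that many non-zero bins.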

Other options

--verbose, -v

Set to see processing messages.

--version

Show program’s version number and exit.