sincei.Utilities module#
- sincei.Utilities.checkBAMtag(bam, name, tag)[source]#
Check whether a given tag is present in the BAM file.
- sincei.Utilities.checkMotifs(read, chrom, genome, readMotif, refMotif)[source]#
Check whether a given motif is present in the read and the corresponding reference genome. For example, in MNase (scChIC-seq) data, we expect the reads to have an 'A' at the 5'-end, while the genome has a 'TA' overhang (where the 'A' aligns with the 'A' in the forward read), as illustrated below.
Forwards aligned read: read has 'A', upstream has T
R1 ........A------->
----------TA------------Ref (+)
Rev aligned read: read has 'T', downstream has A
<-------T....... R1
--------TA------------Ref (+)
This function can look for an arbitrary motif in the read and the corresponding genome, but only in the orientation described above.
Returns#
- bool
True if the motif is present in the read and reference genome.
Examples#
>>> import pysam
>>> import os
>>> from scDeepTools.scReadCounter import CountReadsPerBin as cr
>>> root = os.path.dirname(os.path.abspath(__file__)) + "/test/test_data/"
>>> bam = pysam.AlignmentFile("{}/test_TA_filtering.bam".format(root))
>>> iter = bam.fetch()
>>> read = next(iter)
>>> cr.checkMotifs(read, 'A', 'TA') # for valid scChIC read
True
>>> read = next(iter)
>>> cr.checkMotifs(read, 'A', 'TA') # for invalid scChIC read
False
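The orientation rule described for checkMotifs can be illustrated on plain strings. This is a minimal sketch and not sincei's implementation: the helper names `revcomp` and `motif_matches` are hypothetical, the reference is a Python string with 0-based coordinates, the alignment is assumed to be ungapped, and the reverse-strand branch assumes that motifs are compared as reverse complements.

```python
# Hypothetical helpers for illustration only; not part of sincei.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_matches(ref, aligned_seq, start, is_reverse, read_motif, ref_motif):
    """Check a read motif and the reference motif around an ungapped alignment.

    ref         -- reference sequence of one chromosome (str)
    aligned_seq -- read sequence in reference (+) orientation, as stored in a BAM record
    start       -- 0-based leftmost reference position of the alignment
    """
    end = start + len(aligned_seq)  # exclusive end of the alignment
    if not is_reverse:
        # Forward read: the read starts with the motif, and the reference motif
        # ends at the same base, so it extends upstream of the read.
        if not aligned_seq.startswith(read_motif):
            return False
        ref_end = start + len(read_motif)
        ref_start = ref_end - len(ref_motif)
        if ref_start < 0:
            return False
        return ref[ref_start:ref_end] == ref_motif
    # Reverse read: the stored sequence ends with the reverse complement of the
    # motif, and the reference motif extends downstream of the read.
    if not aligned_seq.endswith(revcomp(read_motif)):
        return False
    ref_start = end - len(read_motif)
    ref_end = ref_start + len(ref_motif)
    if ref_end > len(ref):
        return False
    return ref[ref_start:ref_end] == revcomp(ref_motif)
```

With the toy reference `"GGGTACCCGGTAGG"`, a forward read `"ACCC"` starting at position 4 passes (the base upstream is 'T'), and a reverse read stored as `"CGGT"` starting at position 7 passes (the base downstream is 'A').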
- sincei.Utilities.checkGCcontent(read, lowFilter, highFilter, returnGC=False)[source]#
Checks if the GC content of the read is within the given range
Parameters#
- read : pysam.AlignedSegment
A pysam AlignedSegment object
- lowFilter : float
Minimum GC content
- highFilter : float
Maximum GC content
- returnGC : bool
If true, return the GC content of the read
Returns#
- bool
True if the GC content of the read is within the given range
Examples#
>>> test = Tester()
>>> read = next(test.bamFile1.fetch())
>>> checkGCcontent(read, 0.3, 0.7)
True
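The GC filter can be sketched on a plain sequence string. This is a minimal illustration and not sincei's code: `gc_fraction` and `gc_within` are hypothetical names, the bounds are assumed to be inclusive, and the behaviour of the `return_gc` flag (returning the fraction in place of the boolean) is an assumption based on the parameter description.

```python
# Hypothetical helpers for illustration only; not part of sincei.
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_within(seq, low, high, return_gc=False):
    """Return True if low <= GC fraction <= high (inclusive bounds assumed).

    If return_gc is True, return the GC fraction itself instead.
    """
    gc = gc_fraction(seq)
    if return_gc:
        return gc
    return low <= gc <= high
```

For a pysam read, the sequence string would typically come from `read.query_sequence`.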
- sincei.Utilities.checkAlignedFraction(read, lowFilter)[source]#
Check whether the fraction of the read length that aligns to the reference is higher than the given threshold. Mismatches tolerated by the aligner count towards the aligned fraction; indels and clipped bases do not.
Returns#
- bool
True if the fraction of the read length that aligns to the reference is higher than the given threshold.
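The aligned fraction can be sketched from CIGAR operations alone. This is a minimal illustration under stated assumptions, not sincei's implementation: the function names are hypothetical, the input is a list of `(operation, length)` pairs using the SAM numeric codes (the layout pysam exposes as `cigartuples`), the denominator is assumed to be the stored read length (so hard clips are ignored), and the comparison is assumed to be strict.

```python
# Hypothetical helpers for illustration only; not part of sincei.
# SAM CIGAR codes: 0=M, 1=I, 2=D, 3=N, 4=S, 5=H, 6=P, 7='=', 8=X
ALIGNED_OPS = {0, 7, 8}         # match/mismatch columns count as aligned
QUERY_OPS = {0, 1, 4, 7, 8}     # operations that consume bases of the stored read


def aligned_fraction(cigartuples):
    """Aligned bases divided by read length; insertions and soft clips only
    add to the denominator."""
    aligned = sum(n for op, n in cigartuples if op in ALIGNED_OPS)
    read_len = sum(n for op, n in cigartuples if op in QUERY_OPS)
    return aligned / read_len if read_len else 0.0


def passes_aligned_fraction(cigartuples, low_filter):
    """True if the aligned fraction is strictly higher than the threshold."""
    return aligned_fraction(cigartuples) > low_filter
```

For example, the CIGAR `10S80M5I5M` corresponds to `[(4, 10), (0, 80), (1, 5), (0, 5)]`, which has 85 aligned bases out of 100.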
- sincei.Utilities.colorPicker(name)[source]#
This function returns a list of colors for plotting.
Parameters#
- name : str
The name of the color palette to use.
Returns#
- list
A list of colors.
Examples#
>>> colorPicker('twentyfive')
['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00', '#ffff33', '#a65628', '#f781bf', '#999999']
>>> colorPicker('colorblind')
['#0072B2', '#009E73', '#D55E00', '#CC79A7', '#F0E442', '#56B4E9', '#E69F00', '#000000']
- sincei.Utilities.getDupFilterTuple(read, bc, filterArg)[source]#
Returns a tuple with the information needed to filter duplicates, based on read and filter type. The tuple is composed of the barcode, the umi, the start and end positions and the chromosome name.
Parameters#
- read : pysam.AlignedSegment
A pysam.AlignedSegment object.
- bc : str
The barcode.
- filterArg : str
A string with the type of filter to use.
Returns#
- tuple
A tuple with the information needed to filter duplicates. The tuple is composed of the barcode, the umi, the start and end positions and the chromosome name.
Examples#
>>> test = Tester()
>>> read = next(test.bamFile1.fetch())
>>> getDupFilterTuple(read, 'ATCG', 'end_umi')
('ATCG', 'ATCG', None, None, 0, False)
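A tuple of this kind is typically used as a hashable key: two reads that produce the same key are treated as duplicates. The sketch below shows that general pattern only. It does not call sincei; `drop_duplicates`, the dictionary records, and the field names are hypothetical stand-ins for reads and for whatever fields a given filter type selects.

```python
# Hypothetical helper for illustration only; not part of sincei.
def drop_duplicates(records, key_fields):
    """Keep the first record seen for each combination of key fields."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key in seen:
            continue  # same barcode/UMI/position combination already kept
        seen.add(key)
        kept.append(rec)
    return kept


reads = [
    {"bc": "ATCG", "umi": "AAAA", "chrom": "chr1", "start": 100, "end": 150},
    {"bc": "ATCG", "umi": "AAAA", "chrom": "chr1", "start": 100, "end": 150},
    {"bc": "ATCG", "umi": "CCCC", "chrom": "chr1", "start": 100, "end": 150},
    {"bc": "GGTT", "umi": "AAAA", "chrom": "chr1", "start": 100, "end": 150},
]
unique_reads = drop_duplicates(reads, ("bc", "umi", "chrom", "start", "end"))
```

Dropping `"umi"` from the key fields would collapse the first three records into one, which is why the choice of fields in the key determines how aggressive the filter is.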
- sincei.Utilities.gini(i, X)[source]#
Computes the Gini coefficient for a given row of a sparse matrix (Obs*Var).
Parameters#
- i : int
row index
- X : numpy array
matrix
Returns#
- float
Gini coefficient for the given row
Examples#
>>> X = np.matrix([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> gini(0, X)
0.0
>>> gini(1, X)
0.0
>>> gini(2, X)
0.0
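For orientation, the textbook Gini coefficient of a single vector of non-negative values can be computed as below. This is a sketch of the standard rank-based formula, not a reproduction of sincei's `gini`: the name `gini_coefficient` is hypothetical, it takes one row as a plain sequence, and the library may handle sparsity, normalisation, or rounding differently, so its values need not match the doctest output shown for `gini`.

```python
# Hypothetical helper for illustration only; not part of sincei.
def gini_coefficient(values):
    """Textbook Gini coefficient of non-negative values.

    Uses G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n,
    with the values sorted in ascending order and ranks starting at 1.
    Returns 0.0 for an empty or all-zero input.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

Under this definition a perfectly even row such as `[1, 1, 1, 1]` gives 0, and a row whose counts all fall in one of n features gives (n - 1) / n.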