sincei.Utilities module

sincei.Utilities.checkBAMtag(bam, name, tag)

Check whether a given tag is present in the BAM file.
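A minimal sketch of such a check, assuming the reads come from pysam (for example `pysam.AlignmentFile(...).fetch(until_eof=True)`) and that inspecting the first reads of the file is enough. `tag_present` and `max_reads` are hypothetical names, not part of sincei, and the real `checkBAMtag` may scan the file differently.

```python
from typing import Iterable, Protocol


class HasTag(Protocol):
    """Anything with a pysam-style has_tag(tag) method (e.g. pysam.AlignedSegment)."""

    def has_tag(self, tag: str) -> bool: ...


def tag_present(reads: Iterable[HasTag], tag: str, max_reads: int = 1000) -> bool:
    """Hypothetical helper: True if any of the first `max_reads` reads carries `tag`.

    Scanning only a prefix keeps the check cheap on large BAM files; the
    cut-off of 1000 reads is an arbitrary illustrative choice.
    """
    for n, read in enumerate(reads):
        if n >= max_reads:
            break
        if read.has_tag(tag):
            return True
    return False
```

For example, `tag_present(bam.fetch(until_eof=True), "CB")` would report whether cell barcodes are stored in a `CB` tag; the tag name is only an illustration.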

sincei.Utilities.checkMotifs(read, chrom, genome, readMotif, refMotif)

Check whether readMotif is present at the 5' end of the read and refMotif is present at the corresponding position in the reference genome. For example, in MNase (scChIC-seq) data, reads are expected to start with an 'A' at the 5' end, while the genome has a 'TA' overhang (where the 'A' in the genome aligns with the 'A' of a forward read), as illustrated below.

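Forward-aligned read: the read starts with 'A' and the upstream reference base is 'T'.

...........A-----------> R1
----------TA------------ Ref (+)

Reverse-aligned read: the read (in reference orientation) ends with 'T' and the downstream reference base is 'A'.

<-------T............. R1
--------TA------------ Ref (+)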

The function can check any pair of read and reference motifs, provided they follow the same orientation as described above.

Returns

bool

True if readMotif is found at the 5' end of the read and refMotif is found at the corresponding reference position.

Examples

>>> import os
>>> import pysam
>>> import py2bit
>>> from sincei.Utilities import checkMotifs
>>> root = os.path.dirname(os.path.abspath(__file__)) + "/test/test_data/"
>>> bam = pysam.AlignmentFile("{}/test_TA_filtering.bam".format(root))
>>> genome = py2bit.open("{}/genome.2bit".format(root))  # hypothetical 2bit file of the matching reference
>>> reads = bam.fetch()
>>> read = next(reads)
>>> checkMotifs(read, read.reference_name, genome, 'A', 'TA') # for valid scChIC read
True
>>> read = next(reads)
>>> checkMotifs(read, read.reference_name, genome, 'A', 'TA') # for invalid scChIC read
False
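The orientation rules above can be sketched on plain strings. This is an illustration, not sincei's implementation: `motifs_match` and its arguments are hypothetical, positions are 0-based and half-open as in pysam's `reference_start`/`reference_end`, and for reverse reads both motifs are assumed to be compared as reverse complements (which leaves the palindromic 'TA' unchanged).

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motifs_match(query_seq: str, is_reverse: bool, ref_start: int, ref_end: int,
                 ref_seq: str, read_motif: str, ref_motif: str) -> bool:
    """Hypothetical helper: True if read_motif sits at the read's 5' end and
    ref_motif sits at the matching reference position."""
    k, m = len(read_motif), len(ref_motif)
    if not is_reverse:
        # Forward read: the read motif is the first k bases; the reference motif
        # ends where the read motif ends and extends upstream (e.g. 'T' before 'A').
        read_ok = query_seq.upper().startswith(read_motif.upper())
        win_start = ref_start + k - m
        expected = ref_motif
    else:
        # Reverse read: query_seq is stored in reference orientation, so the
        # read's 5' end is its last base; compare reverse complements and let
        # the reference motif extend downstream in (+) coordinates.
        read_ok = query_seq.upper().endswith(revcomp(read_motif).upper())
        win_start = ref_end - k
        expected = revcomp(ref_motif)
    win_end = win_start + m
    if win_start < 0 or win_end > len(ref_seq):
        return False  # motif window would run off the chromosome
    return read_ok and ref_seq[win_start:win_end].upper() == expected.upper()
```

With a pysam read, the arguments would be `read.query_sequence`, `read.is_reverse`, `read.reference_start`, `read.reference_end` and the chromosome sequence. The sketch ignores soft clipping; for soft-clipped reads, `read.query_alignment_sequence` (which excludes soft-clipped bases) would be the safer input.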
sincei.Utilities.checkGCcontent(read, lowFilter, highFilter, returnGC=False)

Check whether the GC content of the read lies within the given range.

Parameters

read : pysam.AlignedSegment

A pysam AlignedSegment object.

lowFilter : float

Minimum GC content, as a fraction between 0 and 1.

highFilter : float

Maximum GC content, as a fraction between 0 and 1.

returnGC : bool

If True, return the GC content of the read.

Returns

bool

True if the GC content of the read is within the given range.

Examples

>>> test = Tester()
>>> read = next(test.bamFile1.fetch())
>>> checkGCcontent(read, 0.3, 0.7)
True
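A minimal sketch of the GC computation on a plain sequence string (for example `read.query_sequence`), assuming GC content is the fraction of G and C among all bases (so `N` counts toward the length) and that both bounds are inclusive. `gc_fraction` and `gc_in_range` are hypothetical helpers, and returning a `(passes, gc)` pair when `return_gc` is set is only an assumption about how `returnGC` behaves.

```python
from typing import Tuple, Union


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (case-insensitive); 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def gc_in_range(seq: str, low: float, high: float,
                return_gc: bool = False) -> Union[bool, Tuple[bool, float]]:
    """Hypothetical helper: check low <= GC <= high (inclusive bounds assumed)."""
    gc = gc_fraction(seq)
    ok = low <= gc <= high
    # Assumption: with return_gc, report the computed value alongside the verdict.
    return (ok, gc) if return_gc else ok
```

Counting `N` in the denominator lowers the GC fraction of low-quality reads; excluding ambiguous bases instead would be an equally reasonable design.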
sincei.Utilities.checkAlignedFraction(read, lowFilter)

Check whether the fraction of the read length that aligns to the reference is higher than the given threshold (lowFilter). The aligned fraction includes mismatches tolerated by the aligner and excludes insertions, deletions, and clipped bases.

Returns

bool

True if the aligned fraction of the read is higher than lowFilter.
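The aligned fraction can be sketched from pysam-style CIGAR tuples (`read.cigartuples`, a list of `(operation, length)` pairs). Assumptions: aligned bases are the M, = and X operations (mismatches included), the read length counts every query-consuming operation plus hard clips, and the threshold is inclusive. `aligned_fraction` and `passes_aligned_fraction` are hypothetical names.

```python
from typing import List, Tuple

# pysam CIGAR operation codes: M=0, I=1, D=2, N=3, S=4, H=5, P=6, '='=7, X=8
ALIGNED_OPS = {0, 7, 8}          # M, =, X: aligned bases, mismatches included
READ_OPS = {0, 1, 4, 5, 7, 8}    # every op contributing read bases, incl. clips


def aligned_fraction(cigartuples: List[Tuple[int, int]]) -> float:
    """Aligned bases divided by full read length; deletions count toward neither."""
    aligned = sum(n for op, n in cigartuples if op in ALIGNED_OPS)
    total = sum(n for op, n in cigartuples if op in READ_OPS)
    return aligned / total if total else 0.0


def passes_aligned_fraction(cigartuples: List[Tuple[int, int]], low: float) -> bool:
    """Hypothetical filter: keep reads whose aligned fraction reaches `low`."""
    return aligned_fraction(cigartuples) >= low
```

For a read with CIGAR `5S90M5I`, i.e. `[(4, 5), (0, 90), (1, 5)]`, 90 of 100 bases are aligned, so the fraction is 0.9.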

sincei.Utilities.colorPicker(name)

This function returns a list of colors for plotting.

Parameters

name : str

The name of the color palette to use.

Returns

list

A list of colors.

Examples

>>> colorPicker('twentyfive')
['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00', '#ffff33', '#a65628', '#f781bf', '#999999']
>>> colorPicker('colorblind')
['#0072B2', '#009E73', '#D55E00', '#CC79A7', '#F0E442', '#56B4E9', '#E69F00', '#000000']
sincei.Utilities.getDupFilterTuple(read, bc, filterArg)

Return a tuple with the information needed to filter duplicates, based on the read and the filter type. The tuple is composed of the barcode, the UMI, the start and end positions, and the chromosome name.

Parameters

read : pysam.AlignedSegment

A pysam.AlignedSegment object.

bc : str

The barcode.

filterArg : str

The type of duplicate filter to use (for example, 'end_umi').

Returns

tuple

A tuple with the information needed to filter duplicates. The tuple is composed of the barcode, the UMI, the start and end positions, and the chromosome name.

Examples

>>> test = Tester()
>>> read = next(test.bamFile1.fetch())
>>> getDupFilterTuple(read, 'ATCG', 'end_umi')
('ATCG', 'ATCG', None, None, 0, False)
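A sketch of how such a tuple can drive deduplication: reads that share a key are treated as duplicates and only the first is kept. Everything here is hypothetical: `Read` stands in for the pysam fields used, the five-field key follows the description above and omits any extra fields the real function may return, and parsing the filter name by substring ('end_umi' selects end position and UMI) is an assumption.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Read(NamedTuple):
    """Hypothetical stand-in for the pysam.AlignedSegment fields used below."""
    chrom: str
    start: int
    end: int
    umi: Optional[str]


DupKey = Tuple[str, Optional[str], Optional[int], Optional[int], str]


def dup_key(read: Read, barcode: str, filter_arg: str) -> DupKey:
    """Build a duplicate key; fields not named in filter_arg are set to None
    so they never distinguish two reads."""
    umi = read.umi if "umi" in filter_arg else None
    start = read.start if "start" in filter_arg else None
    end = read.end if "end" in filter_arg else None
    return (barcode, umi, start, end, read.chrom)


def drop_duplicates(tagged_reads: Iterable[Tuple[str, Read]],
                    filter_arg: str) -> Iterator[Tuple[str, Read]]:
    """Yield the first (barcode, read) pair seen for each duplicate key."""
    seen = set()
    for barcode, read in tagged_reads:
        key = dup_key(read, barcode, filter_arg)
        if key not in seen:
            seen.add(key)
            yield barcode, read
```

Because the barcode is part of every key, identical fragments from different cells are never collapsed into one.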
sincei.Utilities.gini(i, X)

Compute the Gini coefficient of row i of an observations-by-variables matrix X, which may be sparse. Calling it for each row index gives one value per observation.

Parameters

i : int

Row index.

X : numpy array or sparse matrix

Observations-by-variables matrix.

Returns

float

Gini coefficient of the given row.

Examples

>>> import numpy as np
>>> X = np.matrix([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> gini(0, X)
0.0
>>> gini(1, X)
0.0
>>> gini(2, X)
0.0
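A minimal sketch of a per-row Gini coefficient using the standard rank-based formula G = 2·Σ k·x_(k) / (n·Σ x) − (n+1)/n over the sorted row values x_(1) ≤ … ≤ x_(n). `gini_row` is a hypothetical name, and whether `gini` uses the same normalisation is not confirmed here; dense rows and scipy.sparse rows (via `.toarray()`) are both accepted.

```python
import numpy as np


def gini_row(i: int, X) -> float:
    """Hypothetical helper: Gini coefficient of row i of a dense or sparse matrix."""
    row = X[i, :]
    if hasattr(row, "toarray"):  # scipy.sparse rows expose toarray()
        row = row.toarray()
    # Flatten to 1-D floats and sort ascending, as the rank formula requires.
    x = np.sort(np.asarray(row, dtype=float).ravel())
    n, total = x.size, x.sum()
    if n == 0 or total == 0:
        return 0.0  # define an empty or all-zero row as perfectly even
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)
```

Under this definition, a row with equal values in every column gives 0.0, and a row whose counts all fall in one of n columns gives the maximum (n-1)/n, for example 0.75 for four columns.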