scScoreFeatures
scScoreFeatures computes gene activity scores from binned chromatin data (--mode activities, with a GTF file passed to --features) or aggregates
binned chromatin data into Variable Chromatin Regions (--mode aggregate, with the BED output of scFindVCRs passed to --features).
usage:
scScoreFeatures -m aggregate -i INPUT_binned.h5ad --features VCRs.bed -o OUTPUT_aggregate.h5ad
scScoreFeatures -m activities -i INPUT_binned.h5ad --features genes.gtf -o OUTPUT_activities.h5ad
Input/Output options
- --input, -i
sincei-generated input file in .h5ad format.
- --outFile, -o
The file to write results to. For scFilterStats, scFilterBarcodes and scJSD, the output file is a .tsv file. For other tools, the output file is an updated .h5ad file with the result of the requested operation.
Common Options
- --mode, -m
Possible choices: aggregate, activities
The activities mode calculates weighted gene activity scores per cell using exponential decay of signal around each input feature (such as gene-body, TSS, or region). The aggregate mode calculates a simple sum of counts per cell for each input feature.
- --features
Path to the BED or GTF file containing the features to use for aggregation/scoring.
- --overlapPolicy, -op
Possible choices: partial, all, none
Policy for handling regions present in the .h5ad input file that only partially overlap regions present in --features. Options are:
partial: count reads in anndata regions proportionally to the overlap fraction, that is, counts_considered = feature_counts * overlap_length / region_length.
all: count all reads in the partially overlapping anndata regions.
none: exclude reads from partially overlapping anndata regions; in other words, only count reads in anndata regions fully contained within BED/GTF regions.
Default: 'partial'.
- --centerScores, -cs
If flag is set, center and scale the scores to unit variance and zero mean. Default: False.
- --numberOfProcessors, -p
Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. Default: "max".
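The three --overlapPolicy choices can be illustrated with a small Python sketch. This is not sincei's implementation: the function name and arguments are hypothetical, coordinates are treated as half-open intervals, and region_length is assumed to be the length of the anndata region (bin), which is one reading of the formula given above.

```python
def counts_considered(feature_counts, region_start, region_end,
                      feature_start, feature_end, policy="partial"):
    """Counts from one anndata region credited to one BED/GTF feature.

    Illustrative only: names and interval conventions are assumptions.
    """
    region_length = region_end - region_start
    # Length of the intersection between the bin and the feature (0 if disjoint).
    overlap_length = max(
        0, min(region_end, feature_end) - max(region_start, feature_start)
    )
    if overlap_length == 0:
        return 0.0
    if overlap_length == region_length:
        # Bin fully contained in the feature: counted in full under every policy.
        return float(feature_counts)
    if policy == "partial":
        # Scale the counts by the fraction of the bin that overlaps the feature.
        return feature_counts * overlap_length / region_length
    if policy == "all":
        return float(feature_counts)
    if policy == "none":
        return 0.0
    raise ValueError(f"unknown policy: {policy}")


# A 100 bp bin with 10 counts, half of which overlaps the feature.
for policy in ("partial", "all", "none"):
    print(policy, counts_considered(10, 0, 100, 50, 200, policy))
```

Under these assumptions the half-overlapping bin contributes 5 counts with partial, 10 with all, and 0 with none.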
Aggregate Mode Options
- --penalty, -pen
Penalty value to determine which VCRs to use for aggregation. Used only when the input is a BED file created with scFindVCRs with a range of penalties (stored in the 5th column). Default: None.
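As a rough illustration of what selecting VCRs by penalty could look like, the Python sketch below keeps only the BED rows whose 5th column matches the requested value. The helper name select_vcrs is hypothetical, and the assumption that selection is an exact match on the 5th column is ours; the tool's actual logic may differ.

```python
import math


def select_vcrs(bed_lines, penalty):
    """Return (chrom, start, end) for rows whose 5th column equals `penalty`.

    Hypothetical helper, not part of sincei.
    """
    selected = []
    for line in bed_lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # no penalty column on this row
        if math.isclose(float(fields[4]), penalty):
            selected.append((fields[0], int(fields[1]), int(fields[2])))
    return selected


# Made-up rows for illustration; real input would be the scFindVCRs BED output.
example = [
    "chr1\t0\t100\tvcr1\t0.5\n",
    "chr1\t200\t300\tvcr2\t1.0\n",
]
print(select_vcrs(example, 1.0))
```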
Activities Mode Options
- --decay, -d
Decay parameter for calculating distance weights. Higher values lead to faster decay with distance. Weights are calculated as exp(-decay * distance_in_kb / 10). Only used with --mode activities. Default: 0.75.
- --maxRegion, -mr
Maximum region size (in kb) upstream and downstream of the genes to consider for activity calculation. Default: 100.
- --geneBody
Flag to indicate whether the entire gene body is weighted as 1 (like the TSS). If provided, decay starts beyond gene body. By default, the weight decay starts from TSS. Default: False.
- --normalizeGeneLengths
Flag to indicate whether to apply length normalization to the input genes. If provided, gene scores are normalized w.r.t. gene length in the input GTF/BED file. Default: False.
- --excludeInRange
Possible choices: TSS, genes
Exclude regions that overlap other features from contributing to activity score of the input genes. This could help avoid spurious correlations between the target genes and the neighboring genes (in particular for promoter-enriched signals, such as H3K4me3). Options are: 'TSS': exclude features overlapping the TSS of other genes. 'genes': exclude features overlapping the bodies of other genes. Default: None.
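The distance weighting used in activities mode can be sketched in Python as follows. The formula exp(-decay * distance_in_kb / 10) and the defaults (decay 0.75, maximum region 100 kb) come from the option descriptions above; the function names, the hard cut-off to zero beyond --maxRegion, and the plain weighted sum are assumptions made for illustration, not the tool's confirmed behavior.

```python
import math


def distance_weight(distance_bp, decay=0.75, max_region_kb=100):
    """Weight for a bin located `distance_bp` away from the TSS (or gene body)."""
    distance_kb = abs(distance_bp) / 1000
    if distance_kb > max_region_kb:
        # Assumption: bins beyond the maximum region contribute nothing.
        return 0.0
    return math.exp(-decay * distance_kb / 10)


def gene_activity(bin_counts, bin_distances_bp, decay=0.75, max_region_kb=100):
    """Weighted sum of bin counts for one gene in one cell (illustrative)."""
    return sum(
        count * distance_weight(dist, decay, max_region_kb)
        for count, dist in zip(bin_counts, bin_distances_bp)
    )


print(distance_weight(0))        # weight at the TSS
print(distance_weight(10_000))   # 10 kb away: exp(-0.75), about 0.472
print(distance_weight(200_000))  # beyond the 100 kb default window
print(gene_activity([2, 4], [0, 10_000]))
```

With the default decay, a bin 10 kb from the TSS keeps a little under half of its counts, and raising --decay shrinks that contribution further.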
Other options
- --verbose, -v
Set to see processing messages.
- -V, --version
Print the program version and exit.