Analysis of 10x genomics multiome data using sincei
Below, we demonstrate how sincei can be used to explore the scRNA-seq and scATAC-seq data produced by the 10x multiome protocol. The 10x multiome kit allows joint profiling of chromatin accessibility (ATAC-seq) and gene expression (RNA-seq) from the same single cells. Here, we will analyse these two data sets separately. We will use the dataset published with Persad et al. (2023), which profiles CD34+ cells from human bone marrow.
1. Download and process the dataset
The raw fastq files were downloaded from GEO and processed using the standard 10x genomics cellranger-arc workflow. Below is the structure of the output directory from the workflow:
<output_dir>/outs:
├── analysis
├── atac_cut_sites.bigwig
├── atac_fragments.tsv.gz
├── atac_fragments.tsv.gz.tbi
├── atac_peak_annotation.tsv
├── atac_peaks.bed
├── atac_possorted_bam.bam
├── atac_possorted_bam.bam.bai
├── cloupe.cloupe
├── filtered_feature_bc_matrix
├── filtered_feature_bc_matrix.h5
├── gex_molecule_info.h5
├── gex_possorted_bam.bam
├── gex_possorted_bam.bam.bai
├── per_barcode_metrics.csv
├── raw_feature_bc_matrix
├── raw_feature_bc_matrix.h5
├── summary.csv
└── web_summary.html
We will use gex_possorted_bam.bam for gene-expression data and atac_possorted_bam.bam for chromatin accessibility analysis using sincei. These files can also be produced as part of the cellranger count workflow for scRNA-seq or scATAC-seq data alone.
For convenience, we provide a subset of this data (chromosome 2 only), which can be downloaded and unpacked as follows:
mkdir 10x_multiome && wget -O 10x_multiome/10x_multiome_testdata.tar.gz https://figshare.com/ndownloader/files/41303289
tar -xvzf 10x_multiome/10x_multiome_testdata.tar.gz ## releases 7 files
(optional) pre-filtering of barcodes
Most of the cell barcodes from droplet-based protocols (like 10x genomics) correspond to droplets that do not contain cells, and therefore have very low counts. These barcodes must be filtered out at the beginning of the analysis. Although the cellranger pipeline already provides a list of filtered barcodes, sincei also allows you to extract per-barcode count distributions, indicating which barcodes should be removed. This can be done using the scFilterBarcodes tool.
barcodes=737K-arc-v1.txt # cellranger-arc barcodes in this case
mkdir -p sincei_output # make sure the output directory exists
for r in 1 2
do
    bamfile=cellranger_output_rep${r}/outs/atac_possorted_bam.bam
    scFilterBarcodes -p 20 -b ${bamfile} -w ${barcodes} \
        -o sincei_output/atac_barcodes_rep${r}.tsv \
        --minCount 100 --minMappingQuality 10 --cellTag CB \
        --rankPlot sincei_output/barcode_rankplot_rep${r}.png
done
The above example uses a whitelist of possible ATAC barcodes from the cellranger-arc workflow. Providing a whitelist is optional in general, but recommended for 10x genomics data.
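To illustrate what a whitelist does conceptually, the minimal sketch below keeps only those observed barcodes that also appear in a whitelist file. All file names and barcode sequences here are made up for illustration, and this is not a description of how sincei performs the check internally.

```shell
# Toy whitelist: one allowed barcode per line (hypothetical sequences).
printf 'AAACAGCCAAACAACA\nAAACAGCCAAACATAG\nAAACAGCCAAACCCTA\n' > toy_whitelist.txt

# Toy list of observed barcodes (hypothetical), one per line.
printf 'AAACAGCCAAACAACA\nTTTTTTTTTTTTTTTT\nAAACAGCCAAACCCTA\n' > toy_observed.txt

# First pass (NR==FNR) loads the whitelist into a lookup table;
# second pass prints only the observed barcodes found in that table.
awk 'NR==FNR { allowed[$1] = 1; next } ($1 in allowed)' \
    toy_whitelist.txt toy_observed.txt > toy_kept.txt

cat toy_kept.txt
```

In the scFilterBarcodes command above, the whitelist is passed with the -w option.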
The output file of scFilterBarcodes contains the filtered barcodes, i.e. those that have counts in at least --minCount (-mc) regions of the genome. Unlike other tools with similar options, sincei splits the genome into 100 kb bins and records whether or not a barcode has signal in each bin. This way, barcodes with high counts that are confined to a single genomic bin can also be filtered out. In most cases, the output is the same as with the usual approach of filtering by total counts. The --rankPlot (-rp) option produces the familiar knee plot of the barcode counts.
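The difference between filtering by total counts and filtering by the number of bins with signal can be seen in a small toy example. The sketch below is not sincei's implementation; it assumes a hypothetical three-column table (barcode, chromosome, position) with made-up values, and the threshold of 2 bins is chosen only for illustration.

```shell
# Toy alignments: barcode, chromosome, position (all values are made up).
printf '%s\t%s\t%s\n' \
    BC_A chr2 1500 \
    BC_A chr2 150000 \
    BC_A chr2 320000 \
    BC_B chr2 5000 \
    BC_B chr2 5100 \
    BC_B chr2 5200 \
    BC_B chr2 5300 > toy_reads.tsv

bin_size=100000   # 100 kb bins
min_bins=2        # hypothetical threshold for this toy example

# For every barcode, count the total reads and the number of distinct bins hit,
# then flag the barcode as "keep" if it has signal in at least min_bins bins.
awk -F'\t' -v bs="$bin_size" -v mb="$min_bins" '
{
    reads[$1]++
    key = $1 SUBSEP $2 SUBSEP int($3 / bs)
    if (!(key in seen)) { seen[key] = 1; bins[$1]++ }
}
END {
    for (bc in reads)
        print bc "\t" reads[bc] "\t" bins[bc] "\t" (bins[bc] >= mb ? "keep" : "drop")
}' toy_reads.tsv | sort > toy_bin_summary.tsv

# Columns: barcode, total reads, bins with signal, decision
cat toy_bin_summary.tsv
```

Here BC_B has more reads than BC_A, but all of them fall into a single bin, so a bin-based threshold drops it while a threshold on total counts alone would not.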
2. scATAC-seq analysis
Please follow this tutorial for further analysis of scATAC-seq samples from the above data.
3. scRNA-seq analysis
Please follow this tutorial (sincei_tutorial_10xATAC) for further analysis of scRNA-seq samples from the above data.
4. Joint analysis
Tutorial in preparation.