FASTQ files containing the raw RNA-Seq reads from the sequencer are first checked for quality using FastQC.
Poor-quality reads are then trimmed using any of the following tools:
- Trimmomatic
- BBDuk
- cutadapt
The reads are then aligned to the reference genome (human, preclinical model, etc.) using an aligner.
Where needed, the aligned Sequence Alignment Map (SAM) files are sorted and converted to BAM using samtools.
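The sort-and-convert step can be sketched as a small command builder. This is a minimal sketch, not the pipeline's actual script: the function name `samtools_sort_command`, the `<stem>.sorted.bam` output naming, and the thread count are assumptions made for illustration. It only assembles the argument list for `samtools sort` and does not run anything.

```python
import shlex
from pathlib import Path


def samtools_sort_command(sam_path, threads=4):
    """Build the argv for sorting a SAM file and writing it as BAM.

    The output name (<stem>.sorted.bam) and the default thread count are
    illustrative assumptions, not settings taken from the pipeline.
    """
    sam = Path(sam_path)
    bam = sam.with_name(sam.stem + ".sorted.bam")
    return [
        "samtools", "sort",
        "-@", str(threads),  # additional worker threads
        "-O", "bam",         # write the sorted output as BAM
        "-o", str(bam),      # output file
        str(sam),            # input SAM file
    ]


if __name__ == "__main__":
    # Print the command for inspection. To execute it, pass the list to
    # subprocess.run(cmd, check=True) on a machine with samtools installed.
    print(shlex.join(samtools_sort_command("sample1.sam")))
```

Returning a list rather than a shell string keeps file names with spaces safe when the command is later handed to `subprocess.run`.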
Next, the quality of the Binary Alignment Map (BAM) files is checked using either of the following tools:
- Picard CollectRnaSeqMetrics
- RSeQC
For fusion gene detection, we use either TopHat-Fusion or STAR-Fusion. Splicing events are identified using SGSeq, a Bioconductor package for R.
Expression is quantified as transcripts per million (TPM) using RSEM.
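To make the TPM unit concrete, the following is a minimal sketch of the TPM definition only. It is not how RSEM works internally: RSEM estimates expected counts and effective lengths with a statistical model, whereas this sketch assumes per-transcript counts and lengths are already known. The function name `counts_to_tpm` and the example values are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-transcript read counts to transcripts per million (TPM).

    counts     -- read counts per transcript (assumed already estimated)
    lengths_bp -- transcript lengths in base pairs, same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")

    # Step 1: normalise each count by transcript length in kilobases.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]

    # Step 2: rescale so that the values sum to one million.
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]


if __name__ == "__main__":
    # Two hypothetical transcripts: the second has twice the reads but is
    # also twice as long, so both end up with the same TPM.
    print(counts_to_tpm([10, 20], [1000, 2000]))
```

Because the length normalisation happens before the per-million scaling, TPM values always sum to the same total within a sample, which makes proportions comparable across samples.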
For samples with high sequence duplication, we tag the duplicates using Picard MarkDuplicates and count the duplicated reads using featureCounts. In many cases of excessive duplication, we also remove the duplicated reads using Picard MarkDuplicates.
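The difference between tagging and removing duplicates comes down to one MarkDuplicates option. The sketch below assembles the command line for both modes. The function name `mark_duplicates_command`, the file names, and the `picard.jar` location are assumptions for illustration, and the `KEY=value` argument style shown here may differ from the style preferred by the Picard release in use.

```python
def mark_duplicates_command(bam_in, bam_out, metrics, remove=False,
                            picard_jar="picard.jar"):
    """Build the argv for Picard MarkDuplicates.

    remove=False -- duplicates are only flagged in the output BAM
    remove=True  -- duplicates are dropped from the output BAM
    picard_jar is a hypothetical path; adjust it to the local install.
    """
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={bam_in}",    # input BAM
        f"O={bam_out}",   # output BAM
        f"M={metrics}",   # duplication metrics report
        f"REMOVE_DUPLICATES={'true' if remove else 'false'}",
    ]


if __name__ == "__main__":
    # Tag only, then tag and remove, for a hypothetical sample.
    print(" ".join(mark_duplicates_command(
        "sample1.sorted.bam", "sample1.marked.bam", "sample1.dup_metrics.txt")))
    print(" ".join(mark_duplicates_command(
        "sample1.sorted.bam", "sample1.dedup.bam", "sample1.dup_metrics.txt",
        remove=True)))
```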
For raw unnormalized counts, we use either of the following tools:
- htseq-count (HTSeq)
- featureCounts
Differential gene expression analysis is performed using R packages.
The final results can be visualized as heatmaps, volcano plots, MA plots and PCA plots using standard R packages.
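As a rough illustration of what the MA and volcano plots display, the sketch below computes the coordinates of a single gene on each plot. The plots themselves are produced in R in this pipeline; Python is used here only to show the arithmetic. The function names, the pseudocount of 1, and the example values are assumptions.

```python
import math


def ma_point(treated_mean, control_mean, pseudocount=1.0):
    """Return (M, A) for one gene.

    M is the log2 fold change (treated vs control) and A is the mean log2
    expression. The pseudocount avoids log2(0) and is an assumption here.
    """
    log_t = math.log2(treated_mean + pseudocount)
    log_c = math.log2(control_mean + pseudocount)
    return log_t - log_c, 0.5 * (log_t + log_c)


def volcano_point(log2_fold_change, p_value):
    """Return (x, y) for one gene: x is the log2 fold change and
    y is -log10 of the p-value, which must be greater than zero."""
    if p_value <= 0:
        raise ValueError("p_value must be greater than zero")
    return log2_fold_change, -math.log10(p_value)


if __name__ == "__main__":
    # Hypothetical gene with mean normalised counts of 7 (treated) and 1 (control).
    m, a = ma_point(7, 1)
    print(m, a)
    print(volcano_point(m, 0.01))
```

On an MA plot, A goes on the x-axis and M on the y-axis; on a volcano plot, genes with large fold changes and small p-values appear in the upper left and upper right corners.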