Bioinformatics
Summary
Over the last few years, biological data generation has expanded rapidly, particularly in sequencing, with the emergence of Next Generation Sequencing (NGS) machines.
As a result, the cost of sequencing a human genome falls every year, and genomics is outpacing developments in computing as measured by Moore’s law, the notion that computers double in processing capability per dollar spent every 18-24 months.
However, the analysis of these large datasets is becoming more complicated. In our bioinformatics workflows, we use state-of-the-art technologies and up-to-date, trustworthy databases to interpret our genomics data.
Source: https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost
Our Services/Benefits
Major bioinformatics tasks in the field of genomics:
- Quality control of NGS sequencing data
- Reference mapping
- Variant identification
- Variant annotation
- Pathogenicity prediction of genetic variants
- De novo assembly
- RNA quantification
- Gene expression analysis
Sequencing
The Illumina next-generation sequencing (NGS) method is based on sequencing-by-synthesis with reversible dye terminators, which enable the identification of single bases as they are incorporated into DNA strands. Binary Base Call (BCL) files are the raw data files generated by Illumina sequencers.
FASTQ generation
The BCL files are converted to FASTQ, a text-based sequencing data file format that stores both raw sequence data and quality scores (output: FASTQ).
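To make the format concrete, here is a minimal Python sketch (an illustration, not part of our production pipeline) that reads four-line FASTQ records and decodes the quality characters into Phred scores. It assumes the common Phred+33 encoding and sequences that are not wrapped across lines; the names FastqRecord and parse_fastq and the example read are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str             # header line without the leading "@"
    sequence: str         # base calls
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record.

    Assumption: qualities are Phred+33 encoded (offset=33).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Made-up example read: six high-quality bases followed by one poor one.
example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
for record in parse_fastq(example):
    print(record.name, record.sequence, record.qualities)
    # prints: read1 GATTACA [40, 40, 40, 40, 40, 40, 2]
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a score of 40 means roughly one error in 10,000 calls.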
Adapter trimming, quality filtering
Sequences corresponding to the library adapters can be present in the FASTQ files and should be removed from the reads because they interfere with downstream analyses, such as alignment of reads to a reference. FastQC provides a simple way to run quality control checks on raw sequence data coming from high-throughput sequencing pipelines (output: trimmed FASTQ, FastQC report).
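The idea behind adapter trimming and quality filtering can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how dedicated trimming tools work internally: it only finds exact matches (no mismatches tolerated), the adapter string is an example value that depends on the library kit, and the thresholds are arbitrary defaults. The function names are hypothetical.

```python
from typing import List, Tuple


def trim_adapter(sequence: str, qualities: List[int], adapter: str,
                 min_overlap: int = 3) -> Tuple[str, List[int]]:
    """Cut a 3' adapter (exact match only) and everything after it.

    Handles a full adapter anywhere in the read, or an adapter prefix of
    at least `min_overlap` bases hanging off the end of the read.
    """
    cut = sequence.find(adapter)
    if cut == -1:
        # Look for the longest adapter prefix that the read ends with.
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                cut = len(sequence) - overlap
                break
    if cut == -1:
        return sequence, qualities  # no adapter found
    return sequence[:cut], qualities[:cut]


def passes_quality(qualities: List[int], min_mean: float = 20.0,
                   min_length: int = 1) -> bool:
    """Keep a read only if it is long enough and its mean Phred score is high enough."""
    if len(qualities) < min_length:
        return False
    return sum(qualities) / len(qualities) >= min_mean


# Example adapter value only; use the sequence documented for your library kit.
ADAPTER = "AGATCGGAAGAGC"
read = "ACGTACGTAC" + ADAPTER
trimmed_seq, trimmed_qual = trim_adapter(read, [30] * len(read), ADAPTER)
print(trimmed_seq, passes_quality(trimmed_qual))  # prints: ACGTACGTAC True
```

Real trimmers also allow mismatches and trim low-quality tails base by base, which is why a dedicated tool is preferable for production data.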
Reference mapping
The graph-based alignment method uses alt-aware mapping: population haplotypes are stitched into the reference with known alignments to establish alternate graph paths that reads can seed-map and align to. A BAM file is the compressed binary version of a SAM file and is used to represent aligned sequences (output: BAM).
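BAM itself is binary and is normally read through a dedicated library, but the SAM text records it encodes are easy to inspect. The Python sketch below splits one SAM alignment line into its 11 mandatory fields and decodes the bitwise FLAG field. The example read and the names SamAlignment, parse_sam_line and decode_flag are made up for illustration; optional tag fields are ignored.

```python
from typing import List, NamedTuple

# Bit meanings of the SAM FLAG field.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


class SamAlignment(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # Phred+33 quality string


def parse_sam_line(line: str) -> SamAlignment:
    """Parse the 11 mandatory columns of a SAM alignment line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    return SamAlignment(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


line = "read1\t99\tchr1\t100\t60\t7M\t=\t250\t157\tGATTACA\tIIIIII#"
alignment = parse_sam_line(line)
print(alignment.rname, alignment.pos, decode_flag(alignment.flag))
# prints: chr1 100 ['paired', 'proper_pair', 'mate_reverse', 'first_in_pair']
```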
Variant calling
The DRAGEN Variant Caller takes mapped and aligned DNA reads as input and calls SNPs and indels through a combination of column-wise detection and local de novo assembly of haplotypes. VCF is a text file format that contains information about variants found at specific positions in a reference genome (output: VCF).
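As a rough illustration of what a VCF record holds, the following Python sketch parses the eight fixed columns of one data line and labels each alternate allele as a SNP or an indel. It is a simplified reading of the format, not the variant caller's logic: sample columns are ignored, the classification is based on allele lengths only, and the example variant and all names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                 # 1-based position on the reference
    id: str
    ref: str                 # reference allele
    alts: List[str]          # alternate alleles
    qual: Optional[float]    # None when the column is "."
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a VCF data line (not a '#' header line)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value if value else True  # bare keys are flags
    return VcfRecord(
        chrom, int(pos), vid, ref, alt.split(","),
        None if qual == "." else float(qual), filt, info_dict,
    )


def classify(ref: str, alt: str) -> str:
    """Label a variant by allele lengths (a simplification)."""
    if alt.startswith("<") or alt in ("*", "."):
        return "other"  # symbolic, spanning-deletion, or missing allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "MNP"  # same-length, multi-base substitution


record = parse_vcf_line("chr1\t12345\trs123\tA\tG,AT\t50\tPASS\tDP=30;DB")
print([classify(record.ref, alt) for alt in record.alts])  # prints: ['SNP', 'indel']
```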
Variant annotation
Nirvana provides clinical-grade annotation of genomic variants and is developed under a rigorous testing process to ensure the accuracy of the results and to enable embedding in other software with regulatory needs. VarSome Premium is a CE IVD-certified and HIPAA-compliant platform that allows fast and accurate variant discovery, annotation, and interpretation of NGS data (output: final report).