Data Analysis

What we offer

Fast data processing
All data processing steps run in a highly parallel computing environment that scales with the size of your project, keeping processing times short.

Comprehensive data analysis
We offer a broad range of data processing workflows. How much bioinformatics support you request is up to you. We deliver FASTQ, BAM or VCF files.

Advanced data interpretation
We can provide data annotation based on publicly available databases, or manually curated data analysis based on our carefully maintained in-house database.

Customizable data protection
We store the data for 3 months and can also work out an individual storage plan. We are aware of the importance and sensitivity of your sequencing data. We therefore protect it according to the highest data security standards.

Our process

Nowadays, generating data is the easy part; analyzing and interpreting the resulting data sets is the real challenge. We have adapted existing pipelines to optimally analyze the data output of different assays. We have access to carefully maintained in-house databases that, for example, allow for advanced variant interpretation in targeted sequencing.

Demultiplexing - FASTQ files

During NGS library preparation, unique barcode sequences are added to each sample, allowing multiple libraries to be pooled and sequenced together. After sequencing, these barcodes are used to unambiguously assign the sequenced reads to the individual samples (demultiplexing), automatically generating sample-specific FASTQ files.
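
The barcode assignment described above can be illustrated with a minimal Python sketch. It is not our production pipeline: the function names, the `max_mismatches` parameter and the "undetermined" bucket are assumptions chosen for illustration, and reads are simplified to (barcode, sequence) pairs.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=0):
    """Group reads by sample based on their barcode (illustrative only).

    reads:    iterable of (barcode, sequence) tuples
    barcodes: dict mapping barcode sequence -> sample name
    Reads whose barcode matches no sample, or more than one sample,
    within max_mismatches are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for index, seq in reads:
        hits = [
            sample
            for bc, sample in barcodes.items()
            if len(bc) == len(index) and hamming(bc, index) <= max_mismatches
        ]
        key = hits[0] if len(hits) == 1 else "undetermined"
        by_sample[key].append(seq)
    return dict(by_sample)
```

Each key of the returned dictionary corresponds to one sample-specific FASTQ file; allowing one mismatch (`max_mismatches=1`) rescues reads with a single sequencing error in the barcode, provided the barcodes differ from each other in enough positions.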

Alignment - BAM file

The FASTQ files are the input for the subsequent read alignment to the reference genome or transcriptome. The alignment process assigns each sequenced fragment to its matching region in the human genome or transcriptome based on its base sequence. The positions of the reads are stored in a Sequence Alignment/Map (SAM) file or its binary equivalent (BAM).
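
To show how a read position is stored, here is a small Python sketch that reads one record of the text-based SAM format. The column order and flag bits follow the SAM specification; the `Alignment` type and `parse_sam_line` function are hypothetical names used only for this example. BAM files are binary and need a dedicated library to read.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    read_name: str
    reference: str    # chromosome or transcript the read was aligned to
    position: int     # 1-based leftmost mapping position
    mapq: int         # mapping quality
    cigar: str        # match/insertion/deletion layout of the alignment
    is_mapped: bool
    is_reverse: bool  # True if the read aligned to the reverse strand


def parse_sam_line(line):
    """Parse the mandatory columns of a single SAM alignment record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record has at least 11 tab-separated columns")
    flag = int(fields[1])
    return Alignment(
        read_name=fields[0],
        reference=fields[2],
        position=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        is_mapped=not (flag & 0x4),     # bit 0x4 marks an unmapped read
        is_reverse=bool(flag & 0x10),   # bit 0x10 marks the reverse strand
    )
```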

Variant calling - VCF file
The alignment result is used to identify positions that deviate from the reference genome, producing a list of variant calls recorded in a variant call format (VCF) file. Single nucleotide variants (SNVs) as well as small insertions and deletions can be detected. For larger assays (WGS, WES), copy number variants (CNVs) and structural variants (SVs) can also be assessed.
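
As an illustration of what a VCF file contains, the following Python sketch reads VCF data lines and labels each alternate allele as an SNV, insertion or deletion by comparing the lengths of the REF and ALT columns. The function names and the catch-all "other" label are assumptions made for this example; the length-based rule is a simplification and is not how CNVs or SVs are called.

```python
def classify_allele(ref, alt):
    """Label one REF/ALT pair by comparing allele lengths (simplified)."""
    # Symbolic alleles (<DEL>), breakends and placeholders are not sequences.
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in ("*", "."):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # same length, longer than one base (multi-nucleotide)


def parse_vcf(lines):
    """Yield (chrom, pos, ref, alt, kind) for every ALT allele in the file."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multiple ALT alleles are comma-separated
            yield chrom, int(pos), ref, alt, classify_allele(ref, alt)
```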

Raw counts - txt file
For transcriptome data we provide either the raw gene counts based on the alignment results or the transcript counts based on pseudo-alignment algorithms.
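
The idea behind alignment-based raw gene counts can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the counting tool used in our pipelines: the function names, the coordinate tuples and the rule of skipping reads that overlap several genes are all assumptions, and strand and splicing are ignored.

```python
def count_reads_per_gene(alignments, genes):
    """Count aligned reads per gene (illustrative only).

    alignments: iterable of (reference, start, end), 1-based inclusive
    genes:      dict mapping gene name -> (reference, start, end)
    A read is counted only if it overlaps exactly one gene; reads that
    overlap no gene or several genes are skipped.
    """
    counts = {gene: 0 for gene in genes}
    for ref, start, end in alignments:
        hits = [
            gene
            for gene, (gref, gstart, gend) in genes.items()
            if gref == ref and start <= gend and end >= gstart
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


def format_counts(counts):
    """Render the counts as tab-separated text, one gene per line."""
    return "\n".join(f"{gene}\t{n}" for gene, n in sorted(counts.items()))
```

The output of `format_counts` mirrors the layout of a plain-text count table: one gene name and its raw read count per line.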

Fusion calling results
Three different fusion callers are used to identify potential fusion transcripts from transcriptome data. Identified fusion transcripts can be annotated using public databases to provide additional information about each transcript.

Interpretation - Report file

To facilitate variant interpretation, additional information about the detected variants can be provided. The MLL routinely documents its evaluation of discovered sequence variants, so the in-house database can be consulted alongside clinical databases to estimate clinical relevance.

Data Security

Raw sequencing data from the NovaSeq system is streamed directly into a private AWS instance in Frankfurt with restricted access. The data is completely anonymized, and no personal or clinical data is stored in the cloud. The data security measures comply with the highest standards of the EU General Data Protection Regulation (GDPR); this compliance has also been verified by external auditors. Raw sequencing data from the MiSeq systems is stored locally without external access.

Contact us