Glossary

Bisulfite sequencing is a biochemical method for determining DNA methylation in epigenetic studies. In epigenetics, DNA methylation is associated with the inactivation (silencing) of genes. In eukaryotes, methylation mainly affects cytosines at the C5 position in CpG dinucleotides. Bisulfite treatment converts unmethylated cytosine to uracil, while 5-methylcytosine does not react with bisulfite. In PCR and DNA sequencing, uracil is read like thymine, and adenine is incorporated opposite it in the complementary strand. Unmethylated sites therefore appear as C-to-T transitions (C to U after bisulfite treatment) in the sequencing data and can be identified this way.
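The read-out logic can be sketched in Python. This is a simplified illustration under stated assumptions, not a description of a production pipeline: it considers only the top strand, assumes a gap-free alignment between read and reference and complete bisulfite conversion, and the function names are hypothetical.

```python
def bisulfite_convert(reference: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment + PCR on the top strand.

    Unmethylated C -> U -> read as T; positions in `methylated` (5-methylcytosine) stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(reference)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """For every reference C, report True (methylated) if the aligned read still shows C,
    False (unmethylated) if it shows T; other bases are left uncalled."""
    calls: dict[int, bool] = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


reference = "ACGTCGCA"
read = bisulfite_convert(reference, methylated={1})  # only the C at index 1 is methylated
print(read)                               # ACGTTGTA
print(call_methylation(reference, read))  # {1: True, 4: False, 6: False}
```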

In some experimental setups, an entire flow cell may provide more capacity than needed. For standard runs, we therefore offer the option of combining your ready-to-load pools with other samples in one run. Announcing the indices you use early allows us to quickly schedule your samples into appropriate runs. We offer NovaSeq S1, S2 and S4 flow cells with either 2x100 or 2x150 cycles as combined runs (index reads of 8 or 10 bp).
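Whether pools can share a run depends largely on whether their indices can still be told apart. A minimal Python sketch of such a check, assuming single index sequences compared by Hamming distance; the threshold of 3 mismatches and the function names are illustrative assumptions, not an actual scheduling rule.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatching bases; zip() compares only up to the shorter index
    (e.g. an 8 bp index against the first 8 bp of a 10 bp index)."""
    return sum(x != y for x, y in zip(a, b))


def index_conflicts(pool: dict[str, str], min_distance: int = 3) -> list[tuple[str, str, int]]:
    """Return sample pairs whose indices differ at fewer than `min_distance` positions."""
    conflicts = []
    for (sample_a, index_a), (sample_b, index_b) in combinations(pool.items(), 2):
        distance = hamming(index_a, index_b)
        if distance < min_distance:
            conflicts.append((sample_a, sample_b, distance))
    return conflicts


pool = {"S1": "ACGTACGT", "S2": "ACGTACGA", "S3": "TTGGCCAA"}
print(index_conflicts(pool))  # [('S1', 'S2', 1)]
```

For dual-indexed libraries, the same check could be applied to the concatenated i7 and i5 sequences.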

Coverage in next-generation sequencing (NGS) describes the average number of reads that align to ("cover") known reference bases. Sequencing coverage often determines whether a variant can be called with a certain degree of confidence, so the required coverage depends on the respective application.
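The relationship between read count and mean coverage is often estimated with the Lander-Waterman formula C = N × L / G (N reads of length L over a genome of size G). A short Python sketch; the genome size of about 3.1 Gb is an approximation, and the formula assumes uniformly distributed, uniquely mapped, non-duplicate reads, so real runs need some headroom.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman mean coverage: C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Invert the formula: N = C * G / L, rounded up to whole reads."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_100_000_000  # approximate human genome size (assumption)

# 30x WGS with 150 bp reads: 620 million reads, i.e. 310 million read pairs for 2x150
n = reads_needed(30, 150, GENOME_SIZE)
print(n, n // 2)                           # 620000000 310000000
print(mean_coverage(n, 150, GENOME_SIZE))  # 30.0
```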

Coverage recommendations:

WGS
While a coverage (depth) of 30x is often sufficient in human genetics, the detection of somatic mutations, including those in small clones, is of great importance in tumor biology. Therefore, tumor material is usually sequenced with a coverage of 60-90x, while the normal controls are sequenced with 30x coverage.

WES
A coverage (depth) of >100x is generally targeted for WES, as the detection of somatic mutations, including those in small clones, is of great importance in tumor biology.

Targeted/Panel Seq
To detect even small tumor clones, panel sequencing aims for a high coverage (depth). The coverage usually exceeds 1,500x, with a minimum coverage of 400x required to achieve a detection limit of 3%.

RNA Seq
To achieve sufficient accuracy in transcriptome analysis, the target is 50 million reads (sequenced fragments) per sample for total RNA sequencing and 30-35 million reads for RNA-exome sequencing.
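Why a high minimum coverage matters for small clones can be illustrated with a simple binomial model: at depth n and variant allele frequency f, the number of variant-supporting reads follows Binomial(n, f). The Python sketch below assumes error-free reads and a hypothetical caller that needs at least 5 supporting reads; real variant callers also model sequencing errors and apply further filters.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf), i.e. the chance that
    enough variant reads are sampled for the (hypothetical) calling threshold."""
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


# A 3% clone: about 12 variant reads are expected at 400x, but only about 3 at 100x.
for depth in (100, 400, 1500):
    print(depth, round(detection_probability(depth, 0.03, min_alt_reads=5), 3))
```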

Description

This glossary will help you fully understand the content of our website, even if you are not a sequencing expert yourself.

Throughout this website, you will find underlined terms that you can click on to go directly to an in-depth explanation in this glossary.

During any NGS library preparation, unique barcode sequences (= indexes) are added to each sample, allowing multiple libraries to be pooled and sequenced together. After sequencing, this information is used to unequivocally assign the sequenced reads to the individual samples.

For instruments that use a patterned flow cell, such as the NovaSeq, the probability of reads being incorrectly assigned from the expected index to a different index (= index hopping) is slightly elevated. The use of unique dual indexes is therefore recommended so that hopped reads can be excluded from any downstream analysis. To eliminate the amplification bias introduced by PCR-based libraries, unique molecular identifiers (UMIs) can be added during library preparation; they identify the original DNA molecule so that PCR artifacts can be eliminated later on (= error correction). The use of UMIs is primarily recommended for the detection of variants with an allele frequency <1%.
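UMI-based error correction can be sketched as follows: reads that share an alignment start and a UMI are treated as copies of the same original molecule and collapsed into a per-base majority consensus, so an error present in only one PCR copy is outvoted. This is a simplified Python illustration assuming exact UMI matches and equal-length reads; the `Read` type and the function name are hypothetical, and production tools also tolerate sequencing errors within the UMI itself.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int     # alignment start of the read
    umi: str       # unique molecular identifier attached during library prep
    sequence: str


def collapse_umi_families(reads: list[Read]) -> dict[tuple[str, int, str], str]:
    """Group reads by (chrom, start, UMI) and return one consensus sequence per family."""
    families: dict[tuple[str, int, str], list[str]] = defaultdict(list)
    for read in reads:
        families[(read.chrom, read.start, read.umi)].append(read.sequence)

    consensus = {}
    for key, sequences in families.items():
        length = min(len(s) for s in sequences)
        # Majority vote per position; an error present in a single PCR copy is outvoted.
        consensus[key] = "".join(
            Counter(s[i] for s in sequences).most_common(1)[0][0] for i in range(length)
        )
    return consensus


reads = [
    Read("chr1", 100, "AACG", "ACGT"),
    Read("chr1", 100, "AACG", "ACGT"),
    Read("chr1", 100, "AACG", "ACTT"),  # PCR/sequencing error at position 2
    Read("chr1", 100, "TTGA", "ACGT"),  # different UMI -> different original molecule
]
print(collapse_umi_families(reads))
# {('chr1', 100, 'AACG'): 'ACGT', ('chr1', 100, 'TTGA'): 'ACGT'}
```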

In addition to DNA fragmentation, end repair and the ligation of adapters carrying unique indexes/barcodes (so that each read can be uniquely assigned to a patient after sequencing), library preparation for whole exome sequencing (WES) also involves the enrichment of the coding sequences. Using probes with a sequence complementary to the complete coding region (exome), these regions can be specifically selected (captured) and enriched. DNA can be fragmented either enzymatically or mechanically: the TruSeq Library Prep (Illumina) uses mechanical fragmentation, whereas Illumina DNA Prep uses enzymatic fragmentation. At MLLSEQ, library prep is fully automated using pipetting robots (Hamilton NGS Star), which ensures standardized and homogeneous library preparation.

Molecular genetics combines a variety of methods that use either genomic DNA or cellular RNA (reverse-transcribed into cDNA) as template. The spectrum of analyses ranges from sample preparation (isolation of white blood cells and extraction of nucleic acids) to PCR, quantitative PCR, digital PCR, next-generation sequencing (NGS), fragment length analysis, clonality detection and chimerism analysis. Each of these methods comprises a variety of specific assays. Panel diagnostics allow a broad portfolio of gene mutations to be examined in parallel in a single approach, whereas highly sensitive methods such as quantitative PCR achieve detection limits of 10⁻⁵.

Large-scale genomic sequencing assays usually include a "tumor-normal comparison". By sequencing the tumor and a normal control, e.g. peripheral blood, a person's genome can be compared between both materials, so that differences in the tumor can be identified and benign polymorphisms eliminated. In hematology, this poses a major challenge: in patients with hematological neoplasms, the frequently used peripheral blood already contains the "tumor", namely the leukemia cells, so it is not an option as an easily available normal control. For this reason, buccal swabs or fingernails are usually used in hematology. However, it can be challenging to isolate sufficient DNA for sequencing from these materials. Alternatively, sorted T cells of the patient might be an option. If no normal material is available, other solutions are needed. Hence, we use a "tumor-unmatched normal" workflow to eliminate artifacts and a proportion of the polymorphisms. This involves using sequences of healthy controls from other individuals.
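The two filtering strategies can be contrasted in a minimal Python sketch that treats variants as (chromosome, position, ref, alt) tuples. It is an illustration under assumptions, not the MLL workflow: real pipelines compare read-level evidence rather than finished call sets, and the `min_control_count` parameter and the toy coordinates are hypothetical.

```python
from collections import Counter
from typing import Optional

Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def somatic_candidates(
    tumor: set[Variant],
    matched_normal: Optional[set[Variant]] = None,
    unmatched_controls: Optional[list[set[Variant]]] = None,
    min_control_count: int = 1,
) -> set[Variant]:
    candidates = set(tumor)
    if matched_normal is not None:
        # Tumor-normal comparison: remove everything also found in the patient's own normal.
        candidates -= matched_normal
    if unmatched_controls:
        # Tumor-unmatched normal: remove variants seen in healthy controls from other
        # individuals (recurrent artifacts and common polymorphisms).
        counts = Counter(v for control in unmatched_controls for v in control)
        candidates = {v for v in candidates if counts[v] < min_control_count}
    return candidates


tumor = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 300, "G", "A")}
patient_normal = {("chr1", 100, "A", "G")}                         # private polymorphism
controls = [{("chr1", 200, "C", "T")}, {("chr1", 200, "C", "T")}]  # common polymorphism

print(sorted(somatic_candidates(tumor, matched_normal=patient_normal)))
print(sorted(somatic_candidates(tumor, unmatched_controls=controls)))
```

In the second call, the patient's private polymorphism at chr1:100 survives, which is why an unmatched normal can only remove a proportion of the polymorphisms.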

Human Myeloid Panel (MLL)
ASXL1, APC, ASXL2, ATM, ATRX, BCOR, BCORL1, BRAF, BRCC3, CALR, CBL, CDH23, CDKN2A, CEBPA, CREBBP, CSF3R, CSNK1A1, CTCF, CUX1, DDX41, DDX54, DHX29, DNMT3A, EP300, ETNK1, ETV6, EZH2, FANCL, FBXW7, FLT3, GATA1, GATA2, GNAS, GNB1, IDH1, IDH2, JAK2, KDM5A, KDM6A, KIT, KMT2D, KRAS, MPL, MYC, NF1, NOTCH1, NPM1, NRAS, PHF6, PIGA, PPM1D, PRPF8, PTPN11, RAD21, RB1, RUNX1, SETBP1, SF1, SF3A1, SF3B1, SH2B3, SMC1A, SMC3, SRSF2, STAG2, SUZ12, TET2, TP53, U2AF1, U2AF2, WT1, ZBTB7A, ZRSR2

Human Lymphoid Panel (MLL)
ARID1A, ATM, BCL2, BIRC3, BRAF, BTK, CARD11, CCND1, CD79A, CD79B, CHEK2, CREBBP, CXCR4, DDX3X, DIS3, DNMT3A, EP300, EZH2, FAM46C, FAS, FAT4, FBXW10, FBXW7, FOXO1, GPR98, ID3, IKBKB, IL2RG, JAK1, JAK3, KLF2, KLHL6, KMT2D, KRAS, LRP1B, MAP2K1, MAPK1, MEF2B, MYBBP1A, MYD88, NFKBIE, NOTCH1, NOTCH2, NRAS, PHF6, PLCG2, POT1, PTPRD, RPS15, RUNX1, SF3B1, STAT3, STAT5B, TBL1XR1, TCF3, TET2, TLR2, TNFAIP3, TNFRSF14, TP53, TRAF3, UBR5, WHSC1, XPO1, ZMYM3

Human Erythrocytosis Panel (MLL)
BHLHE41, BPGM, EGLN1, EGLN2, EGLN3, EPAS1, EPO, EPOR, GFI1B, HBA1, HBA2, HBB, HIF1A, HIF1AN, HIF3A, JAK2, KDM6A, OS9, SH2B3, VHL, ZNF197

Human Cardio Panel (MLL)
ABL1, ASXL1, ATRX, BCOR, BCORL1, BRAF, CALR, CBL, CBLB, CBLC, CDKN2A, CEBPA, CSF3R, CUX1, DNMT3A, ETV6, EZH2, FBXW7, FLT3, GATA1, GATA2, GNAS, GNB1, HRAS, IDH1, IDH2, IKZF1, JAK2, JAK3, KDM6A, KIT, KRAS, KMT2A, MPL, MYD88, NOTCH1, NPM1, NRAS, PDGFRA, PHF6, PPM1D, PTEN, PTPN11, RAD21, RUNX1, SETBP1, SF3B1, SMC1A, SMC3, SRSF2, STAG2, TET2, TP53, U2AF1, WT1, ZRSR2

Human Exome Panel
xGen Exome Research Panel v2 (IDT Integrated DNA Technologies)

Custom Panel
We also offer custom panels (xGen Lockdown Panels) in cooperation with IDT (Integrated DNA Technologies). Sequencing of tailored panels is offered from a minimum of 96 samples. Once the target regions (chromosomal coordinates) have been specified, a preparation time of 4-6 weeks is required (panel design, production and wet-lab validation at MLLSEQ).

The flexibility and sequencing experience we offer would not be possible without such strong partners at our side, which is why we would like to express our sincere gratitude for their support.

Illumina

IDT
Integrated DNA Technologies

AWS
Amazon Web Services

We use our partners' brand names for their products on our website and hope that they are represented correctly in all instances. If not, we apologize and refer you to the websites of the respective partners.

Human ID Panel (MLL)
The human ID panel contains 24 SNPs that enable the unique identification of individual samples. Every panel designed at the MLL automatically includes the human ID panel.
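Sample identity checks of this kind typically compare SNP genotypes between two samples. A minimal Python sketch, assuming genotypes are coded as alternate-allele counts (0, 1, 2); the SNP names and the 0.9 concordance threshold are hypothetical and not the MLL criteria.

```python
def genotype_concordance(a: dict[str, int], b: dict[str, int]) -> float:
    """Fraction of SNPs genotyped in both samples that carry the same genotype
    (coded as number of alternate alleles: 0, 1 or 2)."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no SNPs genotyped in both samples")
    return sum(a[snp] == b[snp] for snp in shared) / len(shared)


def same_individual(a: dict[str, int], b: dict[str, int], threshold: float = 0.9) -> bool:
    return genotype_concordance(a, b) >= threshold


sample_1 = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 1}
sample_2 = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 2}
print(genotype_concordance(sample_1, sample_2))  # 0.75
```

With only four SNPs, a single discordant genotype has a large effect on the score; a larger SNP set such as the 24 SNPs of the ID panel makes the comparison more robust.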

Copy number variation (CNV) Panel (IDT)
The CNV panel consists of 9115 individual probes spaced approximately every 0.34 Mb across the human genome.

Additional genes
If the available panels do not contain all desired genes, probes for individual genes can also be added to a panel in order to cover all desired regions. Once the additional target regions (chromosomal coordinates) have been specified, a preparation time of 4 weeks is required (probe design, production at IDT and wet-lab validation at MLLSEQ).

The FASTQ files are the input for the subsequent read alignment to the reference genome; in the case of WTS, the reads can also be mapped to their position on the reference transcriptome. The assembly of the human reference genome has evolved over time, and for backward compatibility we align against GRCh37/hg19. The alignment process assigns each sequenced DNA fragment to its matching region in the human genome based on its base sequence. The read positions are stored in a Sequence Alignment/Map (SAM) or Binary Alignment/Map (BAM) file. Read alignment is a complex and computationally very intensive part of the pre-processing workflow that can be significantly accelerated by parallelisation. Hence, like most of the pre-processing steps, alignment is performed in our private Amazon Web Services (AWS) instance in Frankfurt. DNA sequencing data (WGS, WES, gene panels) is aligned with the Isaac Aligner, and the STAR aligner is used for WTS data.
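To illustrate what a SAM record stores, the Python sketch below parses the first six mandatory fields of one tab-separated alignment line and derives the reference end coordinate from the CIGAR string (per the SAM specification, only the M, D, N, = and X operations consume reference bases). The example record is made up; for real BAM files, a library such as pysam or the samtools command-line tools would normally be used instead.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_sam_line(line: str) -> dict:
    """Parse the first six mandatory SAM columns of one alignment record."""
    qname, flag, rname, pos, mapq, cigar = line.rstrip("\n").split("\t")[:6]
    start = int(pos)  # 1-based leftmost mapping position
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return {
        "read": qname,
        "flag": int(flag),
        "chrom": rname,
        "start": start,
        # 1-based inclusive end; None for unmapped reads (CIGAR "*")
        "end": start + ref_len - 1 if ref_len else None,
        "mapq": int(mapq),
        "cigar": cigar,
    }


record = "read1\t0\tchr1\t100\t60\t5S90M2D45M\t*\t0\t0\t*\t*"
print(parse_sam_line(record))
```

Here the soft clip (5S) is ignored for the reference span, while the 2 bp deletion counts, giving an end position of 236.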

Sequencing data is often sensitive data that must be protected by the highest security standards. Raw sequencing data from the NovaSeq system is streamed directly into a private Amazon Web Services (AWS) instance in Frankfurt, to which only selected MLL employees have access. The data is completely anonymized with an arbitrary internal identifier, and no personal or clinical data is stored in the cloud. The data security measures comply with the highest standards of the EU General Data Protection Regulation (GDPR), as also verified by external auditors in their reports, including ISO 27001, ISO 27017 and ISO 27018. Furthermore, AWS has been awarded the C5 attestation of the German Federal Office for Information Security (BSI). Raw sequencing data from the MiSeq systems is stored locally without external access.

A key step of any NGS library preparation is the addition of unique barcode sequences per sample (= indexing), which allow multiple libraries to be pooled and sequenced together. After sequencing, the index information is used to unequivocally assign the sequenced fragments (= reads) to the individual patients, automatically generating patient-specific FASTQ files (bcl2fastq software). Converting the raw sequencing data of a multiplexed run into sample-specific FASTQ files is called ‘demultiplexing’. To account for the known phenomenon of index hopping (= incorrect assignment of reads from the expected index to a different index), unique dual index combinations are recommended so that hopped reads can be eliminated from downstream analysis.
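The principle of demultiplexing with unique dual indexes can be sketched in Python: a read is assigned to a sample only if both its i7 and its i5 index match that sample's pair, so a hopped combination (one sample's i7 with another sample's i5) ends up in an undetermined bin instead of a patient's FASTQ file. This is a simplification with a hypothetical sample sheet and function name; it requires exact index matches, whereas tools such as bcl2fastq can be configured to tolerate index mismatches.

```python
from collections import defaultdict

UNDETERMINED = "Undetermined"


def demultiplex(
    reads: list[tuple[str, str, str]],
    sample_sheet: dict[str, tuple[str, str]],
) -> dict[str, list[str]]:
    """Assign (read_id, i7, i5) tuples to samples by their unique (i7, i5) pair."""
    by_pair = {pair: sample for sample, pair in sample_sheet.items()}
    assigned: dict[str, list[str]] = defaultdict(list)
    for read_id, i7, i5 in reads:
        # Both indexes must match the same sample; any other combination is discarded.
        assigned[by_pair.get((i7, i5), UNDETERMINED)].append(read_id)
    return dict(assigned)


sample_sheet = {
    "patient_A": ("AAAACCCC", "GGGGTTTT"),
    "patient_B": ("CCCCAAAA", "TTTTGGGG"),
}
reads = [
    ("r1", "AAAACCCC", "GGGGTTTT"),  # patient_A
    ("r2", "CCCCAAAA", "TTTTGGGG"),  # patient_B
    ("r3", "AAAACCCC", "TTTTGGGG"),  # i7 of A with i5 of B: index hopping
]
print(demultiplex(reads, sample_sheet))
# {'patient_A': ['r1'], 'patient_B': ['r2'], 'Undetermined': ['r3']}
```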

The alignment result is used to identify deviating positions (=variants) from the reference genome, producing a list of variant calls detailed in a variant call format (VCF) file.
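A VCF data line holds one variant per row (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT and per-sample columns). The Python sketch below parses a made-up single-sample record and derives the variant allele frequency (VAF) from the allelic depths; it assumes a single ALT allele and a per-sample AD field, which not every variant caller writes.

```python
def parse_vcf_record(line: str) -> dict:
    """Parse one single-sample VCF data line and compute the VAF from AD (ref,alt depths)."""
    chrom, pos, _id, ref, alt, _qual, filt, _info, fmt, sample = line.rstrip("\n").split("\t")[:10]
    sample_fields = dict(zip(fmt.split(":"), sample.split(":")))
    ref_depth, alt_depth = (int(x) for x in sample_fields["AD"].split(",")[:2])
    total = ref_depth + alt_depth
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "filter": filt,
        "vaf": alt_depth / total if total else 0.0,
    }


record = "chr1\t100\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t0/1:88,12:100"
print(parse_vcf_record(record))
# {'chrom': 'chr1', 'pos': 100, 'ref': 'C', 'alt': 'T', 'filter': 'PASS', 'vaf': 0.12}
```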

SNV (Single Nucleotide Variant):
Single base substitutions as well as small insertions and deletions can be detected. For larger assays such as WGS or WES, it is necessary to rely on matched tumor-normal variant calling (Strelka2) to reduce false-positive variant calls and to reliably distinguish somatic variants from germline variants.

WGS with 100x coverage reaches a sensitivity of about 10-15% mutation load, and WES with 250x coverage a sensitivity of 10%. For gene panels, a tumor-only workflow (Pisces) is applied, but a specific follow-up screening of germline material might still be necessary to validate potential somatic variants. Gene panels are routinely sequenced with a target coverage of 1500x, allowing a sensitivity of >2% mutation load. Large deletions and medium-sized insertions, such as those found in CALR and FLT3, are called with Pindel.

CNV (Copy Number Variant) and SV (Structural Variant):
For WGS, both copy number variants and structural variants can be assessed. CNVs are called with GATK4 and SVs with Manta.
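The read-depth principle behind CNV detection can be illustrated with a simplified Python sketch: read counts per genomic bin are normalized by each sample's total depth, and the log2 ratio of tumor to normal indicates gains and losses. This is not GATK4's algorithm (its workflow adds steps such as denoising and segmentation); the pseudocount, the ±0.3 thresholds and the toy bin counts are assumptions for illustration.

```python
import math


def log2_copy_ratios(tumor_depth: list[int], normal_depth: list[int], pseudocount: float = 0.5) -> list[float]:
    """Per-bin log2(tumor/normal) after normalizing each sample by its total depth.
    The pseudocount (an assumption) avoids log(0) for empty bins."""
    t_total = sum(tumor_depth) + pseudocount * len(tumor_depth)
    n_total = sum(normal_depth) + pseudocount * len(normal_depth)
    return [
        math.log2(((t + pseudocount) / t_total) / ((n + pseudocount) / n_total))
        for t, n in zip(tumor_depth, normal_depth)
    ]


def classify(ratio: float, gain: float = 0.3, loss: float = -0.3) -> str:
    """Hypothetical thresholds. In a pure diploid tumor sample, a one-copy loss gives
    log2(1/2) = -1 and a one-copy gain log2(3/2) ≈ 0.58."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"


tumor = [100, 100, 150, 50]   # read counts in four genomic bins (toy data)
normal = [100, 100, 100, 100]
print([classify(r) for r in log2_copy_ratios(tumor, normal)])
# ['neutral', 'neutral', 'gain', 'loss']
```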

Fusion calling:
Fusion calling for RNA-seq data is performed with Arriba, Manta and STAR-Fusion. Additionally, the Isaac Variant Caller is used for SNV and small indel detection. Paired-end reads are required for fusion detection.
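Paired-end reads matter for fusion detection because a read pair whose two mates map to different genes is one line of evidence for a fusion transcript. A minimal Python sketch of that idea, assuming the mates have already been aligned and annotated with gene names; the support threshold is hypothetical, and dedicated callers such as Arriba and STAR-Fusion additionally use split (chimeric) reads and extensive artifact filters.

```python
from collections import Counter


def fusion_candidates(
    pair_genes: list[tuple[str, str]], min_support: int = 3
) -> dict[tuple[str, str], int]:
    """Count read pairs whose two mates map to different genes and keep gene pairs
    with at least `min_support` supporting pairs. Orientation is ignored here."""
    counts = Counter(
        tuple(sorted((gene1, gene2))) for gene1, gene2 in pair_genes if gene1 != gene2
    )
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Gene annotation of mate 1 and mate 2 for each read pair (toy data)
pairs = [("BCR", "ABL1")] * 4 + [("ABL1", "BCR")] + [("GAPDH", "GAPDH")] * 2 + [("KMT2A", "MLLT3")]
print(fusion_candidates(pairs))  # {('ABL1', 'BCR'): 5}
```

Sorting the gene names merges both mate orders into one key; a real caller would keep the 5'/3' orientation to report which gene contributes which end of the fusion.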

Differential gene expression:
edgeR is used to perform differential gene expression analysis against a reference. This approach requires control samples as the reference.
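As a rough illustration of what a differential expression comparison measures, the Python sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change against a control. It is deliberately simplified and is not what edgeR does internally: edgeR additionally applies TMM normalization and negative binomial models across replicates to assess significance. The gene names, counts and the `prior` value are assumptions.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each gene by the sample's total read count."""
    total = sum(counts.values())
    return {gene: n / total * 1e6 for gene, n in counts.items()}


def log2_fold_changes(case: dict[str, int], control: dict[str, int], prior: float = 1.0) -> dict[str, float]:
    """log2(case CPM / control CPM) per gene; `prior` (an assumption) avoids division by zero."""
    case_cpm, control_cpm = cpm(case), cpm(control)
    return {
        gene: math.log2((case_cpm[gene] + prior) / (control_cpm[gene] + prior))
        for gene in case.keys() & control.keys()
    }


case = {"GENE_A": 200, "GENE_B": 800}
control = {"GENE_A": 100, "GENE_B": 900}
lfc = log2_fold_changes(case, control)
print({gene: round(value, 2) for gene, value in sorted(lfc.items())})
# {'GENE_A': 1.0, 'GENE_B': -0.17}
```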

To facilitate the interpretation of identified variants, additional information about them can be provided. This includes the gene that overlaps with the variant, a precise characterization of the genomic region in which the variant was found (exon, intron, intron-exon transition), a translation of the variant into standardized nomenclature, an estimate of its possible functional effect (missense, synonymous, polymorphism, etc.) and, for example, its population frequency as reported by gnomAD.
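The region characterization step can be sketched in Python for a single transcript on the plus strand: given 1-based exon coordinates, a position is reported as exonic, as lying at an intron-exon transition, as intronic or as outside the transcript. The 2 bp window, the function name and the coordinates are illustrative assumptions; annotation engines such as Nirvana or VEP apply their own, more detailed consequence definitions.

```python
def classify_region(pos: int, exons: list[tuple[int, int]], splice_window: int = 2) -> str:
    """Classify a 1-based position relative to sorted (start, end) exons of one plus-strand transcript."""
    if not exons[0][0] <= pos <= exons[-1][1]:
        return "outside transcript"
    for number, (start, end) in enumerate(exons, start=1):
        if start <= pos <= end:
            return f"exon {number}"
    for start, end in exons:
        # Intronic bases directly flanking an exon (hypothetical window size).
        if start - splice_window <= pos < start or end < pos <= end + splice_window:
            return "intron-exon transition"
    return "intron"


exons = [(100, 200), (300, 400)]  # toy transcript with two exons
for pos in (150, 201, 250, 298, 350, 50):
    print(pos, classify_region(pos, exons))
```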

The MLL routinely documents the evaluation of discovered sequence variants; hence, in addition to clinical databases, the in-house database can be consulted to estimate the clinical relevance of a multitude of variants. VCF files can be annotated either automatically, based on public databases only, using the Nirvana Annotation Engine and the following sources: VEP, ClinVar, COSMIC, dbSNP, gnomAD, DGV; or manually, using the MLL routine diagnostics workflow with variant classification for a defined set of genes.

In addition to DNA fragmentation, end repair and the ligation of adapters carrying unique indexes/barcodes (so that each read can be uniquely assigned to a patient after sequencing), library preparation for targeted panel sequencing also involves the enrichment of the coding sequences of interest. Using probes with a sequence complementary to the region of interest (a panel of genes or the complete coding region, i.e. the exome), these regions can be specifically selected (captured) and enriched. DNA can be fragmented either enzymatically or mechanically: the TruSeq Library Prep (Illumina) uses mechanical fragmentation, whereas Illumina DNA Prep uses enzymatic fragmentation. At MLLSEQ, library prep is fully automated using pipetting robots (Hamilton NGS Star), which ensures standardized and homogeneous library preparation.

There are two fundamentally different approaches for library preparation for WGS: PCR-free and DNA amplification.

The PCR-free method requires a relatively large amount of input DNA (1 µg) but avoids PCR artifacts. Generally, sufficient DNA for a PCR-free library prep can be obtained from bone marrow and peripheral blood.

If the starting material is fixed tissue (formalin-fixed, paraffin-embedded; FFPE) or cell-free DNA from liquid biopsy samples, a pre-amplification method must be chosen to obtain sufficient material for sequencing. Library prep includes DNA fragmentation, end repair and the ligation of adapters carrying unique indexes/barcodes, so that each read can be uniquely assigned to a patient after sequencing. At MLLSEQ, library prep is fully automated using pipetting robots (Hamilton NGS Star), which ensures standardized and homogeneous library preparation.

As with the analysis of DNA (WGS, WES, Targeted Panel Seq), library preparation is conducted prior to the sequencing of the transcriptome. This process includes the fragmentation of the RNA, the removal of ribosomal RNA, the synthesis of cDNA from the RNA, the ligation of uniquely identifiable indexes/barcodes that make it possible to tell one sample apart from another, and a subsequent enrichment of the material via PCR.

At MLLSEQ, library prep is performed in a fully automated procedure by pipetting robots (Hamilton NGS Star). This ensures standardized and homogeneous library prep.