Special Report on Sequencing: Qualifying statements

Are expanding applications pushing next-gen sequencing beyond its limits?

By Randall C Willis

1977.

NASA’s new space shuttle makes its first test flight.

Fans line up for a little space-Western called Star Wars.

The King of Rock and Roll passes away in his home.

A future DDNews features editor starts high school.

And two revolutionary research papers are published, papers that in many ways will launch the field of genomics.

Four decades

Although nucleic acid sequencing was not invented in 1977, that year witnessed the publication of two papers that essentially made routine the science of DNA sequencing.

In February, Allan Maxam and William Gilbert described their use of chemical modification and cleavage to identify the nucleotide sequences of radiolabelled DNA fragments.

Ten months later, Frederick Sanger and colleagues described a completely different method whereby DNA polymerase incorporated radiolabelled nucleotides and chain-terminating dideoxy derivatives into a DNA complement of a template.

Widespread application of DNA sequencing led to speculation that researchers might one day sequence the entire human genome. But going from small DNA fragments of hundreds or thousands of base pairs to covering billions of base pairs demanded more of the technology, leading to automation by Applied Biosystems, genome-mapping efforts and dramatic throughput improvements over the next 25 years.

“In the 1990s, the idea of sequencing a human genome seemed daunting,” offered Eric Green, director of the U.S. National Human Genome Research Institute, and associates in a recent commentary on the future of DNA sequencing.

But, the authors suggested, researchers became voracious for genetic data: “Now, geneticists would like to have DNA sequences for everyone on Earth, and from every cell in every tissue at every developmental stage (including epigenetic modifications), in health and in disease.”

Although the authors acknowledged the dramatic evolution of sequencing technologies and platforms over the past decades, they saw technological achievement becoming less of a driver of innovation. Rather, they argued, as with smartphones, the Internet and digital photography, future evolution will be driven by efforts to expand the areas in which sequencing can be applied.

According to Laurence Ettwiller, head of bioinformatics and computational biology at New England Biolabs (NEB), this application diaspora is already well underway.

“I don’t think there is one community,” she says. “The sequencing community is actually sequencing communities.”

“We could do human genome SNP analysis. Another specific task would be cancer genome somatic variants; long-read sequences for other reasons,” she continues. “And all this is going to be more fragmented and more specialized, requiring specialized technologies as well as upstream library preparation.”

An example of this fragmentation into specialist groups is in nutrition research.

Steve Siembieda, vice president of commercialization at Advanced Analytical Technologies (AATI), recounts a talk given by Patrick Descombes, head of functional genomics at Nestlé Institute of Health Sciences (NIHS), at a Pacific Biosciences (PacBio) user group meeting in Korea.

“They’re investing in genomic sequencing because they want to be able to give you, as a consumer, the right food for your health,” he recalls. “Not someone else’s, but for you specifically.”

“If you know your genetic profile, they believe that someday they’ll be able to say that you need to eat yogurt and you need to eat carrots but no meat, which is different for me,” Siembieda continues. “And the only way to know that is to know your gene expression. That’s where I think that sequencing in the large scale is impactful.”

This nutritional genomics effort is exemplified in a recently published study by NIHS’s Armand Valsesia and colleagues, including Descombes, who performed transcriptome profiling in obese, non-diabetic subjects receiving low-calorie diets (LCD) to see if they could identify markers for weight loss and glycemic control.

“Building on our previous research and our in-house technological expertise in characterizing and quantifying the pool of relevant biological molecules, we studied the link between gene expression changes during LCD and how they relate to long-term clinical changes, with the aim of better understanding why individuals respond differently, and predict the success of dietary interventions more accurately,” Valsesia explained in a press release.

The researchers performed baseline RNA sequencing of adipose tissue biopsies from the subjects, whom they then placed on an eight-week LCD, followed by a six-month weight-maintenance diet. At both week eight and month six, the researchers repeated the RNA sequencing of adipose tissue biopsies, validating their findings with quantitative RT-PCR.

Of the 1,173 genes that were differentially expressed, 29 were significantly associated with changes in both body mass index and glycemic control. Perhaps unsurprisingly, most of those genes were associated with molecular pathways linked to lipid and glucose metabolism.

“Ours is the first transcriptome-wide study involving nearly 200 subjects, making it by far the largest ever carried out in this field. It shows which genes involved in lipid metabolism are altered as a result of dietary intervention, allowing us to predict their physiological outcomes with much greater accuracy and identify those genes whose effects can be specifically modulated by diet,” Valsesia commented. “This represents an additional step toward the development of new and adapted nutritional solutions to help non-responders improve their metabolic health.”

Liquid biopsy is also becoming more prevalent as clinicians look to circulating cells and free nucleic acids in various body fluids for signs of disease.

At the American Society of Clinical Oncology annual meeting in June, researchers from Memorial Sloan Kettering Cancer Center and GRAIL presented findings of their efforts to determine how well high-intensity next-generation sequencing (NGS) could identify cancer-related mutations from the minute amounts of tumor DNA circulating in blood plasma (ctDNA) of patients with advanced breast, non-small cell lung and prostate cancers.

The researchers isolated ctDNA and sequenced two million base pairs across the genome an average of 60,000 times, and compared those results to similar analyses of tumor tissue from the same patients.

For 89 percent of the 124 patients with both sources of sequence data, at least one mutation identified in tumor tissue was also detected in ctDNA, and for all mutations studied, 73 percent of mutations found in tissue were also detected in blood. But just as importantly, several mutations were exclusively identified in blood, mutations that would have been missed using tissue biopsy alone.
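To give a sense of the scale and of what concordance means here, the short Python sketch below multiplies out the quoted panel size and depth, then compares two small variant sets. The variant labels are hypothetical placeholders, not data from the study, and the function names are illustrative only.

```python
def total_bases_read(panel_bases: int, mean_depth: int) -> int:
    """Raw bases needed to cover a panel at a given average depth."""
    return panel_bases * mean_depth


def tissue_concordance(tissue_variants, plasma_variants):
    """Return (fraction of tissue variants also seen in plasma, plasma-only variants)."""
    tissue, plasma = set(tissue_variants), set(plasma_variants)
    if not tissue:
        raise ValueError("need at least one tissue variant")
    shared = tissue & plasma
    return len(shared) / len(tissue), plasma - tissue


# Figures quoted in the article: two million base pairs read 60,000 times on average.
print(total_bases_read(2_000_000, 60_000))  # 120000000000

# Hypothetical variant labels for one patient (not study data).
tissue = {"var_1", "var_2", "var_3", "var_4"}
plasma = {"var_2", "var_3", "var_4", "var_5"}
fraction, plasma_only = tissue_concordance(tissue, plasma)
print(fraction, plasma_only)
```

In this toy case three of four tissue variants are recovered from plasma, and one variant appears only in plasma, the kind of finding a tissue biopsy alone would miss.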

“These encouraging results showed our high-intensity sequencing approach is able to detect a broad range of tumor mutations in the bloodstream with high levels of concordance with all mutations detected in tumor tissue,” said Alex Aravanis, head of GRAIL research and development, in the announcement (see also the sidebar “Routine NGS diagnosis?” following this article).

“These important foundational data support the feasibility of our approach and will inform further development of blood tests to detect early cancer. We have now started evaluating our high-intensity sequencing approaches in people with and without cancer in our large-scale Circulating Cell-Free Genome Atlas (CCGA) study.”

Another area of increasing specialization is in the sequencing of microbial pathogens for the diagnosis of infectious disease and possible drug resistance (see the sidebar “Identifying infections” following this article), a problem highlighted recently in “The fungus within us” (a special report that appeared in the September 2017 issue of DDNews).

Getting it right

As NGS expands to more varied applications and starting materials, and as researchers and clinicians push sensitivity limits to, for example, identify signs of disease onset ever earlier, the validity of minute genetic changes will become increasingly important. Thus, any source of error within a given workflow needs to be identified and eliminated.

The challenge is critical, according to Christophe Roos, co-founder and chief scientific officer of NGS-specialist Euformatics.

“I think that the greatest challenge in genomics—in addition to the ever-present complexity of biological systems—is overcoming the lack of understanding of how confidence in the final results has to be established on the basis of correct procedures all the way from the biosample collection to the bioinformatics analysis of the sequencing data,” he offered in a 2016 interview.

Eva-Maria Surmann, product manager for Horizon Discovery, highlights the challenge.

“Errors can be introduced at any stage of the workflow; for example, during sample extraction, the sequencing reaction itself, up to data processing, analysis and interpretation,” she suggests.

“So, for example, three labs running the same NGS platform on the same samples but with a different bioinformatics workflow can provide really different results, contradictory results because of the complexity of the downstream analysis,” she continues. “There is variability, for sure, but I think most end users are not fully aware of the variability.”

Surmann suggests that the need to push sensitivity has really driven innovation and evolution of NGS workflows, as labs are challenged to get the same results from lower quantities of DNA.

“The emphasis has really become on sample processing to make sure that as much material—as much DNA—can be recovered from a patient sample,” she adds.

“Basically, we are pushing the system toward using sequencing for low-frequency mutations—somatic mutations that are very rare—and because of that, we have to now tackle issues that we did not really pay too much attention to before,” echoes NEB’s Ettwiller. According to her colleague Tom Evans, head of NEB’s DNA enzymes division, the already complicated task of capturing 100 percent of the DNA in a sample becomes mission-critical when dealing with low-input or single-cell analysis.

But even if you can capture all of the nucleic acid of interest, the quality of that starting material can be significantly impacted by the workflows in which it was isolated, as shown in a recent publication by Ettwiller and Evans.

Evans recounts that as the team was examining DNA preparation workflows, Ettwiller began to notice signature damage profiles for DNA prepared from formalin-fixed paraffin-embedded tissues.

“At one point, [co-author] Lixin Chen started preparing the DNA and giving it to Laurence without telling her how it was prepared or where it came from,” he recalls. “Laurence was so good at identifying those damages, she could tell us the workflow.”

It was at that point, he says, they realized the damage was a pretty big deal. But was it limited to NEB and their isolation methods?

“We went to The Cancer Genome Atlas [TCGA], which is a database containing cancer samples, where somatic mutations are extremely important to be able to detect,” Ettwiller continues the story. “We ran the same analysis and realized not only is it there, but in some cases, there is even more damage.”

Using something called the Global Imbalance Value (GIV) score as a metric of how badly damaged the DNA is—undamaged DNA has a GIV score of one, anything >1 is an indicator of damage severity—they discovered that 41 percent of the 1000 Genomes Project data set had GIV_{G_T} scores ≥1.5. Meanwhile, 73 percent of the TCGA sequencing runs had GIV_{G_T} scores >2. This suggested that the majority of G-to-T transversions in the data were erroneous and that damage was a pervasive problem.
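The step from a score above 2 to “the majority” of transversions being erroneous can be made explicit with a little arithmetic. The Python sketch below assumes, purely for illustration, that the score is a simple ratio of a damage-prone variant count to a baseline count that damage does not inflate; the article does not give the formula, and the function names are hypothetical.

```python
def giv_score(damage_prone_count: int, baseline_count: int) -> float:
    """Assumed form of the score: ratio of damage-prone to baseline variant counts."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return damage_prone_count / baseline_count


def excess_fraction(giv: float) -> float:
    """Share of damage-prone calls beyond what the baseline explains.

    Under the ratio assumption, a score of 1 means no excess, and a
    score of g > 1 means a fraction 1 - 1/g of the calls is excess.
    """
    if giv <= 1.0:
        return 0.0
    return 1.0 - 1.0 / giv


for score in (1.0, 1.5, 2.0, 3.0):
    print(score, round(excess_fraction(score), 3))
```

Under that assumption, a score of 1.5 implies about a third of the calls are excess, a score of 2 implies half, and anything above 2 implies more than half, which is consistent with the conclusion drawn for the TCGA runs.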

Ettwiller is quick to caution, however, that GIV score is a broad indicator of data quality.

“We can’t tell that this particular marker at this position is due to damage, while another one is not due to damage,” she explains. “We just know that certain samples have a certain level of damage that will affect variant calling.”

If you use enough samples and the same genes are always affected, she adds, the likelihood that the variant is due to damage goes down.

People have started to introduce GIV scores into their experimental design, Ettwiller continues, effectively as a quality control (QC) indicator of the workflow. Researchers can then determine how far they want to press with their experiment based on its GIV score, or if they want to shelve that particular sample as too damaged.

To ensure they can detect signals from samples that may contain only trace amounts of the DNA of interest, researchers typically apply whole-genome amplification (WGA). Unfortunately, such methods can introduce errors into the system that can lead to problems such as false negatives and positives.

This challenge was highlighted in a recent publication by NEB’s Jennifer Ong and Vladimir Potapov, who used PacBio’s single-molecule real-time (SMRT) sequencing to identify the types and rates of errors generated by various DNA polymerases during the polymerase chain reaction (PCR).

Not only did the researchers identify examples of nucleotide misincorporation, a signal of polymerase fidelity, but they also noted for some polymerases significant levels of template switching and PCR-mediated recombination. As well, for high-fidelity polymerases, it appeared that DNA damage during thermal cycling predominated over base substitution errors.

With PCR-related issues in mind, Raffaele Palmirotta and colleagues at University of Bari ‘Aldo Moro’ recently presented their efforts to identify mutations in single circulating tumor cells (CTCs) in the absence of WGA.

In a proof-of-principle study, the researchers spiked peripheral whole blood from healthy donors with melanoma cells. They then enriched for CTCs using a cell separator and isolated single and pooled cells for analysis by NGS, with or without PCR amplification.

The researchers found that elements of the WGA procedure precluded them from identifying a subset of variants determined by NGS of the original melanoma line, whereas they were able to identify all 10 variants in the WGA-untreated CTCs, whether as a single cell or in pools of two, four or eight cells.

“In our intent to determine an ideal minimal number of CTCs suitable to be analyzed without WGA, we could not identify a potential numerical threshold,” the authors noted. “The limiting factor for the appropriateness of the technique seems to be the quality of the DNA sample itself, rather than the number of initial DNA copies, since the number of cells did not influence the library construction and sequencing.”

The authors were quick to note that further studies will need to be performed to test for other technical and analytical parameters. Streamlining the process by removing potentially error-prone steps, however, could offer significant savings in routine use.

NEB’s Evans likewise points to challenges around the expanding application of single-molecule and long-read sequencing, where damage can be a problem on multiple levels.

“It is useless to be able to read a 100-kb sequence but the DNA is so damaged that you can only read 1,000 bases before the DNA falls off the sequencing apparatus,” he says.

Such efforts are of particular interest to AATI, which has specialized in platforms for the isolation and characterization of large nucleic acid fragments.

In the expectation that long-fragment sequencing was going to become more prevalent, the company developed its Fragment Analyzer to facilitate large fragment smear analysis, including quality control.

“We’re really the only company that can analyze large fragment smears effectively, and we did that for PacBio,” explains AATI’s Siembieda. “And then on the heels of that, we decided to build a brand-new instrument that uses pulse-field capillary electrophoresis for separating large DNAs.”

Although pulse-field electrophoresis technology has been around since the mid-80s, he says, it was typically performed in agarose gels. AATI’s FEMTO Pulse instead relies on capillary electrophoresis.

“The reason why that is a real benefit for the large-fragment people—and it is applicable to Illumina—is that the amount of material they need to run in agarose pulse-field gels is in the hundreds of nanograms, where we can take picogram quantities of DNA,” Siembieda explains.

This will be particularly important not just in large-fragment analysis, he continues, but also as the sequencing communities move to smaller input units such as single cells or liquid biopsies, where samples can be precious and sensitive to perturbation.

“And the second thing that we can do is we can reduce the time by 20-fold for that analysis,” he continues. “Typically, pulse-field agarose electrophoresis takes an overnight separation; about 16 hours, let’s say. Our separation can be done as quickly as about one hour.”

Growing needs

Savings in sample materials as well as analysis time are likely to become increasingly important as NGS increases throughput for population studies, such as the 1000 Genomes Project.

“There are many countries around the world that are sequencing hundreds of thousands of their own people,” Siembieda offers. “They’re going to sequence 150,000 people in the UK. I believe Ireland has a similar program. I know there’s an Asia sequencing project.”

“You have to start thinking about the diversity of the human population and how those individual groups are different,” he says. “Think about sequencing every baby that is born, just to have that baseline of what their somatic cell line should look like and what is the sequence.”

One such project is the Human Cell Atlas (HCA), an international consortium focused on the characterization of all cell types in the human body.

“Recent advances in single-cell technology have allowed us to look at cells with a clarity and depth of analysis that we have never been able to achieve before, making this ambitious project a reality within reach,” said Aviv Regev, HCA co-chair, in a recent press release.

“The Human Cell Atlas will impact almost every aspect of biology and medicine, ultimately leading to a richer understanding of life’s most fundamental units and principles,” added Sarah Teichmann, the other HCA co-chair. “The project has implications for a vast range of scientific applications and disease areas, and will benefit research and discovery around the globe.”

In October, 10x Genomics announced a partnership with HCA that will see consortium members receive discounted access to the NGS specialist’s Chromium RNA analysis platform.

“We are excited to participate in this important project, which will have an impact on our understanding of basic human biology and disease,” offered 10x Genomics CEO and co-founder Serge Saxonov, drawing parallels between HCA and the Human Genome Project. “Our innovative technology enables massively parallel scRNA-seq analysis of hundreds to millions of individual cells, which is a revolutionary change in how gene expression experiments can and should be performed.”

For its part, Horizon Discovery is hoping to redefine the way we look at assay verification and validation, as well as harmonize NGS workflows through application of its reference standards, including the recently released OncoSpan with 386 variants across 152 key cancer genes.

“I think reference standards are really critical, not only to set up the assay but also to test, validate, verify and also perform routine monitoring,” explains Horizon’s Surmann. “It is not only the sequencing platform itself, it is the whole process from sample extraction to sequencing and the bioinformatics analysis, as well, and the output.”

Critical to this purpose, she continues, is the fact that all of the company’s reference standards are cell-line derived rather than synthetic constructs. This means that each of the variants is in its natural genomic location, which helps ensure that the standards perform similarly to those same variants in patient samples.

SeraCare Life Sciences similarly touts the importance of reference standards that closely mimic real-world samples.

In July, the company launched its Seraseq Circulating Tumor DNA v2 reference materials, which it suggests perform like ctDNA-based liquid biopsies without the DNA damage associated with ultrasonicated cells.

“One of the biggest challenges in developing highly sensitive ctDNA assays is the lack of reference materials that perform like real-world samples and contain all of the relevant variants,” said Jason Myers, CEO at ArcherDX, in the announcement. “Seraseq ctDNA reference materials resemble native cfDNA, from pre-analytic assessment of DNA quality through sequencing. We expect these new references will greatly facilitate our assay development, and they also potentially will help our customers execute an effective QC strategy.”

According to Roos, most labs continue to focus their attention on genomic variant analysis as it ultimately relates to patients, with quality issues remaining a secondary consideration. But the QC learning curve seems to be flattening.

“Laboratories are now learning how instrumental standard quality metrics are in the validation of their workflows and the provision of regularly high-quality genetic interpretations, ensuring better diagnostics, hence patient safety,” he suggested in an interview.

This changing perspective couldn’t be coming at a more opportune time as prognosticators envision a future where sequencing ceases to be the exclusive domain of researchers and clinicians.

NEB’s Ettwiller, for one, imagines a day when sequencing will influence the everyday lives of the general population.

“Regular people will not know that they are sequencing but they want to know if the candy that fell on the ground is still edible,” she suggests. “They’d like to know if they can safely touch something that everyone touches.”

“These are silly things, but people are going to be asking all sorts of questions, and we’ll have to have the technology ready for that,” she continues. “This is really looking further in the future, but that’s where we are leading to: the normalization of sequencing for addressing all sorts of different questions.”

Not just for scientists or clinicians, Ettwiller vows, but for everyone.

Routine NGS diagnosis?

Although there are an increasing number of studies of liquid biopsy analysis using NGS, the translation of this effort from research samples to clinic and ultimately market remains complicated.

Earlier this year, Rick Klausner and colleagues at GRAIL published their thoughts on some of the challenges to making the application of NGS to circulating tumor DNA (ctDNA) routine for early cancer detection.

“The implementation of such a test would be technically challenging, since many genes would have to be simultaneously queried for alterations in order to cover enough of the known diversity in cancer genomes to see most tumors,” the authors wrote. “While next-generation DNA sequencing technology does enable high degrees of target multiplexing, the depth of sequencing would also have to be very high to sample enough ctDNA molecules to reliably measure them in a background of mostly non-tumor-derived cfDNA. We estimate that such a broad and deep sequencing approach could require orders of magnitude more sequence data than liquid biopsy assays currently use.”

They also highlighted that any clinical test would require not only sensitivity for early-stage disease, but also high specificity. Demonstrating this would mean not only analyzing large numbers of healthy subjects to ensure that identified variants are in fact cancer-defining, but also following these subjects longitudinally for later cancer diagnoses to help clinicians distinguish false from true positives.
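Why specificity matters so much in a screening population can be seen from the standard positive predictive value calculation. The Python sketch below uses illustrative numbers only: the sensitivity, specificity and prevalence values are assumptions chosen for the example, not figures reported by GRAIL.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive test result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Assumed values: 90 percent sensitivity, 1 percent of those screened have cancer.
for spec in (0.99, 0.999):
    ppv = positive_predictive_value(sensitivity=0.90, specificity=spec,
                                    prevalence=0.01)
    print(f"specificity {spec}: PPV is about {ppv:.2f}")
```

With these assumed inputs, a test that is 99 percent specific still yields more false than true positives, while a tenfold reduction in the false-positive rate lifts the predictive value to roughly 90 percent.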

And ultimately, researchers would need to undertake prospective clinical trials to demonstrate that such tests are clinically useful versus standard of care. This is where the scale of such an undertaking becomes ominous.

“Because the yearly incidence rate for cancer is low (1 percent to 2 percent in aggregate across tumor types), and because differences in cancer-specific mortality can take years to manifest, it is anticipated that such a study would require hundreds of thousands of participants to be appropriately powered,” the authors stress.
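A rough feel for the numbers comes from multiplying cohort size by incidence. The Python sketch below does only that; the cohort sizes are hypothetical, and a real power calculation would also account for follow-up time, mortality differences and dropout, none of which is modeled here.

```python
def expected_annual_cases(participants: int, yearly_incidence: float) -> float:
    """Expected number of new cancer diagnoses per year in a cohort."""
    if not 0.0 <= yearly_incidence <= 1.0:
        raise ValueError("yearly_incidence must be a fraction between 0 and 1")
    return participants * yearly_incidence


# Incidence range quoted by the authors; cohort sizes are illustrative.
for cohort in (10_000, 100_000, 500_000):
    low = expected_annual_cases(cohort, 0.01)
    high = expected_annual_cases(cohort, 0.02)
    print(f"{cohort} participants: roughly {low:.0f} to {high:.0f} new cases per year")
```

Those cases are then split across many tumor types and stages, which helps explain why the authors anticipate needing hundreds of thousands of participants.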

For its part, GRAIL has initiated a study called the Circulating Cell-Free Genome Atlas (CCGA) that will apply NGS to plasma samples from 10,000 or more subjects with newly diagnosed cancer or no cancer diagnosis, following these subjects for up to five years. The goal is to identify models based on cfDNA to accurately classify people with and without cancer.

The study is expected to conclude in 2022.

Identifying infections

A perfectly healthy 60-year-old man suddenly complains to his wife of lower back pain, chills and vomiting, which is worrisome enough. But when he becomes completely disoriented and his skin starts changing color, she rushes him to the hospital.

Blood tests, physical exams and his rapid deterioration suggest severe infection, but with what?

Advanced Analytical Technologies’ Steve Siembieda, vice president of commercialization, suggests that microbial identification and antibiotic resistance monitoring will be a growing area for next-generation sequencing (NGS) because of its speed relative to traditional microbiological analysis.

“Within laboratories, people are dying waiting to find out which antibiotic they should be given, and as you know, clinicians are and should be more cautious with the antibiotics to be given out,” he says. “The faster we can get data on all of the antibiotics to which an organism is resistant, the better the treatment can be.”

One company that is pushing in this direction is Karius, which recently presented its efforts to rapidly identify a variety of clinical pathogens using NGS technologies.

As described at ASM Microbe 2017 in June, the company and its collaborators conducted a proof-of-principle study where they extracted cfDNA from the plasma of patients with confirmed invasive fungal infections. Following NGS, the human cfDNA sequences were excluded and the remaining sequences were aligned against a pathogen-reference database of nearly 1,000 microorganisms.
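The filter-then-match logic described above can be sketched in a few lines of Python. This is a toy stand-in, not the company’s pipeline: exact k-mer matching replaces real read alignment, and the sequences, organism names and function names are all made up for illustration.

```python
def kmers(seq: str, k: int) -> set:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_reads(reads, host_ref: str, pathogen_refs: dict, k: int = 6) -> dict:
    """Drop reads sharing a k-mer with the host, then count pathogen matches."""
    host_kmers = kmers(host_ref, k)
    ref_kmers = {name: kmers(seq, k) for name, seq in pathogen_refs.items()}
    hits = {}
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers & host_kmers:
            continue  # treated as host-derived and excluded
        for name, ref in ref_kmers.items():
            if read_kmers & ref:  # a read may count toward more than one reference
                hits[name] = hits.get(name, 0) + 1
    return hits


# Made-up reference sequences (hypothetical, chosen to share no 6-mers).
HOST = "ACGTACGTTAGCCGATAGGCTTAACCGGTT"
PATHOGENS = {
    "organism_A": "ATATTAATTTAAATATTTTAAATTATAATA",
    "organism_B": "GCGGCCGCGGGCCCGCGCCGGGCGGCCCGC",
}
READS = [
    HOST[4:20],                     # host read, filtered out
    HOST[10:28],                    # host read, filtered out
    PATHOGENS["organism_A"][3:19],
    PATHOGENS["organism_A"][10:26],
    PATHOGENS["organism_B"][5:21],
    "CACACACACACACA",               # matches nothing, left unclassified
]
print(classify_reads(READS, HOST, PATHOGENS))
```

A production workflow would use a proper aligner and a curated reference database, plus thresholds to separate true infection from background or contaminant sequences.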

In two-thirds of the cases, NGS successfully identified the fungal organism diagnosed from biopsies, which included infections from Aspergillus, Rhizopus and others.

“Infectious diseases are a leading cause of death around the world, and our current methods of testing can only detect a narrow range of pathogens and may require invasive biopsies,” offered Peter Chin-Hong, infectious disease specialist at the University of California, San Francisco, and author of a related Karius study of infection in stem-cell transplant patients. “This ability to identify pathogens broadly and quickly, and monitor infection in high-risk patients, holds the potential to allow doctors to develop precise and effective treatment plans for patients.”

The aforementioned 60-year-old man was the subject of similar analysis by Karius’ Bryan Kraft and colleagues at Duke University Medical Center. As the hospital was uncertain as to what pathogen was causing the man’s sepsis, a plasma sample was sent to Karius for cfDNA extraction and NGS.

Within 24 hours, sequencing showed the man to be infected with Capnocytophaga canimorsus, an infection he picked up from his family dog. As well, sequence-based drug-resistance analysis offered insights on how the patient’s antibiotic treatment could be narrowed from broad-spectrum therapy to monotherapy.

Unfortunately, subsequent infections with Candida and Clostridium further complicated the patient’s condition and he passed away. Postmortem, 16S ribosomal RNA sequencing of a blood culture isolate confirmed the NGS results.

Despite the patient’s outcome, Kraft and colleagues saw hope in the case study for future development and validation of the NGS approach, suggesting it offered “distinct advantages over existing pathogen identification techniques.” From their perspective, these included:

It combines NGS, molecular biology techniques and informatics to filter human sequences and identify pathogen sequences directly from patient plasma;
It is unbiased and detects virtually any microorganism;
It is high throughput and returns results within a clinically actionable timeframe;
It is culture-independent and allows identification of fastidious organisms; and
It can potentially screen for known antibiotic resistance genes.

Facing the same challenges are Curetis and MGI, a subsidiary of BGI Group, which in September announced a collaboration to similarly develop NGS-based in-vitro diagnostic assays for microbial infections.

Under the terms of the agreement, MGI will leverage its expertise in hardware and chemistry integration to develop automated workflows and manufacture the NGS assays. Meanwhile, Curetis and its subsidiary Ares Genetics will provide their expertise in areas such as sample prep, screening panel design and assay design, leveraging their Genetic Antibiotic Resistance and Susceptibility (GEAR) database.

“NGS offers the unique possibility to dissect increasingly complex resistance patterns in microbial pathogens in a single test,” explained Saarland University’s Andreas Keller, who also helped develop GEAR. “However, this requires smart data interpretation and clinical decision support. BGI Group sequencing technology combined with the GEAR database allows the translation of NGS technology into meaningful diagnostic applications for complex microbial infections.”

With the constant and rapid evolution of microbial species and the expansion of strains showing multi-drug resistance, the need for rapid-response diagnostic assays such as the one described here will likely grow significantly in the near future (see also “The fungus within us” in the September 2017 issue of DDNews).