‘Parallelized’ next-gen sequencing database
SANTA FE, N.M.—The National Center for Genome Resources (NCGR) has chosen Kognitio's WX2 database to handle the analytical processing of multiple terabytes of research data generated in the center's studies of human, animal and plant diseases.
Kognitio is a technology company that provides solutions to business problems requiring the acquisition, rationalization and analysis of large or complex data sets. The company claims that WX2 is the fastest and most scalable analytical database on the market. Kognitio is headquartered in the U.K. and has U.S. offices in Chicago and New York.
The very concept of next-generation sequencing owes its practical development to "parallelized" processing, says Dr. Ernest Retzel, program leader at NCGR. Until about two years ago, there had been 30 years or so of slow, steady growth in output. Then came the "game changer," as Retzel describes it. Data processing speed became "mind boggling" and cost almost trivial. "In two weeks," he notes, "six instruments could process as much data as was generated in mapping the human genome."
Kognitio's CEO for North American operations, John K. Thompson, describes WX2 as a "proprietary, massively parallel database" that lets organizations take many inexpensive computers and link them into a single network that is HIPAA compliant and meets all other regulatory requirements. WX2 is a software-only solution that currently runs on six machines at NCGR and is scheduled to expand to eight in the near future.
"We realized that we needed a new database solution to increase the flexibility and the performance for our future plans," Retzel comments.
"After looking at various options, we were impressed by Kognitio's database technology, their understanding of the demands of organizations in the genomic space as well as their willingness to make our installation and deployment a success. We now have a solid partnership with Kognitio, and WX2 fits nicely into NCGR's growth plans that involve multiple new projects that will call for data volumes measured in tens to hundreds of terabytes."
In the past, data loads and indexing were the bottlenecks, Retzel says. With the WX2, loading is just a matter of hours and no indexing is required, he adds. Kognitio claims that queries performed against the WX2 database return in one-eighth of the time required by other platforms.
"This capability can let us look across the landscape of genetic diseases," Retzel says. "Sickle cell anemia is a one-nucleotide, one-gene disease, but most others are much more complex. Schizophrenia and mesothelioma are examples where many genes are involved. We would like to look at all of them for subsets of genes that might be involved. On the treatment side, we know that a specific drug may work on only 15 percent of lung cancer patients, so we would like to fit treatment to the appropriate population based on the set of markers found in a genome scan."
Another aspect of interest is gene discovery.
"We've found in case after case," he says, "that there are genes that have been undetected because the sampling wasn't deep enough. We're now looking at regions of genomes that have been poked at by post-docs for tens of years and should have been well characterized." The unspoken "but aren't" is clearly implied.
Offering another "for instance," Retzel describes a case in which a minuscule piece of rare tissue yielded 10,000 cells and more than a thousand new genes in only three days of sequencing and four to five days of analysis, with "not many false leads at all."
In another study, NCGR found that a respiratory pathogen common in animals had infected 26 tissues in the body, in both males and females. Deep sequencing was performed on lymph nodes, testes, muscles and other tissues, and the infectious agent was found in all of them, with a different response to the infection in each tissue type. Different genes were turned on at different levels depending on the tissue. Some genes, Retzel notes, had never been seen turned on before.
"We no longer need cell cultures," he says, and adds that somewhere between 90 percent and 99.9 percent of all organisms are not culturable. Now it is possible to take a sample from the gut, seawater, or a mountainside and sequence all the metagenomic material found there. Coming up? Third-generation sequencing may be "for real," Retzel says, in as little as a year.