Next-gen data dynamic duo
BOSTON—The Bio-IT World conference served as the setting April 9 for Danish bioinformatics software developer CLC bio and U.S.-based computing giant IBM to announce a combined turnkey next-generation sequencing data analytics solution.
The Boston venue was appropriate, not simply because CLC bio's Americas headquarters is based in neighboring Cambridge, Mass., but because, in a sense, the Bio-IT World conference is why the two companies got together several years ago.
"IBM had not participated in Bio-IT World for a number of years. During our first year of return to Bio-IT, three years ago, we surveyed the booths in an effort to learn more about customers' interests," recalls Janis Landry-Lane, director of World-wide Technical Computing at IBM. "We followed the crowds and came upon the CLC bio booth. We were intrigued by their demonstrations and learned more from discussions with them and their website. We decided that CLC bio was a company that we really needed to be engaged with. We were starting fresh at Bio-IT World, we wanted to do something interesting and valuable, and so we started our partnership discussions."
"IBM had identified that life sciences was an area they wanted to invest in, and a lot of big data was already coming out of next-gen sequencing technologies then," adds Lasse Görlitz, vice president of communications for CLC bio. "Back then, being at Bio-IT World was more an exploratory thing for them, and one of the things they did other than exhibiting themselves and having discussions with visitors was to go around and visit the other exhibitors. They were intrigued by the number of people we were attracting to our booth, and wanted to find out why."
At Bio-IT World 2012, the two companies presented the optimized performance of CLC Assembly Cell for genomics sequence analysis, which leveraged IBM's high-performance file system and cluster, and they were able to do reference-based mapping at 37x coverage in 13.5 minutes.
"As we worked together, the synergy was evident; CLC bio has some of the best-of-breed genomics software, IBM has deep systems optimization skills. With our combined efforts, we produced remarkable results," Landry-Lane says. "In 2013, our joint effort led to a turnkey solution for bioinformatics. We built the CLC Genomics Sequencing Analytics Solution with our optimized IBM hardware sized appropriately for small, medium and large workloads and delivered to customers with CLC bio's latest version of Genomics Workbench and Server."
The combined platform announced this year at Bio-IT World 2013 is a scalable end-to-end solution that integrates a computing cluster built on advanced IBM hardware, CLC Genomics Server software for large-scale genomics sequencing data analysis and CLC Genomics Workbench client software for analyzing, comparing and visualizing high-throughput sequencing data, the two companies noted in the news release about the collaborative effort.
"One of the really nice things about this collaboration is that they 've had many years of experience with elaborate IT setups at big and complex institutions," Görlitz says of IBM. "Meanwhile, we're really good at making bioinformatics software, but we don't have that big data experience. Both partners bring something to the table that the other doesn't have and that the customers want."
Market forces play a huge role in driving the need for building a solution like the combined IBM-CLC bio offering, Landry-Lane explains. First of all, many new institutions are now engaged in next-generation sequencing because of its promise to deliver better healthcare. The costs of sequencing have come down dramatically and the time to sequence has been reduced to a day or less, she notes, and the flood of data generated by the sequencer must be processed in a timely fashion so that the maximum utilization of sequencers can be realized. This puts a demand on the IT solution to support this environment.
A second strain on the system is storing all of the data generated and analyzed, she adds.
"Scientists want to keep files for future reference, and these are very costly to keep online," Landy-Lane notes. "With our integration of tape storage into the file system and the use of our information lifecycle management that is policy-driven with hierarchical storage and data access, we can seamlessly tier stored data on both disk and tape, and researchers can store and retrieve files from the system regardless of the storage medium."
But the companies' work together, past and present, is more than a display of technical chops; it is now a very important business arrangement, Landry-Lane says.
"Aside from the technical aspect, the legal agreement between our organizations was very important," she explains. "We have formalized world-wide agreements regarding joint marketing and initiatives. The CLC bio teams have been great collaborators. They are responsive and enthusiastic about what we're doing. It is all about synergy; there is no overlap or redundancy in our work together. They've been a wonderful independent software vendor to work with—we couldn't do this alone. None of this would have happened without their technology to drive this."
"By combining our world-leading bioinformatics software with IBM's excellent hardware and many years of expertise in setting up and supporting elaborate IT systems, we're delivering a powerful turnkey analysis platform, which will enable institutions and scientists to handle the demands of high-throughput sequencing data analysis," said Mikael Flensborg, director of global partner relations at CLC bio, in an official statement.
According to the two companies, the cluster compute nodes are IBM System x 3550 M4 rack servers powered by Intel Xeon E5-2650 processors. The nodes are connected to an IBM Storwize V7000 Unified network-attached storage system, which consolidates block and file workloads. Storwize V7000 Unified systems support file data storage using the IBM General Parallel File System (GPFS). With GPFS, CLC bio software is leveraging a shared-disk file management solution designed to provide fast, reliable access to next-gen sequencing data for optimizing performance. The turnkey analysis platform comes in three different configurations, ranging from 48 CPU cores and 192 GB of memory to 192 CPU cores and 768 GB of memory, depending on the analysis requirements of the individual customer.