The schematics of life: Human interactome study raises red flags

BALTIMORE—Recently, in comparing protein-protein interactions in humans with those in other organisms, researchers at Johns Hopkins University (JHU), Germany's University of Würzburg, and the Institute of Bioinformatics (IOB) in Bangalore, India, raised significant questions about the assumptions scientists currently make in their efforts to develop drugs.

In a recent issue of Nature Genetics, Dr. Akhilesh Pandey, JHU professor and IOB co-founder, and colleagues studied interactomes from human, yeast, worm and fly datasets to look for overlaps and missing data as clues to metabolic pathways. In the process, they identified interactions among more than 1,000 genes involved in more than 3,000 diseases and determined that proteins involved in inherited diseases were often linked to proteins associated with similar diseases, signaling common metabolic pathways.

The researchers were surprised, however, at the relative lack of overlap among the four species, which scientists have relied upon in the belief that disease or drug impacts in mice or yeast will have parallels in humans. Pandey attributes this to inefficiencies in the most popular interaction analysis methods, including yeast two-hybrid screens, and sees it as a sign that these databases are far from saturated. Dr. Yuri Nikolsky, CEO of GeneGo, a company that offers data resources similar to Pandey's Human Protein Reference Database (HPRD), agrees with the concern.

"At the last Keystone meeting, I heard that nine out of ten yeast two-hybrid interactions between human proteins are probably wrong," he says. "This is why manual curation from small-scale experiments with direction and mechanism is so important. When the critical mass of such high-confidence 'benchmark' interactions is assembled, the high-confidence networks can be used as a backbone for mapping molecular data such as microarrays expression and high-throughput interactions data."

The limited data may also bear on another possible misconception: many researchers believe that the most important proteins are those involved in the largest number of interactions. Pandey and his colleagues, however, determined that eliminating any one of several proteins with very few interaction partners had severely detrimental effects on health, suggesting that many companies have probably passed on valuable drug targets or focused on the wrong ones.

The current study comes at a time when an increasing number of companies are integrating interactome and genome databases for drug discovery. In the past two years alone, there has been a dramatic increase in the number of licensing agreements between drug companies and firms like GeneGo.

Pandey raises a caution flag, however, about the indiscriminate use of interaction databases because not all data is created equal. If the chosen repository is composed of validated data, he says, the system is invaluable to research. But this is not always the case. "I see claims of databases that contain millions of interactions," he says. "That is ridiculous, and companies are not just wasting their money on the database but on their discovery and validation as they are working on incorrect targets.

"There are simply not millions of interactions reported [in the literature], and what is being done by fly-by-night operators is collection of co-occurring words in abstracts of published articles using automated computer methods as proof for interactions," he adds. "This is simply not true."

According to Nikolsky, many interaction databases feature low-quality, high-throughput interaction data derived from automated text-mining algorithms. "Overall, using text mining algorithms brings more troubles than advantages, and manual cleaning of such 'automatically generated' data is very time consuming," he says. "At GeneGo, we do not use any automated text mining tools for annotations."

Pandey also chastises the scientific community for its lax attitude regarding these repositories. "An average biologist is not prepared to share the responsibility of lending their expertise to correct errors or to make the entry corresponding to their area of expertise better," he says. "So, we like databases, depend on them, but are not willing to contribute."

Whether users will ever decide to contribute to such a centralized effort remains to be seen.