March 20, 2013
What Does The Exponential Growth Of Genomics Data Actually Mean?
DNA sequence data is growing exponentially. For most of the genes that we identify, we have no idea of their biological functions, says Miami computational biologist Iddo Friedberg. Only by a group effort can the field move forward and learn to harness the deluge of genomic data, turning it into useful information.
According to Iddo Friedberg, a computational biologist at Miami University, “for most of the genes that we identify, we have no idea of their biological functions. They are like words in a foreign language, waiting to be deciphered.”
Friedberg works in the new bioinformatics research field of computational prediction of protein and gene function. He and colleagues Predrag Radivojac, associate professor of computer science and informatics, Indiana University, Bloomington, and Sean Mooney, associate professor, Buck Institute for Research on Aging, are the leaders of CAFA (Critical Assessment of Function Annotation).
CAFA is a new community-wide experiment to assess the performance of the multitude of methodologies developed by research groups worldwide to “help channel the flood of data from genome research to deduce the function of proteins,” Friedberg explained.
Thirty research groups participated in the first CAFA, presenting a total of 54 methods. The results were published this year in a Nature Methods article co-authored by all the participating groups, with Friedberg and Radivojac as lead authors.
Concurrently, 15 articles edited by Friedberg and Radivojac were published in BMC Bioinformatics. These articles are companions to the Nature Methods article and describe the top-ranking methods in depth.
The accurate annotation of protein function is key to understanding life at the molecular level and has great biochemical and pharmaceutical implications, explain the study authors. Because of its inherent difficulty and expense, however, experimental characterization of function cannot scale up to accommodate the vast amount of sequence data already available.
Friedberg and Radivojac explain:
The computational annotation of protein function has therefore emerged as a problem at the forefront of computational and molecular biology.
Recently, the availability of genomic-level sequence information for thousands of species, coupled with massive high-throughput experimental data, has created new opportunities as well as challenges for function prediction.
Many methodologies have been developed by research groups worldwide, many based on comparing unsolved sequences with databases of proteins whose functions are known. Other methods mine the scientific literature associated with some of these proteins, and still others combine sophisticated machine-learning algorithms with an understanding of biological processes to decipher what these proteins do, said Friedberg.
“Indeed, we may have already identified a protein that is an ideal drug target for cancer, but it is lost in the myriad of data labeled as ‘function unknown.’”
“Everyone recognized that this is an important endeavor, and that only by a group effort can we move the field forward and learn to harness the deluge of genomic data, turning it into useful information.”
“For the first time we have broad insight into what works, where improvement is needed, and how we should move the field forward. We will continue running CAFA in the future, as we are confident it will only help generate better methods to understand the information locked in our genomes, and those of other organisms,” states Friedberg.
SOURCE Miami University
By 33rd Square
Tags: bioinformatics, biotech, CAFA, computational biology, DNA sequencing, genetics, genomics, Iddo Friedberg, medicine, personalized medicine, Predrag Radivojac