Metagenomic data analysis
Metagenomic studies have a wide range of applications, from environmental studies to human health. In 2004, two research groups sequenced the bacteria present in two different environments. The first study examined the bacterial biofilms that cause acid mine drainage. The scientists from the Department of Energy Joint Genome Institute were able to assemble entire genomes from among the bacteria present in this extreme environment. The second study explored the bacterial diversity found in the Sargasso Sea, an area of the Atlantic Ocean widely believed to be sparsely populated due to a lack of nutrients. The scientists from the Venter Institute and The Institute for Genomic Research not only identified a wide range of bacteria in this environment, but also found a large number of novel genes, almost doubling the number of genes present in public databases. Such environmental studies are important not only for cleanup efforts but also for the discovery of novel compounds that can be used in medicine or industry.
A variety of projects are under way to characterize the populations of bacteria present in the human body. These bacteria are an integral part of our lives, assisting our digestion, providing us with necessary vitamins, and protecting us from harmful bacteria. Changes in the delicate balance of the bacterial environments within our bodies can lead to a variety of diseases. Crohn's disease has been shown to correlate with changes in the population of intestinal bacteria, while dental health is directly related to the types of bacteria present in our mouths (cavities are caused by harmful bacteria whose growth is encouraged by the consumption of sugar). It is therefore very important to understand the complex interactions between these bacteria, as well as the interactions between bacterial populations and our bodies.
Metagenomic studies present a wide range of scientific challenges. The assembly programs available today are not well suited to assembling environmental data. New algorithms have to be developed that take into account the specific characteristics of metagenomic data: multiple genomes with varying levels of coverage and varying degrees of relatedness. The methods previously used to analyze bacterial populations also have little application to metagenomics. In the past, scientists concentrated their attention on a specific feature of all bacterial genomes, the ribosomal RNA operon, whose genes can be used as a "bar code" to identify specific organisms within an environment. Metagenomic data, however, contain information about many genes within bacteria, requiring us to develop new methods for recognizing organisms.
Our group is exploring computational solutions to various problems in metagenomics, such as:
- Efficient clustering of marker gene survey data (e.g., 16S)
- Statistical approaches for comparing metagenomic samples
- Metagenomic assembly algorithms able to handle and detect genomic variation
- Dynamic predictive models of microbial communities
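To make the first item above concrete, here is a minimal sketch of greedy, threshold-based clustering of marker gene sequences. It is an illustration only, not the group's method: the function names `identity` and `greedy_cluster` are hypothetical, the sequences are assumed to be pre-aligned to equal length, and the default 97% identity cutoff is an assumed convention chosen for the example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        # Assumption: inputs are non-empty and already aligned to one length.
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Put each sequence in the first cluster whose representative it matches.

    The first sequence placed in a cluster acts as its representative.
    """
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No existing representative was similar enough: start a new cluster.
            clusters.append([seq])
    return clusters


# Toy reads (made up for illustration), clustered at a 90% identity cutoff.
reads = [
    "ACGTACGTAC",
    "ACGTACGTAT",  # differs from the first read at one position
    "TTTTGGGGCC",
    "TTTTGGGGCA",  # differs from the third read at one position
]
clusters = greedy_cluster(reads, threshold=0.9)
print(len(clusters))
```

Because every sequence is compared against every existing representative, this naive approach scales poorly with the number of clusters, which is one reason efficient clustering is an open problem for large surveys.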
Students and Postdoctoral researchers:
This is an NSF project.