Metagenomic data analysis

Metagenomics is a new field of research in which scientists analyze the genomes of organisms recovered directly from the environment. Most naturally occurring bacteria cannot be cultured and therefore cannot be analyzed by traditional means. Metagenomic studies give us a way to analyze these previously unknown organisms. At the same time, we can examine the diversity of organisms present in a specific environment and analyze the complex interactions among its members. While most metagenomic studies to date have concentrated on bacterial populations, viral and fungal populations are also of significant scientific interest.

Metagenomic studies have a wide range of applications, from environmental studies to human health. In 2004 two research groups sequenced the bacteria present in two different environments. The first study examined the bacterial biofilms that cause acid mine drainage. The scientists from the Department of Energy Joint Genome Institute were able to assemble entire genomes from among the bacteria present in this extreme environment. The second study explored the bacterial diversity found in the Sargasso Sea, an area of the Atlantic Ocean widely believed to be sparsely populated due to a lack of nutrients. Not only were the scientists from the Venter Institute and The Institute for Genomic Research able to identify a wide range of bacteria in this environment, but their study also identified a large number of novel genes, almost doubling the number of genes present in public databases. Such environmental studies are important not only for cleanup efforts but also for discovering novel compounds that can be used in medicine or industry.

A variety of projects are under way to characterize the populations of bacteria present in the human body. These bacteria are an integral part of our lives, assisting our digestion, providing us with necessary vitamins, and protecting us from harmful bacteria. Changes in the delicate balance of the bacterial environments within our bodies can lead to a variety of diseases. Crohn's disease has been shown to correlate with changes in the population of intestinal bacteria, while dental health is directly related to the types of bacteria present in our mouths (cavities are caused by harmful bacteria whose growth is encouraged by the consumption of sugar). It is therefore very important to understand the complex interactions between these bacteria, as well as the interactions between bacterial populations and our bodies.

Metagenomic studies pose a wide range of scientific challenges. The assembly programs available today are not well suited to assembling environmental data. New algorithms have to be developed that take into account the specific characteristics of metagenomic data: multiple genomes with varied levels of coverage and degrees of relatedness. The methods previously used to analyze bacterial populations have little application to metagenomics. In the past, scientists concentrated their attention on a specific feature of all bacterial genomes, the ribosomal RNA operon, which can be used as a "bar code" to identify specific organisms within an environment. Metagenomic data, however, contain information about many genes within bacteria, requiring us to develop new methods for recognizing organisms.
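To make the coverage problem concrete, the following is a minimal sketch in Python, not a description of our group's algorithms or of any existing assembler. It counts k-mers (substrings of length k) in a toy mixture of reads drawn from two made-up sequences sampled at different depths. The sequences, the function names count_kmers and partition_by_abundance, and the cutoff value are all hypothetical and chosen only for illustration. Reverse complements and sequencing errors are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def partition_by_abundance(counts, threshold):
    """Split k-mers into (abundant, rare) sets at an illustrative cutoff."""
    abundant = {kmer for kmer, n in counts.items() if n >= threshold}
    rare = {kmer for kmer, n in counts.items() if n < threshold}
    return abundant, rare


# Toy "genomes" (hypothetical sequences, not real data).
genome_a = "ACGTTGCA"  # sampled 5 times: high coverage
genome_b = "GGATCCTA"  # sampled once: low coverage

reads = [genome_a] * 5 + [genome_b] * 1
counts = count_kmers(reads, k=4)
abundant, rare = partition_by_abundance(counts, threshold=3)

print(sorted(abundant))
print(sorted(rare))
```

In this toy mixture every k-mer from the deeply sampled sequence falls in the abundant set and every k-mer from the lightly sampled sequence falls in the rare set. A program that assumes a single genome with roughly uniform coverage could plausibly misread the first group as repeats and the second as sequencing errors, which is one reason metagenomic data call for different algorithms.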

Our group is exploring computational solutions to the various metagenomics problems, such as:

We are also closely collaborating with biologists in order to apply our algorithms to problems of immediate relevance in human health and environmental studies. Several studies we have been involved in include:

Principal Investigators


This is an NSF project.