All News

Colwell Concludes a Decade of Work with Gulf of Mexico Research Initiative

Jun 17, 2021

Rita Colwell, a Distinguished University Professor in the University of Maryland Institute for Advanced Computer Studies, recently concluded a decade of work with the Gulf of Mexico Research Initiative (GoMRI), an independent group of experts formed in 2010 following the Deepwater Horizon explosion and oil spill.

Colwell served as the research board chair for GoMRI, leading a group of almost two dozen senior scientists and public policy experts who provided guidance on a large-scale effort to investigate the impacts of oil, dispersed oil, and oil dispersants on the ecosystems of the Gulf of Mexico and affected coastal states.

The research board oversaw an unprecedented investigation into the effects of the Deepwater disaster—a multidisciplinary undertaking that would ultimately involve more than 4,500 people, including scientists, lab techs, data and outreach specialists, students, and countless others.

The goal of the initiative, officials say, was to improve society’s ability to understand, respond to and mitigate the impacts of petroleum pollution and related stressors of the marine and coastal ecosystems, with an emphasis on conditions found in the Gulf of Mexico.

A detailed account of the history, scope and successful outcomes of this 10-year effort was recently published in a special edition of Oceanography magazine. Go here to view a PDF that highlights the efforts by Colwell and others associated with the GoMRI project.

Molloy Is Designing Efficient Algorithms for Reconstructing Evolutionary Trees

Jun 09, 2021

Perhaps the most iconic image in evolutionary biology is Charles Darwin's sketch of an evolutionary tree. The illustration highlights Darwin’s transformational idea that the evolutionary relationships among species can be depicted through a branching pattern, a concept known as the Tree of Life.

While Darwin primarily relied on the physical characteristics shared by subsets of species to determine an evolutionary tree’s structure, scientists today are using vast amounts of genomic data to reconstruct evolutionary trees—a field known as phylogenetics.

Erin Molloy, who joins the University of Maryland on July 1 as an assistant professor in the Department of Computer Science, is part of this new genomic revolution.

She is using powerful computational tools to unlock the full breadth of information available in genomic data, designing efficient algorithms to estimate evolutionary trees. This type of information is vital for determining the evolutionary history of birds, plants and even microbes, such as SARS-CoV-2, the virus that causes COVID-19.

“The main goal is to estimate the tree—and other parameters—given the observed genomic data,” says Molloy, who will also hold an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). “The resulting phylogeny is not only interesting in its own right, but it is also important for downstream analyses.”

For example, she notes that the phylogeny for SARS-CoV-2 is not only useful for studying how the virus evolves and how new strains emerge, but also for strain identification—that is, determining which strains are present in a sample—and even contact tracing.

Molloy recently completed a year-long postdoctoral researcher position in the Machine Learning and Genomics Lab at the University of California, Los Angeles.

She says she is looking forward to expanding her research agenda at Maryland, where she can take advantage of UMIACS’ vast computational resources.

A major line of Molloy’s research focuses on the development of phylogeny estimation methods that can effectively utilize distributed-memory systems.

In this context, she says, the genomic data set is distributed across multiple processors, and the algorithm may require these processors to communicate with each other.

“In the worst case, the processors must synchronize with each other at specific points in the algorithm,” Molloy explains. “All of this dramatically slows down the computation. My goal is to design methods that reduce communication bottlenecks, while achieving the same accuracy and statistical guarantees of existing methods.”

Molloy says she looks forward to working with graduate students and faculty at Maryland, particularly within the Center for Bioinformatics and Computational Biology.

She has previously collaborated with Mihai Pop, the director of UMIACS, on a project that utilizes estimated phylogenies to perform taxon identification and abundance profiling from metagenomics data sets.

“I hope to continue working with Mihai on problems in metagenomics, where modeling evolutionary processes could prove advantageous,” Molloy says.

Other UMIACS faculty she expects to collaborate with include Brantley Hall, who works on identifying the functions of the genes in the microbiome, and Michael Cummings, who also approaches phylogeny estimation from a high-performance computing lens.

Although Cummings and Molloy utilize different methodologies for phylogeny estimation, Molloy says working together could lead to new approaches.

“There is a lot of potential for creating a hybrid method that combines aspects of our different approaches,” she says. “I look forward to discussing these ideas with my new colleagues at the University of Maryland.”

—Story by Melissa Brachfeld

CBCB Researchers Develop Tool that Makes Reconstructing Microbial Genomes Easier

Apr 30, 2021

In the seafaring world, a binnacle is a wooden stand placed near the ship’s helm that holds important tools and instruments needed to navigate from one point to the next.

At the University of Maryland, researchers in the Center for Bioinformatics and Computational Biology (CBCB) have developed their own tool—appropriately called Binnacle—that can help scientists navigate the complex world of microbial genomes.

This open-source software was described in a paper, “Binnacle: Using Scaffolds to Improve the Contiguity and Quality of Metagenomic Bins,” that was recently published in the online journal Frontiers in Microbiology.

The paper was written by Harihara Subrahmaniam Muralidharan (lead author), a third-year computer science doctoral student; Nidhi Shah (co-lead author), who just defended her doctoral dissertation; Jacquelyn Meisel, an assistant research scientist in the University of Maryland Institute for Advanced Computer Studies (UMIACS); and Mihai Pop, a professor of computer science and the director of UMIACS.

The CBCB team begins with the premise that recent advances in high-throughput sequencing strategies have spurred microbiome research and revealed important insights into the microbial communities that inhabit human, animal and environmental habitats.

In particular, whole metagenomic shotgun sequencing—which allows for a comprehensive analysis of microbial DNA from a small-sized sample—has been instrumental in expanding an understanding of the functional potential and genetic composition of different microorganisms that have not been previously cultured.

The challenge, though, is that reconstructing complete genomes of organisms from whole metagenomic shotgun sequencing data is often difficult and time-consuming.

Recovered genomes are often highly fragmented, the researchers say, due to uneven abundances of organisms, repeats within and across genomes, sequencing errors, and strain-level variation. To address the fragmented nature of metagenomic assemblies, scientists rely on a process called binning, which clusters together contigs—a set of overlapping DNA segments—inferred to originate from the same organism.

Existing binning algorithms use oligonucleotide frequencies and contig abundance (coverage) within and across samples to group together contigs from the same organism. However, these algorithms often miss short contigs and contigs from regions with unusual coverage or DNA composition characteristics.
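As a rough illustration of the two signals mentioned above, the following minimal sketch turns a contig into a feature vector of normalized tetranucleotide frequencies plus a coverage value. The choice of k = 4, the function names and the idea of simply appending coverage are assumptions made for illustration; this is not the code of any particular binning tool.

```python
from collections import Counter
from itertools import product

K = 4  # tetranucleotides; an assumed choice for this sketch


def kmer_frequencies(seq: str, k: int = K) -> list[float]:
    """Return normalized k-mer frequencies over the alphabet ACGT.

    k-mers containing other symbols (such as N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def contig_features(seq: str, coverage: float) -> list[float]:
    """Hypothetical feature vector: composition followed by coverage."""
    return kmer_frequencies(seq) + [coverage]
```

Contigs with similar vectors could then be grouped by any standard clustering method. Very short contigs yield few k-mers, so their frequency estimates are noisy.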

The CBCB researchers propose that information from assembly graphs—used to represent the final assembly of a genome or metagenomes—can assist current strategies for metagenomic binning. They use a metagenomic scaffolding tool, called MetaCarvel, to construct assembly graphs where contigs are nodes and edges are inferred based on paired-end reads.

The Binnacle software is then able to extract information from the assembly graphs and subsequently cluster scaffolds into comprehensive bins.

The CBCB researchers show that binning graph-based scaffolds, rather than contigs, improves the contiguity and quality of the resulting bins, and captures a broader set of the genes of the organisms being reconstructed.

The authors believe that their Binnacle software represents a first step toward the development of effective metagenomic analysis tools that can leverage all the information contained in one or more samples. Ultimately, this could lead to the automated reconstruction of a metagenome-assembled genome, opening new pathways for accurate and efficient discoveries in public-health microbiology and other fields.

The research described in the published paper is supported by grants from the National Institutes of Health and the National Science Foundation.

—Story by Melissa Brachfeld

Talk Nerdy Podcast Features Rita Colwell Discussing Sexism and Science

Mar 30, 2021

Rita Colwell, a Distinguished University Professor in the University of Maryland Institute for Advanced Computer Studies, recently discussed her storied career as an infectious disease scientist on the Talk Nerdy podcast hosted by Cara Santa Maria.

Colwell—still active in research, including multiple international efforts to stem the spread of cholera—talked candidly about the barriers she faced throughout her career as a woman scientist.

Go here to listen to the entire 60-minute podcast.

El-Sayed to Lead New Genomic Sequencing Facility at UMD

Mar 24, 2021

Najib El-Sayed, a professor of cell biology and molecular genetics with a joint appointment in the University of Maryland Institute for Advanced Computer Studies, has been named director of a new campus facility that will enable researchers to access single-cell DNA and RNA sequencing technologies.

The Advanced Genomic Technologies Core (BBI-AGTC), part of the Brain and Behavior Institute (BBI), will advance high-throughput and advanced large-scale genomics research, and will greatly improve the university’s capabilities in neuroscience and other life sciences, says Elizabeth Quinlan, a professor of biology and director of the BBI.

The BBI-AGTC is set to open in mid-April 2021 and will house one of the latest sequencing platforms (Illumina NextSeq 1000), a single-cell controller (10X Genomics Chromium), a liquid handling robotic station (Eppendorf epMotion 5075tc), a real-time PCR instrument and a microfluidics-based platform for the sizing, quantification and quality control of DNA and RNA.

“Next generation sequencing was a big advance toward understanding genomes and RNA expressed in different tissues, and the BBI-AGTC takes that technology a step further by enabling transcriptomes of individual cells,” says Karen Carleton, a professor of biology. “More broadly, the facility could be important for answering a diverse set of biological questions including how cells change through development, infection, or even in response to different behavioral states such as during mating or parental care.”

El-Sayed’s own research uses genomic approaches to study the biology of parasitism. His team develops and applies molecular, computational and systems biology tools to better understand host-pathogen interactions and, ultimately, the mechanisms of infection and survival. His work in functional genomics, tracing how biological information flows from gene to protein expression via RNA transcription, looks to contribute to better diagnosis, prevention of and therapeutics for parasite- and bacteria-caused diseases in humans, animals and plants.

El-Sayed has published more than 90 research papers and serves on the editorial boards of BMC Genomics and PLoS One. He also serves on the National Institutes of Health’s Genomics, Computational Biology and Technology study section and the scientific advisory board for the National Institute of Allergy and Infectious Diseases’ Genomic Centers for Infectious Diseases.

Former CBCB Graduate Student Receives Larry S. Davis Doctoral Dissertation Award

Dec 15, 2020

A former graduate student in the Center for Bioinformatics and Computational Biology (CBCB) has been recognized for the excellence of her academic work on developing practical and efficient solutions to index large collections of genomes.

Fatemeh Almodaresi, who graduated with a Ph.D. in computer science in Summer 2020, is one of two students to receive the Larry S. Davis Doctoral Dissertation Award this year.

The annual award recognizes outstanding doctoral dissertations in the Department of Computer Science that convey excellence in their technical depth, significance, potential impact and presentation quality.

The award is named for Larry Davis, a Distinguished University Professor of computer science who served as chair of the department from 1999–2012. Davis was also the founding director of the University of Maryland Institute for Advanced Computer Studies (UMIACS), providing leadership for the institute from 1985–1994.

Almodaresi’s UMD dissertation, “Algorithms and Data Structures for Indexing, Querying, and Analyzing Large Collections of Sequencing Data in the Presence or Absence of a Reference,” covers the development of new data structures as well as innovative, practical and efficient solutions to indexing large collections of genomes.

“Fatemeh demonstrates both technical brilliance and the ability to come up with ideas that are theoretically interesting and of tremendous practical impact,” says Rob Patro, an associate professor of computer science who advised Almodaresi during her doctoral studies.

In recommending her for the Davis dissertation award, Patro noted Almodaresi’s design of a new compacted version of a De Bruijn graph, a data structure that is used in bioinformatics to assemble and analyze genomes. Although similar structures have previously been proposed, Almodaresi’s work greatly improved their practicality and efficiency, Patro explains.
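For readers unfamiliar with the structure, here is a minimal textbook-style sketch of a De Bruijn graph and of what "compacting" it means: chains of nodes with no branching are merged into single strings, often called unitigs. The function names are hypothetical, and this is only an illustration of the general idea, not Almodaresi’s data structure.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def compact(graph):
    """Merge non-branching paths into single strings (unitigs).

    Isolated cycles are skipped to keep the sketch short.
    """
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1

    def is_simple(node):
        # Exactly one way in and one way out: safe to merge through.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    unitigs = []
    for node in list(graph):
        if is_simple(node):
            continue  # paths start only at branching or end nodes
        for succ in graph[node]:
            path = node + succ[-1]
            while is_simple(succ):
                succ = next(iter(graph[succ]))
                path += succ[-1]
            unitigs.append(path)
    return sorted(unitigs)
```

For the reads "ACGTC" and "ACGTG" with k = 3, the shared prefix collapses into the single unitig "ACGT", and the two diverging endings remain as separate short unitigs.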

Almodaresi also built a tool using her new data structure for the taxonomic assignment of metagenomic read data, a well-studied problem in the field of computational biology. The taxonomic classifier she designed is both more accurate and more efficient (requiring less memory while operating at a similar speed) than two popular tools that are widely used to accomplish the same task, Patro says.

One major goal in the field is to build practical indexes on the hundreds of thousands of sequencing samples available in public data repositories. In pursuit of this idea, Almodaresi developed new methods and solutions to perform large-scale search of raw sequencing data using a system called Mantis. Her methods allow the index to be iteratively updated, a crucial feature as the amount of genomic data to be indexed continues to grow.

“I anticipate that much of the theory and methodology Fatemeh develops will have a far-reaching impact both within and beyond the field of genomics,” says Patro. “She is a truly exceptional researcher, displaying an intelligence, commitment and real passion at a level that I find to be rare, even among top Ph.D. students.”

Story by Maria Herd

Dr. Rita Colwell Book "A Lab of One's Own - One Woman's Personal Journey Through Sexism in Science"

Oct 07, 2020

A riveting memoir-manifesto from the first female director of the National Science Foundation about the entrenched sexism in science, the elaborate detours women have taken to bypass the problem, and how to fix the system.

If you think sexism thrives only on Wall Street or in Hollywood, you haven’t visited a lab, a science department, a research foundation, or a biotech firm. Rita Colwell is one of the top scientists in America: the groundbreaking microbiologist who discovered how cholera survives between epidemics and the former head of the National Science Foundation. But when she first applied for a graduate fellowship in bacteriology, she was told, “We don’t waste fellowships on women.” A lack of support from some male superiors would lead her to change her area of study six times before completing her PhD.

A Lab of One’s Own documents all Colwell has seen and heard over her six decades in science, from sexual harassment in the lab to obscure systems blocking women from leading professional organizations or publishing their work. Along the way, she encounters other women pushing back against the status quo, including a group at MIT who revolt when they discover their labs are a fraction of the size of their male colleagues’.

Resistance gave female scientists special gifts: forced to change specialties so many times, they came to see things in a more interdisciplinary way, which turned out to be key to making new discoveries in the twentieth and twenty-first centuries. Colwell would also witness the advances that could be made when men and women worked together, often under her direction, such as when she headed a team that helped to uncover the source of the anthrax used in the 2001 letter attacks.

A Lab of One’s Own shares the sheer joy a scientist feels when moving toward a breakthrough, and the thrill of uncovering a whole new generation of female pioneers. But it is also the science book for the #MeToo era, offering an astute diagnosis of how to fix the problem of sexism in science, and a celebration of the women pushing back.

Understanding How the Gut Microbiome Influences Human Health

Sep 17, 2020

Living inside of every person are trillions of microorganisms: bacteria, viruses, fungi and other life forms that are collectively known as the microbiome. While they are minuscule in size, these microbes have a big impact on our health.

Brantley Hall, who joined the University of Maryland on July 1 as an assistant professor in the Department of Cell Biology and Molecular Genetics, is particularly interested in the microbiome of the human gut, the region of the gastrointestinal tract that digests food, absorbs nutrients and expels waste.

Based on years of effort by numerous researchers focused on the microbiome, Hall says that scientists have collectively accumulated an acceptable understanding about the species of microbes that inhabit the gastrointestinal tracts of people in the United States.

The next step, he says, and one that will be the focus of his new lab at UMD, is to genetically interrogate the functions that these gut microbes perform, and how those functions interact, both positively and negatively, with human health.

“This is an enormous undertaking,” Hall explains. “There are millions of gut microbial genes lacking functional annotation, and those with functional annotations are of questionable quality.”

Hall says his lab is working to fill that gap by improving functional annotation across gut microbiome species, combining computational and wet-lab techniques to identify genes responsible for health-relevant functions, and developing new strategies to measure the metabolic output of the gut microbiome.

“Our long-term goal is to rationally modulate the human gut microbiome to improve human health,” he says.

Hall, who also has an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS), says that coming to UMD seemed like a “very good fit.”

“I was excited about the opportunity to work with researchers at Maryland,” he says. “There is an enormous value in being surrounded by like-minded computational biologists.”

Michael Cummings, a professor of biology and director of the Center for Bioinformatics and Computational Biology (CBCB), one of five major centers that are part of UMIACS, says that Hall is a “very welcome addition” to the Maryland campus.

“Brantley’s interesting and important gut microbiome research is complementary to other research in CBCB, particularly that of [UMIACS director] Mihai Pop,” Cummings says.

Cummings adds that while still early in his career, Hall has proven to be extremely productive as evidenced by his publication record, noting that his research is also impactful, as evidenced by the citations his publications have garnered.

Hall says he has already learned a great deal from his interactions with researchers at CBCB. He cites recent collaborations with Pop and Rob Patro, an associate professor of computer science, as examples.

“By co-advising a graduate student with Mihai, I have been able to draw on his extensive expertise in metagenomic assembly to improve our computational experiments,” Hall says. “And close interactions with genomic software developers such as Rob give us a better understanding of the state-of-the-art techniques in tool development.”

He adds that UMIACS’ computational resources have already proven beneficial to his research. Specifically, he says his lab is analyzing thousands of metagenomes, which takes a great amount of space to store and a lot of CPU cores to analyze.

“When I was interviewing at UMD, several professors emphasized how great the computing facilities were, and I can now attest that this is true,” Hall says. “Setting up and maintaining these large computing clusters isn't always easy to pull off, and I am grateful for the UMIACS tech support on both the hardware and software.”

—Story by Melissa Brachfeld

Patro Works to Improve Search Functionality for Large Repositories of Sequencing Data

Sep 02, 2020

Genomic sequencing data can often shed light on a wide array of scientific problems—from treating patients with heart disease and cancer to understanding how certain pathogens can affect plants and animals.

Public repositories of genomic data are becoming more commonplace and are growing at an exponential rate. The National Center for Biotechnology Information (NCBI), for example, runs the Sequence Read Archive (SRA), a repository that holds raw data from a vast number of experiments conducted using high-throughput sequencing.

While repositories like the SRA are a boon to researchers, the ability to quickly search these large public databases for a specific sequence of interest is limited.

To that end, Rob Patro, an associate professor of computer science, is leading an effort to develop fundamentally improved data structures and algorithms to enable large-scale sequence search across public repositories of genomic data.

Patro is collaborating on this work with Mike Ferdman and Michael Bender, both at Stony Brook University, and Rob Johnson, a senior staff researcher at VMware Research.

Patro says that making these types of databases easier to search can help researchers get to the bottom of unique problems.

“Imagine a scientist who discovers what they imagine to be a novel environmental pathogen that, perhaps, has a negative effect on plant or animal health,” he says. “It may be very useful to know what other previously-collected data might have contained this pathogen, but its presence was not reported because the scientists working on the previous studies were focused on a different question and did not take note of this new pathogen. Combing through the repository of available data might help us learn more about this pathogen.”

Patro says that the technical challenge in addressing such a problem is simply the scale of the data. The SRA, for example, currently contains about 16 petabytes of data.

Sequencing data is quite different from other domains where similar challenges have been tackled, he adds.

“DNA and RNA do not have a well-defined vocabulary of words or ‘tokens’ into which the data can be broken down,” he says. “So it’s very different from a typical internet search query.”

Patro’s research group has recently been working on data structures and algorithms that let them scale sequence indices to increasingly large sets of experiments, while enabling very fast (approaching interactive) queries.

A recent paper they authored introduced a fundamentally better scheme for storing and retrieving a key component of this information than their previous attempts. (Note: the lead author on the paper was Fatemeh Almodaresi, who just completed her doctoral degree at UMD with Patro as her adviser.)

The researchers discovered that by examining the patterns of occurrence of small pieces of DNA in the repository—and compressing similar, rather than just identical, patterns in a relative manner—they could reduce the size of this component of the data structure by over an order of magnitude, without slowing the query speed, and, in some cases, speeding it up.
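The idea of storing a pattern relative to a similar one can be illustrated with a small sketch. Here each occurrence pattern is modeled as the set of sample IDs in which a piece of DNA appears, and a pattern is stored as its difference from a chosen reference pattern. The set representation, the function names and the brute-force search for a reference are all assumptions made for illustration; the article does not describe the group's actual encoding at this level of detail.

```python
def encode_relative(pattern: frozenset, reference: frozenset) -> frozenset:
    """Keep only the sample IDs on which pattern and reference disagree."""
    return pattern ^ reference


def decode_relative(delta: frozenset, reference: frozenset) -> frozenset:
    """Recover the original pattern from its delta and the same reference."""
    return delta ^ reference


def nearest_reference(pattern: frozenset, references: list) -> frozenset:
    """Brute-force choice of the reference with the smallest delta.

    A linear scan like this would not scale to a real repository; it is
    used here only to keep the sketch self-contained.
    """
    return min(references, key=lambda ref: len(pattern ^ ref))
```

If a pattern covering samples 0 through 6 and 9 is stored against a reference covering samples 0 through 7, the delta holds just two IDs (7 and 9) instead of eight, which is where the space saving comes from when many patterns are near-duplicates of one another.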

“A main technical challenge here is how one can efficiently search for ‘similar’ patterns when each pattern consists of a point in a very high dimensional space,” Patro says. “While initial attempts to solve this problem using a technique known as locality-sensitive hashing seemed not to work well, we were able to exploit the inherent structure of the data, and take advantage of a widely-used structure in genomics—known as the De Bruijn graph—to develop an efficient search structure to find ‘similar’ patterns efficiently.”

Patro’s group continues to work on this problem, providing important improvements to the indexing methodology. Additionally, they have redesigned the largest part of the index, which previously had to reside in the computer’s random access memory (RAM), so that it can be intelligently partitioned. This allows them to load only small parts of the index at a time, vastly reducing the amount of memory required to both build and query the index.

Finally, they are developing a distributed variant of this index that can be partitioned across a network of machines, which improves robustness and allows the query procedure to be scaled up simply by adding more machines to the network.

—Story by Melissa Brachfeld

***

Note: Patro was recently promoted to associate professor. In addition to his tenure appointment, he has an appointment in the University of Maryland Institute for Advanced Computer Studies, where he is a member of the Center for Bioinformatics and Computational Biology.

El-Sayed Named Director of Honors College Integrated Life Sciences Program

Jul 08, 2020

Najib El-Sayed, a professor of cell biology and molecular genetics, has been named director of the Integrated Life Sciences (ILS) program at the University of Maryland.

The program, part of the university’s Honors College, offers talented undergraduates a biologically-inspired living and learning experience that includes a significant research requirement.

Many ILS graduates go on to graduate school and careers in medicine (both human and veterinary), public health, dentistry, and academia.

El-Sayed, who has an appointment in the University of Maryland Institute for Advanced Computer Studies, is an expert on the biology of parasitism and host-pathogen interactions using genomic and bioinformatics approaches.

He is the author of more than 80 research articles and book chapters and was designated as a highly cited investigator in microbiology by Thomson Reuters.

Go here to see an overview of El-Sayed’s research interests.
