
Efficiently Processing Single-Cell and Single-Nucleus RNA-Sequencing Data

Mar 29, 2022

Caption: Associate Professor Rob Patro (left in photo) and Ph.D. student Dongze He (right) have released a toolkit for the efficient processing of single-cell and single-nucleus RNA sequencing data.

Rapid improvements in cell sequencing technologies in the last decade have provided clinicians and scientists with many valuable insights—from better treatment options for patients with heart disease and cancer to a much deeper understanding of how certain pathogens can affect plants and animals.

In particular, the exponential growth of high-throughput single-cell and single-nucleus RNA-sequencing technologies (collectively, single-cell transcriptomics technologies) has produced a wealth of new data. In fact, single-cell transcriptomic data constitutes the most ubiquitous component of single-cell multi-omics data, which was selected as the “2019 Technology of the Year” by the journal Nature Methods.

These technologies enable scientists to measure gene expression at the resolution of individual cells for tens or even hundreds of thousands of cells at a time. The measured gene expression can act as a crucial signal in understanding biological processes and disease progression, and even in informing potential patient treatment options.

The result of this unprecedented resolution is that one can infer gene expression changes in all kinds of interesting biological contexts: How does gene expression differ between cells that respond to a drug versus those that are treatment resistant? How does gene expression differ among closely related cell types that happen to inhabit the same tissues within the body?

Single-cell sequencing has been a revolutionary tool in answering these kinds of questions.

But scientists must first “pre-process” this RNA-sequencing data, a crucial step that involves going from the raw sequencing data to a specific count of how abundant each gene is within each cell. And while there is popular commercial software available to accomplish this task, it is time-consuming, memory-intensive and closed source.

Now, a multi-institutional team of researchers—including four with ties to the University of Maryland—has developed an accurate, computationally efficient, and lightweight toolkit for processing large amounts of raw single-cell and single-nucleus RNA sequencing data.

Their free suite of tools, called alevin-fry, is detailed in a paper published March 10 in Nature Methods.

“As the number and scale of single-cell, including single-nucleus, RNA-sequencing experiments grow, so do the costs associated with the processing of this data,” says Dongze He, a third-year doctoral student in computational biology at UMD and lead author on the paper. “Alevin-fry provides researchers an accurate, flexible and convenient way to process a multitude of types of single-cell data, simplifying and speeding up analysis and reducing computational costs of various single-cell related scientific activities.”

Instead of requiring hours of processing time, often on server-scale computers with large amounts of memory, the researchers say, their open-source toolset can process very large sets of single-cell data in only tens of minutes, using amounts of processing power and memory that are commonly available on commodity desktops and laptops, while retaining accurate results.

This exciting advancement is tied to a series of lightweight algorithms and efficient data structures, as well as a highly tuned implementation that can effectively make use of many processing threads at the same time. Applied in unison, these allow indexing a large amount of reference sequence—critical parts of the underlying genome—in small space, and quickly and accurately inferring the gene from which each sequencing read was generated.
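To make the two steps described above more concrete (indexing reference sequence, then inferring the gene behind each read and counting genes per cell), here is a minimal Python sketch. It is not alevin-fry's algorithm or API: the k-mer voting scheme, the tiny `K`, the function names and the toy sequences are all assumptions chosen for illustration, and ambiguous reads are simply discarded.

```python
from collections import Counter, defaultdict

K = 5  # toy k-mer length (assumption); real tools use much longer k-mers


def kmers(seq, k=K):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference, k=K):
    """Map each k-mer to the set of genes whose sequence contains it."""
    index = defaultdict(set)
    for gene, seq in reference.items():
        for kmer in kmers(seq, k):
            index[kmer].add(gene)
    return index


def assign_gene(read, index, k=K):
    """Return the gene with the most k-mer hits, or None if there is no unique best gene."""
    votes = Counter()
    for kmer in kmers(read, k):
        for gene in index.get(kmer, ()):
            votes[gene] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous read; this sketch simply discards it
    return ranked[0][0]


def count_matrix(reads, index):
    """Count distinct UMIs for each (cell barcode, gene) pair."""
    seen = defaultdict(set)
    for barcode, umi, seq in reads:
        gene = assign_gene(seq, index)
        if gene is not None:
            seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical toy reference and reads: (cell barcode, UMI, read sequence).
reference = {"geneA": "ACGTACGTTAGC", "geneB": "TTGGCCAATGCA"}
reads = [
    ("CELL1", "UMI1", "ACGTACGT"),
    ("CELL1", "UMI1", "CGTACGTT"),  # same cell and UMI: counted once
    ("CELL1", "UMI2", "GGCCAATG"),
    ("CELL2", "UMI3", "TACGTTAG"),
    ("CELL2", "UMI4", "GGGGGGGG"),  # matches nothing in the reference
]
counts = count_matrix(reads, build_index(reference))
print(counts)
```

The result is a sparse gene-by-cell table: each entry is the number of distinct molecules (UMIs) observed for one gene in one cell, which is the kind of count the pre-processing step is meant to produce.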

The alevin-fry toolkit applies these lightweight approaches in a way that provides accurate results and should allow computational analysis to keep pace with the quickly advancing biotechnology, says Rob Patro, an associate professor of computer science with an appointment in the University of Maryland Institute for Advanced Computer Studies.

“We think our software provides a compelling option for scientists working with single-cell and single-nucleus RNA-seq data, by enabling accurate and flexible gene quantification at a low computational cost,” Patro says. “As we receive feedback and input from other scientists using this method, we expect our software suite to grow in capability and comprehensiveness. Ultimately, we want this to be available to anyone looking for a seamless, efficient tool for advancing new discoveries using single-cell data.”

Other researchers working on the project are Mohsen Zakeri, who earned his doctorate in computer science from UMD in 2021 and is now a postdoctoral researcher at Johns Hopkins University; Hirak Sarkar, who earned his doctorate in computer science from UMD in 2020 and is now a postdoctoral researcher at Harvard Medical School; Charlotte Soneson, a research associate at the Friedrich Miescher Institute for Biomedical Research in Switzerland; and Avi Srivastava, a postdoctoral researcher at the New York Genome Center.

—Story by Melissa Brachfeld

Cummings and Resnik Receive MPower Seed Funding

Mar 08, 2022

Two faculty members in the University of Maryland Institute for Advanced Computer Studies (UMIACS) have received seed funding to advance interdisciplinary research focused on Parkinson’s disease, hearing disorders and antibiotic-resistant bacteria.

Michael Cummings (left in photo), a professor of biology, and Philip Resnik (right), a professor of linguistics, are active in three of the 17 projects recently chosen to split $3 million in seed funding from the University of Maryland Strategic Partnership: MPowering the State, known as MPower.

The MPower initiative was launched in 2012 to foster research, scholarship and innovation between the state’s top public research institutions, the University of Maryland, College Park, and the University of Maryland, Baltimore.

MPower’s competitive seed grant program, which drew 52 proposals this year, offers teams of faculty researchers grants of between $49,000 and $250,000. The money is intended to jumpstart high-impact research in areas that are of critical importance to the state of Maryland and the nation.

Cummings is principal investigator of a project that will use machine learning algorithms to help analyze mobility data from people suffering symptoms of Parkinson’s disease, a progressive nervous system disorder that affects nearly one million people in the United States.

He is joined on the project by Rainer von Coelln, M.D., an assistant professor of neurology at the University of Maryland School of Medicine.

The cross-institutional team will assess the severity of symptoms from 300 Parkinson’s disease patients in comparison to 50 control subjects who do not have Parkinson’s. All participants will wear sensors able to quantify and analyze their movements in their day-to-day activities or in specialized tasks they are asked to perform. Data to be analyzed includes symptoms like slowness, stiffness and shaking, mental health issues like anxiety/depression, and so-called autonomic symptoms like bladder dysfunction and dizziness.

The severity of these symptoms—and their rate of progression—often varies between patients. The captured sensor data analyzed by sophisticated algorithms could help clinicians quickly identify which Parkinson’s patients need more aggressive treatment protocols to help prevent their rapid deterioration.

Cummings is also involved in a second MPower-funded project being led by Matthew Goupell, a professor of hearing and speech sciences at UMD, and Ronna Hertzano, M.D., an otolaryngologist surgeon-scientist at the University of Maryland School of Medicine.

In this project, the researchers are developing innovative tools to effectively query and analyze hearing loss data for research purposes.

They plan to adapt a cloud-based tool developed at the University of Maryland to support audiological clinical research data visualization and analysis. Ultimately, this could result in an advanced and intuitive clinical informatics tool that would assist in the implementation of multi-site clinical studies for hearing disorders.

Resnik is collaborating on an MPower seed grant with Katherine E. Goodman, an assistant professor of epidemiology and public health at the University of Maryland School of Medicine.

Their joint UMD/UMB team is interested in addressing the spread of antibiotic-resistant bacteria that can pose a grave challenge in U.S. hospitals.

One concern, the researchers say, is that colonized (i.e., “silent” carrier) patients often go undetected, transmitting antibiotic-resistant bacteria to other patients. Moreover, the colonized patients themselves are at a significantly higher risk of developing antibiotic-resistant infections, where mortality rates can exceed 50%.

But wide-scale screening for antibiotic-resistant bacterial colonization remains impractical for most U.S. hospitals. As a result, hospitals can miss critical opportunities to identify colonized patients early, when it is still possible to prevent what are often devastating patient outcomes.

The MPower-funded team is pursuing an innovative strategy: using state-of-the-art natural language processing and machine learning techniques to analyze the language in electronic health records, automatically detecting pre-admission exposures that might be found in a patient’s clinical notes.

Those notes—showing someone arriving at the hospital from a nursing home, for example—can often yield important information on strong risk factors for carrying antibiotic-resistant bacteria.

The research team aims to lay foundations for automated technology that will detect these high-risk patients in a much more targeted and cost-effective way than is currently available.

This is the second round of MPower seed funding for Resnik, who in 2016 was awarded a grant with UMB Professor of Psychiatry Deanna Kelly to develop computational models that help identify symptomatic changes in people suffering from schizophrenia or depression. The 2016 MPower grant was followed by an $842,000 National Science Foundation award in 2021.

Cummings Receives BBI Seed Funding to Study Hearing Loss Using Machine Learning

Jan 12, 2022

A University of Maryland computational biologist has received seed funding for an interdisciplinary project that uses machine learning to help untangle the myriad of causes behind hearing loss.

Michael Cummings, a professor of biology with an appointment in the University of Maryland Institute for Advanced Computer Studies, is co-PI on a $150,000 grant from the UMD Brain and Behavior Institute (BBI).

The project was one of five that received BBI seed funding this year as part of an interdisciplinary program focused on generating novel tools and approaches to understand complex behaviors produced by the human brain.

Cummings, who is also director of the Center for Bioinformatics and Computational Biology, will collaborate on the project with Matthew Goupell, a professor in the Department of Hearing and Speech Sciences and director of the Auditory Perception and Modeling Lab.

“I’m thankful for this opportunity to further our understanding of age-related hearing loss, and for this seed grant that makes our initial collaboration with the Goupell lab possible,” Cummings says.

The UMD team is focused on the causes of sensory and cognitive impairment that can rapidly increase with age.

Up to 40 percent of individuals older than 70 suffer from hearing and vision loss, Cummings says, and without the ability to communicate effectively, can become reclusive, frustrated and depressed.

In addition, about 25 percent of Americans older than 65 have some mild cognitive decline and 10 percent have Alzheimer’s disease.

Although each of these impairments has been studied individually, much less is known about the relationship between hearing, mild cognitive impairment, and dementia due to Alzheimer’s disease or other causes, except that there is growing evidence of a link.

To understand how aging affects our sensory and cognitive systems, the UMD team believes it imperative to decipher the joint role of hearing and speech understanding ability, the neural mechanisms that contribute to age-related declines in these areas, and what role hearing plays in age-related cognitive impairment and dementia due to Alzheimer’s disease or other causes.

Their study will use machine learning on a substantial existing database of behavioral, clinical and cognitive data that have been previously collected to generate results for subsequent research.

“The problem is very complex when you consider the relationship between aging, sensation and cognition in individual people,” Goupell says. “While we have known for years that there is a correlation between hearing loss and cognitive decline, we have not made much progress on understanding why this occurs. Machine learning will help guide us to the most probable answers, particularly ones that we may not have considered yet, which will allow us to design future studies with strong testable hypotheses.”

—Story by Melissa Brachfeld

Patro Part of Team Honored with Allen Newell Award for Research Excellence

Oct 20, 2021

Rob Patro, an associate professor of computer science at the University of Maryland, was part of a team recently honored with the prestigious Allen Newell Award for Research Excellence.

The award is from Carnegie Mellon University’s (CMU) School of Computer Science. It recognizes outstanding work from current or former CMU researchers that epitomizes the scientific philosophy of Allen Newell, a computer scientist and pioneer in the field of artificial intelligence who died in 1992.

Newell firmly believed that “good science responds to real phenomena or real problems.”

Patro, who also has an appointment in the University of Maryland Institute for Advanced Computer Studies, was recognized with the Newell award for work he did as a postdoctoral researcher at CMU from 2012 to 2014.

He played a key role as part of a team that developed “Sailfish: Rapid Alignment-free Quantification of Isoform Abundance,” an open-source software tool for the quantification of gene expression from RNA sequencing data; the paper was published in 2014.

Sailfish implements an efficient method for quantifying isoform abundance from RNA sequencing data that elides the traditional and computationally intensive step of sequence alignment, and that makes use of an efficient accelerated expectation-maximization procedure over a reduced representation of the data (counts of equivalence classes).

These methodological advancements allowed Sailfish to quantify isoform abundance over an order of magnitude faster than existing methods. For example, it could estimate abundances from a dataset consisting of 150 million sequencing reads in 15 minutes, where prior tools took up to six hours.
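The idea of running expectation-maximization over counts of equivalence classes, rather than over individual reads, can be sketched in a few lines of Python. This is a plain, unaccelerated EM written for illustration, not Sailfish's implementation: it ignores transcript lengths and the acceleration scheme, and the function name, the input layout and the toy counts are assumptions.

```python
def em_equivalence_classes(eq_classes, n_transcripts, n_iter=200):
    """Estimate the fraction of reads that originate from each transcript.

    eq_classes is a list of (transcript_ids, read_count) pairs, where
    transcript_ids is the set of transcripts a group of reads is
    compatible with. Transcript lengths are ignored in this sketch.
    """
    total = sum(count for _, count in eq_classes)
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform starting guess
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        for transcripts, count in eq_classes:
            denom = sum(theta[t] for t in transcripts)
            if denom == 0.0:
                continue
            # E-step: split this class's reads in proportion to current abundances.
            for t in transcripts:
                expected[t] += count * theta[t] / denom
        # M-step: renormalize the expected counts into abundances.
        theta = [e / total for e in expected]
    return theta


# Hypothetical data: 100 reads collapsed into four equivalence classes.
eq_classes = [((0,), 30), ((0, 1), 40), ((1, 2), 20), ((2,), 10)]
theta = em_equivalence_classes(eq_classes, n_transcripts=3)
print([round(x, 3) for x in theta])
```

Each iteration touches four classes instead of 100 reads, which is the point of the reduced representation: the cost of the loop scales with the number of distinct equivalence classes. For this toy input the estimates settle at abundances of about 0.6, 0.2 and 0.2.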

Isoforms are RNA molecules that arise from the same gene (genomic locus), but which differ in their specific sequence for a number of reasons, including alternative splicing and the use of different transcript start sites.

The expression of different isoforms may sometimes result in distinct or altered gene function within the cell. Estimating isoform abundance not only lets one quantify the gene expression in a biological sample, but also lets one explore the relative abundance of these different forms of the gene.

Patro collaborated on the Sailfish project with Carl Kingsford, the Herbert A. Simon Professor of Computer Science at CMU, and Steve Mount, an associate professor of biology at the University of Maryland.

The trio were formally recognized with the Newell award at CMU’s Founders Day event, held this year in August in an online-only format.

—Story by Melissa Brachfeld

CBCB Researcher Jamshed Khan Wins Outstanding Student Paper Award

Aug 06, 2021

Jamshed Khan, a third-year computer science doctoral student, recently received the Ian Lawson Van Toch Memorial Award for Outstanding Student Paper at the 2021 Conference on Intelligent Systems for Molecular Biology (ISMB) and European Conference on Computational Biology (ECCB).

ISMB is the flagship meeting of the International Society for Computational Biology (ISCB). The event is co-located with ECCB and has grown to become the world’s largest bioinformatics and computational biology conference. The conference was held virtually July 25–30 due to the ongoing COVID-19 pandemic.

The award went to “Cuttlefish: Fast, parallel, and low-memory compaction of de Bruijn graphs from large-scale genome collections,” authored by Khan and his adviser Rob Patro, an associate professor of computer science with an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). The paper introduces a new algorithm for efficiently building a compacted de Bruijn graph from collections of input genome or transcriptome sequences.

The de Bruijn graph and its colored compacted variant have become increasingly useful data structures in genomic analysis. They are used to assemble new genomes, compare the genomes of related organisms, and build efficient indexes for mapping or aligning the data from sequencing experiments, among other tasks.

However, the process of building the compacted de Bruijn graph is a computational challenge, with many existing solutions requiring a lot of time and memory as the input becomes larger.
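For readers unfamiliar with the structure, the following Python sketch shows what "compacting" a de Bruijn graph means: every maximal non-branching chain of overlapping k-mers is merged into one longer string (a unitig). This is a naive in-memory illustration, not the Cuttlefish algorithm. It assumes a node-centric graph over the forward strand only (no reverse complements), and the function name and toy sequences are hypothetical.

```python
def compact_de_bruijn(sequences, k):
    """Return the unitigs of a node-centric, forward-strand de Bruijn graph."""
    nodes = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            nodes.add(seq[i:i + k])

    def successors(kmer):
        return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in nodes]

    def predecessors(kmer):
        return [c + kmer[:-1] for c in "ACGT" if c + kmer[:-1] in nodes]

    def extends_forward(kmer):
        """Next k-mer if the path continues without branching, else None."""
        succ = successors(kmer)
        if len(succ) == 1 and len(predecessors(succ[0])) == 1:
            return succ[0]
        return None

    def extends_backward(kmer):
        """Previous k-mer if the path continues without branching, else None."""
        pred = predecessors(kmer)
        if len(pred) == 1 and len(successors(pred[0])) == 1:
            return pred[0]
        return None

    unitigs, visited = [], set()
    for start in sorted(nodes):
        if start in visited:
            continue
        # Walk backward to the first k-mer of this non-branching path.
        first = start
        while True:
            prev = extends_backward(first)
            if prev is None or prev == start:  # path start, or a pure cycle
                break
            first = prev
        # Walk forward, collecting k-mers until a branch or a visited node.
        path, cur = [first], first
        visited.add(first)
        while True:
            nxt = extends_forward(cur)
            if nxt is None or nxt in visited:
                break
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        # Consecutive k-mers overlap by k-1 characters, so append one character each.
        unitigs.append(path[0] + "".join(p[-1] for p in path[1:]))
    return sorted(unitigs)


# Two hypothetical sequences that share a prefix and then diverge.
unitigs = compact_de_bruijn(["ACGTTGCA", "ACGTTGGA"], k=4)
print(unitigs)
```

Here seven distinct 4-mers collapse into three unitigs: the shared prefix and one string for each diverging suffix. The sketch keeps every k-mer in a Python set, which is exactly the kind of memory cost that becomes prohibitive at the scale of whole-genome collections.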

In the paper, Khan and Patro—who are both members of the Center for Bioinformatics and Computational Biology (CBCB)—present a highly scalable and very low-memory algorithm called Cuttlefish, to construct the compacted de Bruijn graph for collections of whole genome references.

The researchers say Cuttlefish considerably outperforms existing state-of-the-art approaches. It can build the compacted de Bruijn graph for 100 human genomes two and a half times as fast and using less than a quarter of the memory compared to the next best tool performing the same task.

“By providing a faster and more memory-frugal algorithm for constructing this widely-used structure of the compacted de Bruijn graph, we anticipate that Cuttlefish will aid in some of the many downstream applications in which this graph is used, like comparative genomics, pangenome analysis, and the indexing of large collections of related genomes for sequence analysis,” Patro says.

The Ian Lawson Van Toch Memorial Award for Outstanding Student Paper is given to the student who presents the most thought-provoking or original paper at ISMB/ECCB, as judged by a panel of experts. The award, which is sponsored by the Princess Margaret Hospital Foundation, is given in memory of Ian Lawson Van Toch, a 23-year-old medical biophysics graduate student at the University of Toronto who passed away in August 2007.

Khan says he is honored to receive such a prestigious award.

“We are ecstatic for this recognition of our work,” he says. “I'm very grateful to Rob for his support and guidance along the way, and for being such a remarkable adviser and mentor. I'm also thankful to my talented and dedicated lab-mates for their critiques and feedback. We look forward to Cuttlefish paving the way toward more exciting results in high-throughput genomics research.”

—Story by Melissa Brachfeld

Colwell Part of New Study Identifying Gut Microbiome Changes Long Before Onset of Celiac Disease

Jul 18, 2021

A multi-institutional team of researchers has identified substantial microbial changes in the intestines of infants who are at risk for celiac disease, a serious autoimmune condition in which the consumption of gluten leads to damage in the small intestine.

The researchers, including Rita Colwell, a Distinguished University Professor in the University of Maryland Institute for Advanced Computer Studies, used advanced genomic sequencing techniques to identify intestinal changes as early as 18 months before the onset of the autoimmune disease.

They uncovered distinct alterations in several species of microorganisms and molecular components of cells and tissues in children who developed celiac disease. These alterations were not seen in at-risk children who did not develop the disease.

Results of the study, published in July in the Proceedings of the National Academy of Sciences, could lead to more effective celiac disease treatments and prevention.

“These results indicate that the microbiome can be a powerful indicator for celiac disease,” said Colwell, who is active at UMD in the Center for Bioinformatics and Computational Biology. “It provides an early warning before symptoms develop, allowing early intervention. With analysis of the microbiome via stool samples, infants can be monitored, and it may be possible for alteration of the diet to be sufficient to treat or prevent the disease.”

The research team tracked the gut microbiota of 500 at-risk children from birth through age 10 as part of the MassGeneral Hospital for Children’s Celiac Disease, Genomic, Microbiome and Metabolomic (CDGEMM) study. They began collecting extensive blood and fecal samples along with environmental data on participants in 2014.

Using metagenomic analysis, the researchers linked microbial composition with function and highlighted changes associated with either increased inflammatory processes or reduced inflammation. An important part of the body’s immune response, inflammation is a significant cause of celiac disease symptoms.

For the current paper, the team compared the gut microbiome of 10 infants from the CDGEMM study who went on to develop celiac disease with the gut microbiome of 10 infants from the study who did not develop the autoimmune condition. All 20 children were genetically predisposed to develop celiac disease.

“We found significant changes in the intestinal microbes, pathways and metabolites 18 months before disease onset, which was confirmed with positive lab tests,” said Maureen Leonard, M.D., lead author of the study and clinical director of the Center for Celiac Research and Treatment. “This was much earlier than we expected.”

The changes researchers found included increases in pro-inflammatory microorganisms and decreases in protective and anti-inflammatory microorganisms at various time points before onset of the disease.

Colwell said the study demonstrates the power of next-generation sequencing coupled with bioinformatics to detect these important changes.

According to another of the paper’s authors—Alessio Fasano, M.D., director of the Center for Celiac Research and Treatment—the approach used in the study will help researchers develop similar studies for the diagnosis and treatment of a variety of conditions in which the microbiome could play a pathogenic role.

If confirmed by larger datasets, these findings may represent specific therapeutic targets for disease interception and possible prevention of celiac disease onset through microbiome manipulation during the preclinical phase.


This story was adapted from text provided by MassGeneral Hospital for Children.

CBCB Doctoral Student Studies Changes in Neurons that Affect Mammals’ Sleep Cycles

Aug 02, 2021

Theresa Alexander, a graduate student in the Center for Bioinformatics and Computational Biology (CBCB), loves all kinds of creatures and spends her free time with her dog Fibonacci and riding her horses Ella, Spotty and Fergus.

Alexander, a fourth-year doctoral student in biological sciences, extends her love of animals to investigating very specific biological problems that affect them.

She earned her master’s degree in epidemiology and biostatistics with a focus on genetic biostatistics from Case Western Reserve University in 2017. Currently, she is co-advised by Najib El-Sayed, a professor of cell biology and molecular genetics with a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS), and Colenso Speer, an assistant professor of biology.

Her most recent work focuses on investigating a specific neuronal cell type that syncs mammals’ sleep/wake cycle, known as circadian rhythm, to the sun cycle. These specific cells in the retina, called intrinsically photosensitive retinal ganglion cells (ipRGCs), have a protein that makes them photosensitive, meaning these neurons send action potentials from the eye to a specific part of the brain when light photons enter the eye.

Alexander says neurons have to package and ship mRNAs long distances so they can be turned into proteins in distal parts of the cell. Messenger ribonucleic acid, or mRNA for short, is a molecule that carries genetic code from DNA to the rest of the cell.

“One problem we are trying to understand is what mRNAs actually get shipped out to these far-away compartments in these specific neurons—i.e. how does the USPS sorting center in the neuron cell body decide which mRNAs get packaged up and sent out to the processing centers in the ‘rural’ parts of the cell,” she explains.

The second major problem involves looking at the heterogeneity between different neuronal cell-types and sub-cell types. IpRGCs can be split into six different “sub-cell” types. Each of these cell types goes from the retina to different brain targets to perform different functions—some help us maintain our sleep-wake cycles in coordination with the sun cycle while others have image-forming visual functions.

“What we want to help shed light on is how these cells differ from other subtypes based on which genes each expresses throughout development at different critical neuronal growth stages,” Alexander says.

Her group is currently studying ipRGCs in mice, but she says she hopes that these results extend to ipRGC function in other mammals.

Alexander enjoys working in CBCB because of the sense of community and the wide array of research being conducted.

“It is absolutely amazing to get a room full of people from so many backgrounds and specialties together,” she says. “There's always a unique perspective to solve a problem, and if I am working on a new problem or an area which I am not familiar with, there's always someone to go to who has some experience with it.”

El-Sayed says Alexander contributes a “wealth of interdisciplinary knowledge” to his lab as well as CBCB.

“Theresa is well-grounded both in the life sciences and bioinformatics and brings a deep understanding of statistical inference to our analyses,” he says. “In addition to her impressive intellectual capacity, she is pleasant, positive and simply fun to work with.”

Alexander is also a fellow in UMD’s COMBINE (Computation and Mathematics for Biological Networks) program, a National Science Foundation-funded Research Traineeship (NRT) program in network biology.

Additionally, she volunteers for “Girls Talk Math,” a summer program for high school students that is run by UMD’s math department. The program exposes students to math topics that they may not come across in their normal curriculum. Alexander teaches network science.

After she earns her doctorate, Alexander envisions applying her data science skills to biotech research.

“The intersection of biology, computer science and statistics is the most exciting place to me, and goal-oriented industry research seems like a perfect place for that,” she says.

—Story by Melissa Brachfeld

Colwell Concludes a Decade of Work with Gulf of Mexico Research Initiative

Jun 17, 2021

Rita Colwell, a Distinguished University Professor in the University of Maryland Institute for Advanced Computer Studies, recently concluded a decade of work with the Gulf of Mexico Research Initiative (GoMRI), an independent group of experts formed in 2010 following the Deepwater Horizon explosion and oil spill.

Colwell served as the research board chair for GoMRI, leading a group of almost two dozen senior scientists and public policy experts who provided guidance on a large-scale effort to investigate the impacts of oil, dispersed oil, and oil dispersants on the ecosystems of the Gulf of Mexico and affected coastal states.

The research board oversaw an unprecedented investigation into the effects of the Deepwater disaster—a multidisciplinary undertaking that would ultimately involve more than 4,500 people, including scientists, lab techs, data and outreach specialists, students, and countless others.

The goal of the initiative, officials say, was to improve society’s ability to understand, respond to and mitigate the impacts of petroleum pollution and related stressors of the marine and coastal ecosystems, with an emphasis on conditions found in the Gulf of Mexico.

A detailed accounting of the history, scope and successful outcomes of this 10-year effort was recently published in a special edition of Oceanography magazine. A PDF highlighting the efforts by Colwell and others associated with the GoMRI project is also available.

Molloy Is Designing Efficient Algorithms for Reconstructing Evolutionary Trees

Jun 09, 2021

Perhaps the most iconic image in evolutionary biology is Charles Darwin's sketch of an evolutionary tree. The illustration highlights Darwin’s transformational idea that the evolutionary relationships among species can be depicted through a branching pattern, a concept known as the Tree of Life.

While Darwin primarily relied on the physical characteristics shared by subsets of species to determine an evolutionary tree’s structure, scientists today are using vast amounts of genomic data to reconstruct evolutionary trees—a field known as phylogenetics.

Erin Molloy, who joins the University of Maryland on July 1 as an assistant professor in the Department of Computer Science, is part of this new genomic revolution.

She is using powerful computational tools to unlock the full breadth of information available in genomic data, designing efficient algorithms to estimate evolutionary trees. This type of information is vital for determining the evolutionary history of birds, plants and even microbes, such as SARS-CoV-2, the virus that causes COVID-19.

“The main goal is to estimate the tree—and other parameters—given the observed genomic data,” says Molloy, who will also hold an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). “The resulting phylogeny is not only interesting in its own right, but it is also important for downstream analyses.”

For example, she notes that the phylogeny for SARS-CoV-2 is not only useful for studying how the virus evolves and how new strains emerge, but also for strain identification—that is, determining which strains are present in a sample—and even contact tracing.

Molloy recently completed a year-long postdoctoral researcher position in the Machine Learning and Genomics Lab at the University of California, Los Angeles.

She says she is looking forward to expanding her research agenda at Maryland, where she can take advantage of UMIACS’ vast computational resources.

A major line of Molloy’s research focuses on the development of phylogeny estimation methods that can effectively utilize distributed-memory systems.

In this context, she says, the genomic data set is distributed across multiple processors, and the algorithm may require these processors to communicate with each other.

“In the worst case, the processors must synchronize with each other at specific points in the algorithm,” Molloy explains. “All of this dramatically slows down the computation. My goal is to design methods that reduce communication bottlenecks, while achieving the same accuracy and statistical guarantees of existing methods.”

Molloy says she looks forward to working with graduate students and faculty at Maryland, particularly within the Center for Bioinformatics and Computational Biology.

She has previously collaborated with Mihai Pop, the director of UMIACS, on a project that utilizes estimated phylogenies to perform taxon identification and abundance profiling from metagenomics data sets.

“I hope to continue working with Mihai on problems in metagenomics, where modeling evolutionary processes could prove advantageous,” Molloy says.

Other UMIACS faculty she expects to collaborate with include Brantley Hall, who works on identifying the functions of the genes in the microbiome, and Michael Cummings, who also approaches phylogeny estimation through a high-performance computing lens.

Although Cummings and Molloy utilize different methodologies for phylogeny estimation, Molloy says working together could lead to new approaches.

“There is a lot of potential for creating a hybrid method that combines aspects of our different approaches,” she says. “I look forward to discussing these ideas with my new colleagues at the University of Maryland.”

—Story by Melissa Brachfeld

CBCB Researchers Develop Tool that Makes Reconstructing Microbial Genomes Easier

Apr 30, 2021

In the seafaring world, a binnacle is a wooden stand placed near the ship’s helm that holds important tools and instruments needed to navigate from one point to the next.

At the University of Maryland, researchers in the Center for Bioinformatics and Computational Biology (CBCB) have developed their own tool—appropriately called Binnacle—that can help scientists navigate the complex world of microbial genomes.

This open-source software was described in a paper, “Binnacle: Using Scaffolds to Improve the Contiguity and Quality of Metagenomic Bins,” that was recently published in the online journal Frontiers in Microbiology.

The paper was written by Harihara Subrahmaniam Muralidharan (lead author), a third-year computer science doctoral student; Nidhi Shah (co-lead author), who just defended her doctoral dissertation; Jacquelyn Meisel, an assistant research scientist in the University of Maryland Institute for Advanced Computer Studies (UMIACS); and Mihai Pop, a professor of computer science and the director of UMIACS.

The CBCB team begins with the premise that recent advances in high-throughput sequencing strategies have spurred microbiome research and revealed important insights into the microbial communities that inhabit human, animal and environmental habitats.

In particular, whole metagenomic shotgun sequencing, which allows for a comprehensive analysis of microbial DNA from a small sample, has been instrumental in expanding our understanding of the functional potential and genetic composition of different microorganisms that have not been previously cultured.

The challenge, though, is that reconstructing complete genomes of organisms from whole metagenomic shotgun sequencing data is often difficult and time-consuming.

Recovered genomes are often highly fragmented, the researchers say, due to uneven abundances of organisms, repeats within and across genomes, sequencing errors, and strain-level variation. To address the fragmented nature of metagenomic assemblies, scientists rely on a process called binning, which clusters together contigs—a set of overlapping DNA segments—inferred to originate from the same organism.

Existing binning algorithms use oligonucleotide frequencies and contig abundance (coverage) within and across samples to group together contigs from the same organism. However, these algorithms often miss short contigs and contigs from regions with unusual coverage or DNA composition characteristics.
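As a rough illustration of the two signals described above, the following minimal Python sketch computes a normalized oligonucleotide (here, tetranucleotide) frequency profile for a contig and appends a coverage value to form a feature vector that a clustering step could consume. The function names `kmer_frequencies` and `contig_features`, the choice of k=4, and the single-sample coverage value are assumptions made for illustration; this is not the feature set of any particular binning tool.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    k-mers containing characters other than A, C, G, T (for example N)
    are ignored. Hypothetical helper for illustration only.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def contig_features(sequence, coverage, k=4):
    """Concatenate the composition profile with a coverage value."""
    return kmer_frequencies(sequence, k) + [coverage]


if __name__ == "__main__":
    features = contig_features("ACGTACGTTTGACCA", coverage=12.5)
    print(len(features))  # 4**4 composition entries plus one coverage entry
```

Contigs from the same organism would be expected to have similar vectors. Very short contigs yield few k-mers and therefore noisy profiles, which is consistent with the limitation noted above.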

The CBCB researchers propose that information from assembly graphs, which are used to represent the final assembly of a genome or metagenome, can assist current strategies for metagenomic binning. They use a metagenomic scaffolding tool, called MetaCarvel, to construct assembly graphs where contigs are nodes and edges are inferred based on paired-end reads.
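The idea of a graph with contigs as nodes and paired-end reads as evidence for edges can be sketched in a few lines of Python. This is a deliberately simplified illustration, not MetaCarvel's or Binnacle's actual algorithm: it assumes each read pair has already been reduced to the pair of contig names its two mates map to, keeps an edge only when a hypothetical `min_support` number of pairs agree, and treats connected components as stand-ins for scaffolds. Real scaffolders also account for orientation, insert size, and repeats.

```python
from collections import Counter


def build_contig_graph(read_pair_links, min_support=2):
    """Count read pairs linking two different contigs; keep supported edges.

    read_pair_links: iterable of (contig_a, contig_b) tuples, one per read
    pair (an assumed, pre-computed input). Returns {(a, b): support}.
    """
    support = Counter()
    for a, b in read_pair_links:
        if a != b:  # both mates on one contig carry no linking information
            support[tuple(sorted((a, b)))] += 1
    return {edge: n for edge, n in support.items() if n >= min_support}


def connected_components(nodes, edges):
    """Group contigs joined by edges, using a small union-find structure."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c4"), ("c3", "c4")]
    graph = build_contig_graph(links)
    print(connected_components(["c1", "c2", "c3", "c4", "c5"], graph))
```

In this toy input the single pair linking c2 and c3 falls below the support threshold, so the contigs split into two linked groups plus one unlinked contig.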

The Binnacle software is then able to extract information from the assembly graphs and subsequently cluster scaffolds into comprehensive bins.

The CBCB researchers show that binning graph-based scaffolds, rather than contigs, improves the contiguity and quality of the resulting bins, and captures a broader set of the genes of the organisms being reconstructed.

The authors believe that their Binnacle software represents a first step toward the development of effective metagenomic analysis tools that can leverage all the information contained in one or more samples. Ultimately, this could lead to the automated reconstruction of a metagenome-assembled genome, opening new pathways for accurate and efficient discoveries in public-health microbiology and other fields.

The research described in the published paper is supported by grants from the National Institutes of Health and the National Science Foundation.

—Story by Melissa Brachfeld
