Patro Part of Team Honored with Allen Newell Award for Research Excellence

Oct 20, 2021

Rob Patro, an associate professor of computer science at the University of Maryland, was part of a team recently honored with the prestigious Allen Newell Award for Research Excellence.

The award is from Carnegie Mellon University’s (CMU) School of Computer Science. It recognizes outstanding work from current or former CMU researchers that epitomizes the scientific philosophy of Allen Newell, a computer scientist and pioneer in the field of artificial intelligence who died in 1992.

Newell firmly believed that “good science responds to real phenomena or real problems.”

Patro, who also has an appointment in the University of Maryland Institute for Advanced Computer Studies, was recognized with the Newell award for work he did as a postdoctoral researcher at CMU from 2012 to 2014.

He played a key role as part of a team that developed “Sailfish: Rapid Alignment-free Quantification of Isoform Abundance,” an open-source software tool for the quantification of gene expression from RNA sequencing data; the paper was published in 2014.

Sailfish implements an efficient method for quantifying isoform abundance from RNA sequencing data that elides the traditional and computationally intensive step of sequence alignment, and that makes use of an efficient accelerated expectation-maximization procedure over a reduced representation of the data (counts of equivalence classes).
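As a rough illustration of the idea of running expectation-maximization over equivalence-class counts rather than over individual reads, consider the minimal Python sketch below. It is not Sailfish's implementation: it uses plain (not accelerated) EM, ignores transcript lengths, and all names (`em_over_equivalence_classes`, `eq_counts`) are hypothetical. Each equivalence class is the set of isoforms a group of reads is compatible with, stored once with a count.

```python
def em_over_equivalence_classes(eq_counts, n_transcripts, n_iter=200):
    """Estimate relative isoform abundances with plain EM.

    eq_counts maps a tuple of transcript ids to the number of reads
    compatible with exactly that set of transcripts (hypothetical format).
    """
    total = sum(eq_counts.values())
    # Start from a uniform guess over all transcripts.
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        assigned = [0.0] * n_transcripts
        # E-step: split each class's count among its members in
        # proportion to the current abundance estimates.
        for members, count in eq_counts.items():
            denom = sum(theta[t] for t in members)
            if denom == 0.0:
                continue
            for t in members:
                assigned[t] += count * theta[t] / denom
        # M-step: renormalize the fractional assignments.
        theta = [a / total for a in assigned]
    return theta


# Toy input: 60 reads unique to isoform 0, 10 unique to isoform 1,
# and 30 ambiguous reads compatible with both.
toy_counts = {(0,): 60, (0, 1): 30, (1,): 10}
estimates = em_over_equivalence_classes(toy_counts, n_transcripts=2)
```

On this toy input the estimates settle near 0.857 and 0.143. The work per iteration depends on the number of distinct classes, not the number of reads, which is the point of the reduced representation.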

These methodological advancements allowed Sailfish to quantify isoform abundance more than an order of magnitude faster than existing methods. For example, it could estimate abundances from a dataset of 150 million sequencing reads in 15 minutes, where prior tools took up to six hours.

Isoforms are RNA molecules that arise from the same gene (genomic locus) but differ in their specific sequence for reasons that include alternative splicing and the use of different transcript start sites.

The expression of different isoforms may sometimes result in distinct or altered gene function within the cell. Estimating isoform abundance not only lets one quantify gene expression in a biological sample but also lets one explore the relative abundance of these different forms of the gene.

Patro collaborated on the Sailfish project with Carl Kingsford, the Herbert A. Simon Professor of Computer Science at CMU, and Steve Mount, an associate professor of biology at the University of Maryland.

The trio were formally recognized with the Newell award at CMU’s Founders Day event, held this year in August in an online-only format.

—Story by Melissa Brachfeld

CBCB Researcher Jamshed Khan Wins Outstanding Student Paper Award

Aug 06, 2021

Jamshed Khan, a third-year computer science doctoral student, recently received the Ian Lawson Van Toch Memorial Award for Outstanding Student Paper at the 2021 Conference on Intelligent Systems for Molecular Biology (ISMB) and European Conference on Computational Biology (ECCB).

ISMB is the flagship meeting of the International Society for Computational Biology (ISCB). The event is co-located with ECCB and has grown to become the world’s largest bioinformatics and computational biology conference. The conference was held virtually July 25–30 due to the ongoing COVID-19 pandemic.

The award went to “Cuttlefish: Fast, parallel, and low-memory compaction of de Bruijn graphs from large-scale genome collections,” authored by Khan and his adviser Rob Patro, an associate professor of computer science with an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). The paper introduces a new algorithm for efficiently building a compacted de Bruijn graph from collections of input genome or transcriptome sequences.

The de Bruijn graph and its colored compacted variant have become increasingly useful data structures in genomic analysis. They are used to assemble new genomes, compare the genomes of related organisms, and build efficient indexes for mapping or aligning the data from sequencing experiments, among other tasks.
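For readers unfamiliar with the structure, the toy Python sketch below shows what "compacting" a de Bruijn graph means: every maximal non-branching chain of overlapping k-mers is collapsed into a single string (a unitig). This is a simplified illustration and not the Cuttlefish algorithm. It ignores reverse complements, holds everything in memory, and uses hypothetical function names.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Collect distinct k-mers and the edges between consecutive k-mers."""
    kmers = set()
    succ, pred = defaultdict(set), defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
        for i in range(len(seq) - k):
            a, b = seq[i:i + k], seq[i + 1:i + k + 1]
            succ[a].add(b)
            pred[b].add(a)
    return kmers, succ, pred


def compact(kmers, succ, pred):
    """Merge maximal non-branching paths into unitig strings."""
    def starts_unitig(v):
        # v begins a unitig unless it simply continues a unique predecessor.
        if len(pred[v]) != 1:
            return True
        (p,) = pred[v]
        return len(succ[p]) != 1

    def walk(v, seen):
        # Extend from v while the path neither branches nor merges.
        path, cur = v, v
        seen.add(v)
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1 or nxt in seen:
                break
            path += nxt[-1]
            seen.add(nxt)
            cur = nxt
        return path

    seen, unitigs = set(), []
    for v in sorted(kmers):
        if starts_unitig(v):
            unitigs.append(walk(v, seen))
    # Any k-mers left over lie on isolated cycles; break each one arbitrarily.
    for v in sorted(kmers):
        if v not in seen:
            unitigs.append(walk(v, seen))
    return unitigs


kmers, succ, pred = build_de_bruijn(["ACGTA", "ACGTC"], k=3)
unitigs = compact(kmers, succ, pred)  # ['ACGT', 'GTA', 'GTC']
```

The two toy sequences share the prefix ACGT and then diverge, so the compacted graph has one shared unitig and two short branches.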

However, the process of building the compacted de Bruijn graph is a computational challenge, with many existing solutions requiring a lot of time and memory as the input becomes larger.

In the paper, Khan and Patro, who are both members of the Center for Bioinformatics and Computational Biology (CBCB), present a highly scalable and very low-memory algorithm called Cuttlefish to construct the compacted de Bruijn graph for collections of whole genome references.

The researchers say Cuttlefish considerably outperforms existing state-of-the-art approaches. It can build the compacted de Bruijn graph for 100 human genomes two and a half times as fast, and using less than a quarter of the memory, compared to the next best tool performing the same task.

“By providing a faster and more memory-frugal algorithm for constructing this widely-used structure of the compacted de Bruijn graph, we anticipate that Cuttlefish will aid in some of the many downstream applications in which this graph is used, like comparative genomics, pangenome analysis, and the indexing of large collections of related genomes for sequence analysis,” Patro says.

The Ian Lawson Van Toch Memorial Award for Outstanding Student Paper is given to the student who presents the most thought-provoking or original paper at ISMB/ECCB, as judged by a panel of experts. The award, which is sponsored by the Princess Margaret Hospital Foundation, is given in memory of Ian Lawson Van Toch, a 23-year-old medical biophysics graduate student at the University of Toronto who passed away in August 2007.

Khan says he is honored to receive such a prestigious award.

“We are ecstatic for this recognition of our work,” he says. “I'm very grateful to Rob for his support and guidance along the way, and for being such a remarkable adviser and mentor. I'm also thankful to my talented and dedicated lab-mates for their critiques and feedback. We look forward to Cuttlefish paving the way toward more exciting results in high-throughput genomics research.”

—Story by Melissa Brachfeld

Colwell Part of New Study Identifying Gut Microbiome Changes Long Before Onset of Celiac Disease

Jul 18, 2021

A multi-institutional team of researchers has identified substantial microbial changes in the intestines of infants who are at risk for celiac disease, a serious autoimmune condition in which the consumption of gluten leads to damage in the small intestine.

The researchers, including Rita Colwell, a Distinguished University Professor in the University of Maryland Institute for Advanced Computer Studies, used advanced genomic sequencing techniques to identify intestinal changes as early as 18 months before the onset of the autoimmune disease.

They uncovered distinct alterations in several species of microorganisms and molecular components of cells and tissues in children who developed celiac disease. These alterations were not seen in at-risk children who did not develop the disease.

Results of the study, published in July in the Proceedings of the National Academy of Sciences, could lead to more effective celiac disease treatments and prevention.

“These results indicate that the microbiome can be a powerful indicator for celiac disease,” said Colwell, who is active at UMD in the Center for Bioinformatics and Computational Biology. “It provides an early warning before symptoms develop, allowing early intervention. With analysis of the microbiome via stool samples, infants can be monitored, and it may be possible for alteration of the diet to be sufficient to treat or prevent the disease.”

The research team tracked the gut microbiota of 500 at-risk children from birth through age 10 as part of the MassGeneral Hospital for Children’s Celiac Disease, Genomic, Microbiome and Metabolomic (CDGEMM) study. They began collecting extensive blood and fecal samples along with environmental data on participants in 2014.

Using metagenomic analysis, the researchers linked microbial composition with function and highlighted changes associated with either increased inflammatory processes or reduced inflammation. An important part of the body’s immune response, inflammation is a significant cause of celiac disease symptoms.

For the current paper, the team compared the gut microbiome of 10 infants from the CDGEMM study who went on to develop celiac disease with the gut microbiome of 10 infants from the study who did not develop the autoimmune condition. All 20 children were genetically predisposed to develop celiac disease.

“We found significant changes in the intestinal microbes, pathways and metabolites 18 months before disease onset, which was confirmed with positive lab tests,” said Maureen Leonard, M.D., lead author of the study and clinical director of the Center for Celiac Research and Treatment. “This was much earlier than we expected.”

The changes researchers found included increases in pro-inflammatory microorganisms and decreases in protective and anti-inflammatory microorganisms at various time points before onset of the disease.

Colwell said the study demonstrates the power of next-generation sequencing coupled with bioinformatics to detect these important changes.

According to another of the paper’s authors—Alessio Fasano, M.D., director of the Center for Celiac Research and Treatment—the approach used in the study will help researchers develop similar studies for the diagnosis and treatment of a variety of conditions in which the microbiome could play a pathogenic role.

If confirmed by larger datasets, these findings may represent specific therapeutic targets for disease interception and possible prevention of celiac disease onset through microbiome manipulation during the preclinical phase.


This story was adapted from text provided by MassGeneral Hospital for Children.

CBCB Doctoral Student Studies Changes in Neurons that Affect Mammals’ Sleep Cycles

Aug 02, 2021

Theresa Alexander, a graduate student in the Center for Bioinformatics and Computational Biology (CBCB), loves all kinds of creatures and spends her free time with her dog Fibonacci and riding her horses Ella, Spotty and Fergus.

Alexander, a fourth-year doctoral student in biological sciences, extends her love of animals to investigating very specific biological problems that affect them.

She earned her master’s degree in epidemiology and biostatistics with a focus on genetic biostatistics from Case Western Reserve University in 2017. Currently, she is co-advised by Najib El-Sayed, a professor of cell biology and molecular genetics with a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS), and Colenso Speer, an assistant professor of biology.

Her most recent work focuses on investigating a specific neuronal cell type that syncs mammals' sleep/wake cycle—known as circadian rhythm—to the sun cycle. These specific cells in the brain—called intrinsically photosensitive retinal ganglion cells (ipRGCs)—have a protein that makes them photosensitive, meaning these neurons send action potentials from your eye to a specific part of the brain when light photons enter your eye.

Alexander says neurons have to package and ship mRNAs over long distances so they can be turned into proteins in the distal parts of the cell. Messenger ribonucleic acid, or mRNA for short, is a molecule that carries genetic code from DNA to the rest of the cell.

“One problem we are trying to understand is what mRNAs actually get shipped out to these far-away compartments in these specific neurons—i.e. how does the USPS sorting center in the neuron cell body decide which mRNAs get packaged up and sent out to the processing centers in the ‘rural’ parts of the cell,” she explains.

The second major problem involves looking at the heterogeneity between different neuronal cell-types and sub-cell types. IpRGCs can be split into six different “sub-cell” types. Each of these cell types goes from the retina to different brain targets to perform different functions—some help us maintain our sleep-wake cycles in coordination with the sun cycle while others have image-forming visual functions.

“What we want to help shed light on is how these cells differ from other subtypes based on which genes each expresses throughout development at different critical neuronal growth stages,” Alexander says.

Her group is currently studying ipRGCs in mice, but she says she hopes that these results extend to ipRGC function in other mammals.

Alexander enjoys working in CBCB because of the sense of community and the wide array of research being conducted.

“It is absolutely amazing to get a room full of people from so many backgrounds and specialties together,” she says. “There's always a unique perspective to solve a problem, and if I am working on a new problem or an area which I am not familiar with, there's always someone to go to who has some experience with it.”

El-Sayed says Alexander contributes a “wealth of interdisciplinary knowledge” to his lab as well as CBCB.

“Theresa is well-grounded both in the life sciences and bioinformatics and brings a deep understanding of statistical inference to our analyses,” he says. “In addition to her impressive intellectual capacity, she is pleasant, positive and simply fun to work with.”

Alexander is also a fellow in UMD COMBINE (Computation and Mathematics for Biological Networks), a National Science Foundation-funded Research Traineeship (NRT) program in network biology.

Additionally, she volunteers for “Girls Talk Math,” a summer program for high school students that is run by UMD’s math department. The program exposes students to math topics that they may not come across in their normal curriculum. Alexander teaches network science.

After she earns her doctorate, Alexander envisions applying her data science skills to biotech research.

“The intersection of biology, computer science and statistics is the most exciting place to me, and goal-oriented industry research seems like a perfect place for that,” she says.

—Story by Melissa Brachfeld

Colwell Concludes a Decade of Work with Gulf of Mexico Research Initiative

Jun 17, 2021

Rita Colwell, a Distinguished University Professor in the University of Maryland Institute for Advanced Computer Studies, recently concluded a decade of work with the Gulf of Mexico Research Initiative (GoMRI), an independent group of experts formed in 2010 following the Deepwater Horizon explosion and oil spill.

Colwell served as the research board chair for GoMRI, leading a group of almost two-dozen senior scientists and public policy experts who provided guidance on a large-scale effort to investigate the impacts of oil, dispersed oil, and oil dispersants on the ecosystems of the Gulf of Mexico and affected coastal states.

The research board oversaw an unprecedented investigation into the effects of the Deepwater disaster—a multidisciplinary undertaking that would ultimately involve more than 4,500 people, including scientists, lab techs, data and outreach specialists, students, and countless others.

The goal of the initiative, officials say, was to improve society’s ability to understand, respond to and mitigate the impacts of petroleum pollution and related stressors of the marine and coastal ecosystems, with an emphasis on conditions found in the Gulf of Mexico.

A detailed accounting of the history, scope, and successful outcomes of this 10-year effort was recently published in a special edition of Oceanography magazine. A PDF is available that highlights the efforts by Colwell and others associated with the GoMRI project.

Molloy Is Designing Efficient Algorithms for Reconstructing Evolutionary Trees

Jun 09, 2021

Perhaps the most iconic image in evolutionary biology is Charles Darwin's sketch of an evolutionary tree. The illustration highlights Darwin’s transformational idea that the evolutionary relationships among species can be depicted through a branching pattern, a concept known as the Tree of Life.

While Darwin primarily relied on the physical characteristics shared by subsets of species to determine an evolutionary tree’s structure, scientists today are using vast amounts of genomic data to reconstruct evolutionary trees—a field known as phylogenetics.

Erin Molloy, who joins the University of Maryland on July 1 as an assistant professor in the Department of Computer Science, is part of this new genomic revolution.

She is using powerful computational tools to unlock the full breadth of information available in genomic data, designing efficient algorithms to estimate evolutionary trees. This type of information is vital for determining the evolutionary history of birds, plants and even microbes, such as SARS-CoV-2, the virus that causes COVID-19.

“The main goal is to estimate the tree—and other parameters—given the observed genomic data,” says Molloy, who will also hold an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). “The resulting phylogeny is not only interesting in its own right, but it is also important for downstream analyses.”

For example, she notes that the phylogeny for SARS-CoV-2 is not only useful for studying how the virus evolves and how new strains emerge, but also for strain identification—that is, determining which strains are present in a sample—and even contact tracing.

Molloy recently completed a year-long postdoctoral researcher position in the Machine Learning and Genomics Lab at the University of California, Los Angeles.

She says she is looking forward to expanding her research agenda at Maryland, where she can take advantage of UMIACS’ vast computational resources.

A major line of Molloy’s research focuses on the development of phylogeny estimation methods that can effectively utilize distributed-memory systems.

In this context, she says, the genomic data set is distributed across multiple processors, and the algorithm may require these processors to communicate with each other.

“In the worst case, the processors must synchronize with each other at specific points in the algorithm,” Molloy explains. “All of this dramatically slows down the computation. My goal is to design methods that reduce communication bottlenecks, while achieving the same accuracy and statistical guarantees of existing methods.”

Molloy says she looks forward to working with graduate students and faculty at Maryland, particularly within the Center for Bioinformatics and Computational Biology.

She has previously collaborated with Mihai Pop, the director of UMIACS, on a project that utilizes estimated phylogenies to perform taxon identification and abundance profiling from metagenomics data sets.

“I hope to continue working with Mihai on problems in metagenomics, where modeling evolutionary processes could prove advantageous,” Molloy says.

Other UMIACS faculty she expects to collaborate with include Brantley Hall, who works on identifying the functions of the genes in the microbiome, and Michael Cummings, who also approaches phylogeny estimation from a high-performance computing lens.

Although Cummings and Molloy utilize different methodologies for phylogeny estimation, Molloy says working together could lead to new approaches.

“There is a lot of potential for creating a hybrid method that combines aspects of our different approaches,” she says. “I look forward to discussing these ideas with my new colleagues at the University of Maryland.”

—Story by Melissa Brachfeld

CBCB Researchers Develop Tool that Makes Reconstructing Microbial Genomes Easier

Apr 30, 2021

In the seafaring world, a binnacle is a wooden stand placed near the ship’s helm that holds important tools and instruments needed to navigate from one point to the next.

At the University of Maryland, researchers in the Center for Bioinformatics and Computational Biology (CBCB) have developed their own tool—appropriately called Binnacle—that can help scientists navigate the complex world of microbial genomes.

This open-source software was described in a paper, “Binnacle: Using Scaffolds to Improve the Contiguity and Quality of Metagenomic Bins,” that was recently published in the online journal Frontiers in Microbiology.

The paper was written by Harihara Subrahmaniam Muralidharan (lead author), a third-year computer science doctoral student; Nidhi Shah (co-lead author) who just defended her doctoral dissertation; Jacquelyn Meisel, an assistant research scientist in the University of Maryland Institute for Advanced Computer Studies (UMIACS); and Mihai Pop, a professor of computer science and the director of UMIACS.

The CBCB team begins with the premise that recent advances in high-throughput sequencing strategies have spurred microbiome research and revealed important insights into the microbial communities that inhabit human, animal and environmental habitats.

In particular, whole metagenomic shotgun sequencing—which allows for a comprehensive analysis of microbial DNA from a small-sized sample—has been instrumental in expanding an understanding of the functional potential and genetic composition of different microorganisms that have not been previously cultured.

The challenge, though, is that reconstructing complete genomes of organisms from whole metagenomic shotgun sequencing data is often difficult and time-consuming.

Recovered genomes are often highly fragmented, the researchers say, due to uneven abundances of organisms, repeats within and across genomes, sequencing errors, and strain-level variation. To address the fragmented nature of metagenomic assemblies, scientists rely on a process called binning, which clusters together contigs—a set of overlapping DNA segments—inferred to originate from the same organism.

Existing binning algorithms use oligonucleotide frequencies and contig abundance (coverage) within and across samples to group together contigs from the same organism. However, these algorithms often miss short contigs and contigs from regions with unusual coverage or DNA composition characteristics.
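To make the notion of an oligonucleotide-frequency signal concrete, the short Python sketch below computes a normalized tetranucleotide (4-mer) frequency vector for a contig and compares two contigs by Euclidean distance. It is only an illustration under simplifying assumptions: real binning tools typically also merge reverse-complement k-mers and combine composition with coverage, and the function names here are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(contig, k=4):
    """Return the normalized k-mer frequency vector of a DNA string."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(contig) - k + 1):
        kmer = contig[i:i + k]
        # Windows containing ambiguous bases such as N are skipped.
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in sorted(counts)]


def composition_distance(contig_a, contig_b, k=4):
    """Euclidean distance between two contigs' k-mer frequency vectors."""
    fa, fb = kmer_frequencies(contig_a, k), kmer_frequencies(contig_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


# Contigs with similar composition end up close together.
d = composition_distance("ACGTACGTACGTACGT", "ACGTACGTACGTACGA")
```

A short contig yields only a handful of k-mer windows, so its frequency vector is noisy, which is one reason composition-based methods struggle with short contigs.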

The CBCB researchers propose that information from assembly graphs (used to represent the final assembly of a genome or metagenome) can assist current strategies for metagenomic binning. They use a metagenomic scaffolding tool, called MetaCarvel, to construct assembly graphs where contigs are nodes and edges are inferred based on paired-end reads.
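The general idea of a graph whose nodes are contigs and whose edges come from paired-end reads can be sketched in a few lines of Python. This is a simplified illustration, not MetaCarvel's method: the input format, the `min_support` threshold, and the function name are assumptions made for the example, and orientation and distance estimates are omitted.

```python
from collections import Counter


def build_scaffold_graph(pair_alignments, min_support=2):
    """Link contigs that are joined by enough read pairs.

    pair_alignments is an iterable of (contig_of_mate1, contig_of_mate2)
    tuples, one per read pair (hypothetical input format).
    """
    support = Counter()
    for a, b in pair_alignments:
        # Pairs that land on a single contig carry no linking information.
        if a != b:
            support[frozenset((a, b))] += 1

    graph = {}
    for link, n_pairs in support.items():
        if n_pairs >= min_support:
            a, b = sorted(link)
            graph.setdefault(a, {})[b] = n_pairs
            graph.setdefault(b, {})[a] = n_pairs
    return graph


pairs = [("c1", "c2"), ("c2", "c1"), ("c1", "c1"), ("c2", "c3")]
graph = build_scaffold_graph(pairs)  # {'c1': {'c2': 2}, 'c2': {'c1': 2}}
```

Requiring several supporting pairs before adding an edge is a common way to keep a few chimeric or mismapped reads from joining unrelated contigs.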

The Binnacle software is then able to extract information from the assembly graphs and subsequently cluster scaffolds into comprehensive bins.

The CBCB researchers show that binning graph-based scaffolds, rather than contigs, improves the contiguity and quality of the resulting bins, and captures a broader set of the genes of the organisms being reconstructed.

The authors believe that their Binnacle software represents a first step toward the development of effective metagenomic analysis tools that can leverage all the information contained in one or more samples. Ultimately, this could lead to the automated reconstruction of a metagenome-assembled genome, opening new pathways for accurate and efficient discoveries in public-health microbiology and other fields.

The research described in the published paper is supported by grants from the National Institutes of Health and the National Science Foundation.

—Story by Melissa Brachfeld

Talk Nerdy Podcast Features Rita Colwell Discussing Sexism and Science

Mar 30, 2021

Rita Colwell, a Distinguished University Professor in the University of Maryland Institute for Advanced Computer Studies, recently discussed her storied career as an infectious disease scientist on the Talk Nerdy podcast hosted by Cara Santa Maria.

Colwell—still active in research, including multiple international efforts to stem the spread of cholera—talked candidly about the barriers she faced throughout her career as a woman scientist.

The entire 60-minute episode is available from the Talk Nerdy podcast.

El-Sayed to Lead New Genomic Sequencing Facility at UMD

Mar 24, 2021

Najib El-Sayed, a professor of cell biology and molecular genetics with a joint appointment in the University of Maryland Institute for Advanced Computer Studies, has been named director of a new campus facility that will enable researchers to access single-cell DNA and RNA sequencing technologies.

The Advanced Genomic Technologies Core (BBI-AGTC), part of the Brain and Behavior Institute (BBI), will advance high-throughput, large-scale genomics research and will greatly improve the university’s capabilities in neuroscience and other life sciences, says Elizabeth Quinlan, a professor of biology and director of the BBI.

The BBI-AGTC is set to open in mid-April 2021 and will house one of the latest sequencing platforms (Illumina NextSeq 1000), a single-cell controller (10X Genomics Chromium), a liquid handling robotic station (Eppendorf epMotion 5075tc), a real-time PCR instrument and a microfluidics-based platform for the sizing, quantification and quality control of DNA and RNA.

“Next generation sequencing was a big advance toward understanding genomes and RNA expressed in different tissues, and the BBI-AGTC takes that technology a step further by enabling transcriptomes of individual cells,” says Karen Carleton, a professor of biology. “More broadly, the facility could be important for answering a diverse set of biological questions including how cells change through development, infection, or even in response to different behavioral states such as during mating or parental care.”

El-Sayed’s own research uses genomic approaches to study the biology of parasitism. His team develops and applies molecular, computational and systems biology tools to better understand host-pathogen interactions and, ultimately, the mechanisms of infection and survival. His work in functional genomics, tracing how biological information flows from gene to protein expression via RNA transcription, looks to contribute to better diagnosis, prevention of and therapeutics for parasite- and bacteria-caused diseases in humans, animals and plants.

El-Sayed has published more than 90 research papers and serves on the editorial boards of BMC Genomics and PLoS One. He also serves on the National Institutes of Health’s Genomics, Computational Biology and Technology study section and the scientific advisory board for the National Institute of Allergy and Infectious Diseases’ Genomic Centers for Infectious Diseases.

Former CBCB Graduate Student Receives Larry S. Davis Doctoral Dissertation Award

Dec 15, 2020

A former graduate student in the Center for Bioinformatics and Computational Biology (CBCB) has been recognized for the excellence of her academic work on developing practical and efficient solutions to index large collections of genomes.

Fatemeh Almodaresi, who graduated with a Ph.D. in computer science in Summer 2020, is one of two students to receive the Larry S. Davis Doctoral Dissertation Award this year.

The annual award recognizes outstanding doctoral dissertations in the Department of Computer Science that convey excellence in their technical depth, significance, potential impact and presentation quality.

The award is named for Larry Davis, a Distinguished University Professor of computer science who served as chair of the department from 1999 to 2012. Davis was also the founding director of the University of Maryland Institute for Advanced Computer Studies (UMIACS), providing leadership for the institute from 1985 to 1994.

Almodaresi’s UMD dissertation, “Algorithms and Data Structures for Indexing, Querying, and Analyzing Large Collections of Sequencing Data in the Presence or Absence of a Reference,” covers the development of new data structures as well as innovative, practical and efficient solutions to indexing large collections of genomes.

“Fatemeh demonstrates both technical brilliance and the ability to come up with ideas that are theoretically interesting and of tremendous practical impact,” says Rob Patro, an associate professor of computer science who advised Almodaresi during her doctoral studies.

In recommending her for the Davis dissertation award, Patro noted Almodaresi’s design of a new compacted version of a de Bruijn graph, a data structure that is used in bioinformatics to assemble and analyze genomes. Although similar structures have previously been proposed, Almodaresi’s work greatly improved on their practicality and efficiency, Patro explains.

Almodaresi also built a tool using her new data structure for the taxonomic assignment of metagenomic read data, a well-studied problem in the field of computational biology. The taxonomic classifier she designed is both more accurate and more efficient—requiring less memory while operating at a similar speed—than two popular tools that are widely-used to accomplish the same task, Patro says.

One major goal in the field is to build practical indexes on the hundreds of thousands of sequencing samples available in public data repositories. In pursuit of this goal, Almodaresi developed new methods and solutions to perform large-scale search of raw sequencing data using a system called Mantis. Her methods allow the index to be iteratively updated, a crucial feature as the amount of genomic data to be indexed continues to grow.

“I anticipate that much of the theory and methodology Fatemeh develops will have a far-reaching impact both within and beyond the field of genomics,” says Patro. “She is a truly exceptional researcher, displaying an intelligence, commitment and real passion at a level that I find to be rare, even among top Ph.D. students.”

Story by Maria Herd

