Patro Awarded $400K to Enhance Open-Source Gene Expression Tools

Jun 27, 2024

Gene expression analysis can be compared to an orchestra’s conductor: a guiding force for researchers intent on deciphering the genomic melodies within our cells. From unraveling the mysteries of diseases to better understanding human development, this scientific method significantly advances biomedical research by shedding light on life’s inner workings.

To further research and scholarship focused on gene expression analysis, the Chan Zuckerberg Initiative (CZI) has awarded a $400,000 grant to a team of University of Maryland researchers. CZI is a philanthropic effort launched in 2015 by Meta founder Mark Zuckerberg and his wife, Priscilla Chan.

The award supports efforts by Rob Patro, an associate professor of computer science, in unifying and enhancing open-source transcriptomics tools—freely available software programs designed to analyze RNA transcripts, enabling researchers to efficiently study gene expression and regulation.

Patro, who has an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS), says he is honored to receive the grant, adding that it highlights the value of his lab’s commitment to furthering open science methodology and expanding access to open-source, practical and robust scientific software.

“This award will improve upon the core algorithms and data structures of software tools that have been used in thousands of gene expression analysis studies to date, increasing performance and reducing runtime and resource requirements and hence, costs,” says Patro, who is a core member of the Center for Bioinformatics and Computational Biology. “At the same time, it will build new bridges of interoperability between popular tools, accelerating existing pipelines and providing scientists with more choices in how best to analyze their data.”

One primary goal, Patro says, is to enhance the use and impact of a popular open-source tool known as the STAR aligner, which uses a suffix array—a data structure for efficiently handling large sequences of DNA, RNA or protein data—to align genomic sequencing data.
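For readers unfamiliar with the data structure, the idea behind a suffix array can be illustrated with a minimal Python sketch. This is a textbook-style toy, not STAR's implementation and not the UMD team's algorithm: the function names `build_suffix_array` and `find_occurrences` and the short example sequence are assumptions made purely for illustration. The sketch sorts the starting positions of every suffix of a sequence, then uses binary search over that sorted order to locate a query.

```python
from __future__ import annotations


def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction for illustration only: it compares whole suffixes,
    so it is far too slow and memory-hungry for a real genome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions where `pattern` occurs in `text`."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


genome = "ACGTACGA"  # hypothetical toy sequence
sa = build_suffix_array(genome)
hits = find_occurrences(genome, sa, "ACG")
print(sa)    # [7, 4, 0, 5, 1, 6, 2, 3]
print(hits)  # [0, 4]
```

Because the suffixes are kept in sorted order, each lookup needs only a logarithmic number of comparisons, which is what makes the structure attractive for aligning sequencing reads. Production tools build the array with much faster construction algorithms than the simple sort shown here.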

Although the suffix array has a long history and its construction has been heavily optimized, existing methods haven’t fully utilized modern hardware capabilities, Patro says.

Working closely with Jamshed Khan, who just completed his fifth year as a computer science doctoral student, and in collaboration with UMIACS colleagues Laxman Dhulipala and Erin Molloy as well as graduate student Tobias Rubel, the UMD team has developed a new parallel algorithm for suffix array construction that outperforms existing methods, building the index faster while simultaneously reducing memory usage. By bringing this novel algorithm and other enhancements to the STAR aligner, the team hopes to improve the gene analysis tool’s performance.

Another focus is to integrate various transcriptome analysis tools. The UMD researchers will create a high-bandwidth connection between STAR and their own Salmon tool for transcript quantification from bulk RNA sequencing data, reducing runtime and resource usage. A similar integration will be built between STAR and their alevin-fry tool for single-cell gene expression analysis.

Lastly, along with Daniel Liu (a collaborator and recent graduate from UCLA) and Noah Cape (a student at Williams College whom Patro advised as part of the CBCB’s REU program), Patro is developing a universal adaptor tool for single-cell sequencing analysis, accommodating different data formats and protocols. This tool will enable existing single-cell expression quantification tools to process diverse types of data without requiring code modifications for each new input format.

Patro’s winning proposal was one of 32 that were funded this year through CZI’s Essential Open-Source Software for Science program, which supports open-source software projects that are essential to biomedical research. Its goal is to support software maintenance, growth, development and community engagement for these critical tools. This year’s program co-funders are the Kavli Foundation and the Wellcome Trust.

This marks the second time that Patro has received funding from CZI. In 2022, he was awarded $350K to improve upon a collection of interrelated tools his lab developed to process genomic data.

—Story by Melissa Brachfeld, UMIACS communications group