Computing the Tree of Life: Leveraging the Power of Desktop and Service Grids

Title: Computing the Tree of Life: Leveraging the Power of Desktop and Service Grids
Publication Type: Conference Proceedings
Year of Conference: 2011
Authors: Bazinet AL, Cummings MP
Conference Name: Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Date Published: 2011
Keywords: (artificial, (mathematics), analysis, BOINC, COMPUTATION, computational, computing, data, Estimation, evolutionary, GARLI, genetic, Grid, GRIDS, handling, heterogeneous, History, HPC, information, intelligence), interface, interfaces, Internet, jobs, lattice, learning, life, likelihood, load, machine, maximum, method, model, molecular, phylogenetic, portal, Portals, power, project, resource, Science, sequence, service, services, sets, software, substantial, system, systematics, tree, TREES, user, Web

The trend in life sciences research, particularly in molecular evolutionary systematics, is toward larger data sets and ever more detailed evolutionary models, which can generate substantial computational loads. Over the past several years we have developed a grid computing system aimed at providing researchers the computational power needed to complete such analyses in a timely manner. Our grid system, known as The Lattice Project, was the first to combine two models of grid computing: the service model, which mainly federates large institutional HPC resources, and the desktop model, which harnesses the power of PCs volunteered by the general public. Recently we have developed a "science portal" style web interface that makes it easier than ever for phylogenetic analyses to be completed using GARLI, a popular program that uses a maximum likelihood method to infer the evolutionary history of organisms on the basis of genetic sequence data. This paper describes our approach to scheduling thousands of GARLI jobs with diverse requirements to heterogeneous grid resources, which include volunteer computers running BOINC software. A key component of this system provides a priori GARLI runtime estimates using machine learning with random forests.