Researchers at the University of North Carolina at Chapel Hill will play key roles in a new project that applies semantic technologies developed by computer and information scientists to the field of evolutionary biology.
Unlike DNA and protein sequence information, which can be combined and compared from different organisms using universal computational methods, phenotype information has no such universal code or set of tools. Highly specialized expertise is needed to compare anatomical or physiological knowledge about different sets of organisms. The new project, funded by the National Science Foundation (NSF), aims to make this kind of expertise accessible to computers in a way that will open up new avenues for studying how phenotypes evolve along the tree of life.
Todd Vision, associate professor in the department of biology in the College of Arts & Sciences and principal investigator on the project, will address the problem of semantic ancestral character reconstruction, the process of using the phenotypes of present-day organisms to infer the phenotype of their evolutionary ancestors. He will also work on semantic enrichment, which is used by researchers to find hidden relationships among phenotypes that change along different branches of an evolutionary tree.
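For readers unfamiliar with ancestral character reconstruction, the basic idea can be illustrated with Fitch parsimony, a classic textbook method. This is only a minimal sketch and not the project's semantic approach; the taxon names, the body-size states, and the function name `fitch_states` are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a tip name (str) or a pair of subtrees (binary tree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_states(tree: Tree, tip_states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up pass of Fitch parsimony.

    Returns the set of equally parsimonious states at the root of `tree`
    and the minimum number of state changes implied within it.
    """
    if isinstance(tree, str):
        # A tip: its state is the one observed in the present-day organism.
        return {tip_states[tree]}, 0

    left, right = tree
    left_set, left_changes = fitch_states(left, tip_states)
    right_set, right_changes = fitch_states(right, tip_states)

    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical example: four taxa scored for body size.
tree: Tree = (("taxon_A", "taxon_B"), ("taxon_C", "taxon_D"))
tips = {
    "taxon_A": "normal",
    "taxon_B": "normal",
    "taxon_C": "miniature",
    "taxon_D": "normal",
}

root_states, changes = fitch_states(tree, tips)
print(root_states, changes)
```

In this toy case the inferred ancestral state is "normal" with a single change, on the branch leading to taxon_C. Real phenotype descriptions are far richer than a single label per taxon, which is the gap the article describes.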
“We are approaching a time in which it will be possible to get a reasonable estimate of the evolutionary relationships among any set of organisms by stitching together results from multiple published studies, and it will be possible to list all the phenotypes that have been observed on the tips of that tree,” said Vision. “But for the vast majority of phenotypes, without the kinds of tools we aim to develop, biologists will be unable to say much about how the evolution of those phenotypes unfolded.”
Co-PI James Balhoff, a senior research scientist at RENCI who is an evolutionary biologist by training, will develop software and the database platform to be used by the project.
“Semantic technologies such as ontologies already play a major role in standardizing data descriptions in biomedical applications,” said Balhoff. “By making it possible to readily incorporate this shared knowledge into new analytic tools, we can make adoption of these standards even more powerful.”
The two principal investigators have been part of a long-standing collaboration called Phenoscape, which has pioneered the application of ontologies to biological phenotypes. Ontologies are data models that capture expert knowledge in such a way that computers can apply logical reasoning to that knowledge. They have proven to be particularly valuable for data integration applications. For instance, ontologies link databases describing the functions of genes in distantly related organisms such as yeast, fruit flies, and humans. The new project will leverage ontologies for more challenging reasoning tasks.
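A minimal sketch of the kind of reasoning an ontology makes possible is shown below. It assumes a tiny, hypothetical "is a" hierarchy of fin terms and hypothetical study labels; it does not reproduce any real ontology or Phenoscape software. The point is that a query for a general term can retrieve data annotated with more specific terms, even though the text labels differ.

```python
from typing import Dict, List, Set

# Hypothetical "is a" hierarchy: each term maps to its direct parent terms.
IS_A: Dict[str, Set[str]] = {
    "pectoral fin": {"paired fin"},
    "pelvic fin": {"paired fin"},
    "paired fin": {"fin"},
    "dorsal fin": {"fin"},
    "fin": {"anatomical structure"},
}


def ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following "is a" links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotated_under(
    query: str, annotations: Dict[str, str], is_a: Dict[str, Set[str]]
) -> List[str]:
    """List records whose term is `query` itself or any subtype of it."""
    return sorted(
        record
        for record, term in annotations.items()
        if term == query or query in ancestors(term, is_a)
    )


# Hypothetical records from different sources, each tagged with one term.
annotations = {
    "study_1": "pectoral fin",
    "study_2": "dorsal fin",
    "study_3": "pelvic fin",
}

print(annotated_under("paired fin", annotations, IS_A))
print(annotated_under("fin", annotations, IS_A))
```

A plain keyword search for "paired fin" would match none of these records; following the hierarchy finds study_1 and study_3. Production ontologies use much richer logic than a single relation, so this shows only the simplest case.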
The research team includes collaborators at the University of South Dakota, Duke University, and Virginia Tech. UNC-Chapel Hill will receive more than $800,000 from the NSF for the three-year effort.
“We will focus on the evolution of miniaturization in fishes as a test case to show that these tools work,” said Vision, “but the tools we are developing should be of general use for studying phenotype evolution in any organism.”
“Looking more broadly, in many scientific fields, like biomedicine and environmental science, researchers rely on qualitative observations rather than quantitative ones,” said Balhoff. “There is a great opportunity for ontologies to enable new informatics capabilities wherever that is the case.”