Organizing with Ontologies

Posted by Steph Nowotarski, on 23 August 2021

We built an anatomy ontology. You should too – here’s why.

We have an information scale problem. I’m hardly the first to note how rapidly, even exponentially, information grows; the sentiment was keenly felt as far back as the nineteenth century: “Knowledge begets knowledge as money bears interest” (Conan Doyle, 1885). Today there are a staggering 32,948,436 papers in the PubMed database (August 17, 2021, 8:07 PM CT). Consider this sobering perspective: if a single person wanted to read each and every one of these papers and, optimistically(!), read 5 papers per day, it would take 6,589,687 days. Or 18,053 years. That’s roughly 229 average human lifetimes. And that’s just the papers written about all of the data that was collected.

The information problem in the life sciences in particular is further compounded by the varied types of data in current publications: supplemental figures, spreadsheets, stand-alone databases. With technological advances and increased storage capacity, collecting big data is no longer the bottleneck. New information is cheap. From single-cell sequencing to large-scale volumetric microscopy imaging, we have more data than we can wrap our heads around. How, then, do we effectively mine data to generate testable hypotheses with the potential to transmogrify information into new knowledge? Part of the answer lies in creating unified analysis schemes across platforms that are both human- and machine-readable. One of the ways we are doing this is by using ontologies as frameworks for organizing data.

What is an ontology?

The word “ontology” might be new to you, but more than likely you’ve already been using ontologies. They organize and link data for social media sites and big retailers. Have you ever saved an item to a Pinterest board (Gonçalves et al., 2019)? Used a filter to shop for a specific color, brand, and size of clothing from an online retailer? Run a GO (Gene Ontology) enrichment analysis on a differentially expressed gene set? Used FlyBase or WormBase to browse gene pages? If so, you’ve interacted with an ontology. And you are going to interact with more.

An ontology by definition (Oxford Languages) is:

(1) the branch of metaphysics dealing with the nature of being (not this one!)

(2) a set of concepts and categories in a subject area or domain that shows their properties and the relations between them (this one!)

If you’ve ever browsed a library shelved by the Dewey Decimal System, much of what follows will sound familiar. To explain, let’s jump into an example:

“On the Origin of Species” IS A book.

That statement is an ontological axiom. An ontological axiom is a simple sentence that follows a pattern: concept / relationship / concept. In our example, both “On the Origin of Species” and “book” are concepts; IS A is the relationship. Now, let’s take it one step further: the idea of the concept in your head likely has some specific attributes. In ontological terms, those specifics are known as properties.
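Because every axiom has the same three-part shape, it maps naturally onto a simple data structure. The Python sketch below is purely illustrative (the `Axiom` class and its field names are hypothetical, not part of any ontology tool) and stores one axiom as a concept / relationship / concept triple:

```python
from typing import NamedTuple


class Axiom(NamedTuple):
    """One ontological axiom: concept / relationship / concept (a triple)."""
    subject: str   # first concept
    relation: str  # the relationship linking the two concepts
    obj: str       # second concept

    def as_sentence(self) -> str:
        """Render the triple the way the text writes it."""
        return f'"{self.subject}" {self.relation} {self.obj}.'


darwin = Axiom("On the Origin of Species", "IS A", "book")
print(darwin.as_sentence())  # prints: "On the Origin of Species" IS A book.
```

Storing knowledge as uniform triples is the same basic idea behind RDF, the data model in which many ontologies are serialized.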

A set of properties for “On the Origin of Species” could be:
Author: Charles Darwin
Publication Date: November 24, 1859
ISBNs: 9780521867092, 9780857088475, 9788423918164…

Now we have a concept with properties and a categorical relationship. But we don’t have to stop there! We can define other relationships that exist for “On the Origin of Species” and string them together, like this: “On the Origin of Species” IS A scientific non-fiction book; a scientific non-fiction book IS A non-fiction book; and a non-fiction book IS A book. Here’s the superpower of ontologies: by attaching properties to concepts and linking concepts through relationships, we create a clear structure that can be searched either by property (return all books where Author = Charles Darwin) or by relationship (return all non-fiction books). Because IS A relationships chain together, both searches return result sets that include “On the Origin of Species.”
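To make the two kinds of search concrete, here is a minimal Python sketch using only the toy data from the example above. It assumes, for simplicity, that each concept has exactly one IS A parent and that there are no cycles (real ontologies allow several parents); the dictionaries and function names are hypothetical. The key step is that IS A is transitive, so a category search walks all the way up the chain:

```python
# Toy data from the example above. IS A edges map each concept to its single parent
# (a simplifying assumption; real ontologies allow several parents).
IS_A = {
    "On the Origin of Species": "scientific non-fiction book",
    "scientific non-fiction book": "non-fiction book",
    "non-fiction book": "book",
}

# Properties attached to individual concepts.
PROPERTIES = {
    "On the Origin of Species": {
        "Author": "Charles Darwin",
        "Publication Date": "November 24, 1859",
    },
}


def ancestors(concept: str) -> list[str]:
    """Walk IS A edges upward; because IS A is transitive, every step counts."""
    found = []
    while concept in IS_A:  # assumes the chain has no cycles
        concept = IS_A[concept]
        found.append(concept)
    return found


def find_by_property(key: str, value: str) -> list[str]:
    """Property search, e.g. all books where Author = Charles Darwin."""
    return [c for c, props in PROPERTIES.items() if props.get(key) == value]


def find_by_category(category: str) -> list[str]:
    """Relationship search: every concept that IS A `category`, directly or indirectly."""
    return [c for c in IS_A if category in ancestors(c)]


print(find_by_property("Author", "Charles Darwin"))  # ['On the Origin of Species']
print(find_by_category("non-fiction book"))
# ['On the Origin of Species', 'scientific non-fiction book']
```

The category search returns both the book itself and the intermediate category “scientific non-fiction book”, because both sit below “non-fiction book” in the chain.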

When concepts are visualized together with their relations, an ontology becomes a web of information. Using common rules (shared identifiers for concepts and a shared vocabulary of relationships) makes ontologies interoperable, and this interoperability allows information from different knowledge domains to be connected.

How do we use ontologies in biological sciences?

From how individual genes and anatomical structures contribute to an organism, to libraries of chemical compounds and molecules, to the scientific evidence arising from laboratory experiments, ontologies are instrumental for organizing data in the biological sciences (Chibucos et al., 2014; Degtyarenko et al., 2008; Haendel et al., 2009; The Gene Ontology Consortium, 2019). The Gene Ontology (GO; http://geneontology.org/) is arguably the most familiar ontology in biology. GO describes how individual genes contribute to the biology of organisms at the organismal, cellular, and molecular levels. Another widely used ontology is the Uber anatomy ontology (Uberon, http://uberon.github.io/) (Haendel et al., 2009), a GO-integrated framework that describes body parts, organs, and tissues across animal species. Uberon unites anatomy ontologies for a growing variety of traditional and emerging research organisms, facilitating comparative evolutionary and developmental studies.
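GO and Uberon are distributed in several formats, including the plain-text OBO flat-file format, in which each term is a `[Term]` stanza of `tag: value` lines. The sketch below parses a tiny, made-up fragment to show how terms and their relationships are laid out. The `EX:` identifiers and terms are hypothetical placeholders, not real GO or Uberon IDs, and the parser handles only a handful of tags; for real files, an established parser such as pronto or obonet would be the safer choice.

```python
# A tiny, made-up fragment in OBO flat-file format.
# The EX: identifiers are hypothetical placeholders, not real GO or Uberon IDs.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: digestive system

[Term]
id: EX:0000002
name: intestine
is_a: EX:0000004 ! organ
relationship: part_of EX:0000001 ! digestive system

[Term]
id: EX:0000003
name: intestinal crypt
relationship: part_of EX:0000002 ! intestine

[Term]
id: EX:0000004
name: organ
"""


def new_term() -> dict:
    """Empty record for one [Term] stanza."""
    return {"name": None, "is_a": [], "relationship": []}


def parse_obo_terms(text: str) -> dict[str, dict]:
    """Collect [Term] stanzas as {id: {"name", "is_a", "relationship"}}.

    Handles only the id, name, is_a and relationship tags; other tags are ignored.
    """
    terms: dict[str, dict] = {}
    current = None  # the stanza being filled in, or None outside a [Term] stanza
    for raw in text.splitlines():
        line = raw.split(" ! ", 1)[0].strip()  # drop trailing "! label" comments
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas (not [Typedef] etc.) are kept.
            current = new_term() if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag == "relationship":
            relation, target = value.split(maxsplit=1)
            current["relationship"].append((relation, target))
    return terms


for term_id, term in parse_obo_terms(OBO_TEXT).items():
    print(term_id, term["name"], term["is_a"], term["relationship"])
```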

Why build an anatomy ontology?

Everything we study in biology comes down to a process that is happening in a place, in an organism. That single-cell data? It came from stem cells sorted from the intestine of a mouse. That volumetric electron microscopy data? It came from mouse intestinal crypts. That in situ data showing Lgr5 expression in mouse intestinal crypt stem cells… that crypt cell remodeling phenotype… all of these disparate data have the context of anatomy in common. Anatomy is therefore a natural root for organizing otherwise unrelated datasets and a de facto way to aggregate them.
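One way to picture that aggregation in code is a minimal Python sketch, assuming a hand-written set of part_of links and a handful of datasets each annotated with one anatomy term. All term names, dataset labels, and function names are hypothetical. Because part_of is followed transitively, asking for “intestinal crypt” also returns data annotated to crypt stem cells:

```python
# Hypothetical part_of links: each structure maps to the structures it is part of.
PART_OF = {
    "intestinal crypt stem cell": ["intestinal crypt"],
    "intestinal crypt": ["intestine"],
    "intestine": ["digestive system"],
}

# Hypothetical datasets, each annotated with the most specific anatomy term known.
DATASETS = [
    ("single-cell sequencing", "intestine"),
    ("volumetric EM", "intestinal crypt"),
    ("Lgr5 in situ", "intestinal crypt stem cell"),
    ("crypt remodeling phenotype", "intestinal crypt"),
]


def containers(term: str) -> set[str]:
    """Every structure that `term` is part of, directly or transitively."""
    seen: set[str] = set()
    stack = list(PART_OF.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # skip revisits, which also guards against cycles
            seen.add(parent)
            stack.extend(PART_OF.get(parent, []))
    return seen


def datasets_for(term: str) -> list[str]:
    """Datasets annotated to `term` itself or to any structure that is part of it."""
    return [name for name, anatomy in DATASETS
            if anatomy == term or term in containers(anatomy)]


print(datasets_for("intestinal crypt"))
# ['volumetric EM', 'Lgr5 in situ', 'crypt remodeling phenotype']
print(len(datasets_for("intestine")))  # 4
```

Annotating each dataset to the most specific term known and letting the ontology roll it up is what lets one query reach data that were never explicitly labeled with the broader term.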

Does my research organism have an anatomy ontology?

If you work in an established research organism, great! You likely already have an anatomy ontology to hook your data up to. Check whether your organism of choice has one at the Ontology Lookup Service. Almost any organism with a “base” (FlyBase, WormBase…) already has an ontology and uses it to organize data within the base and as a framework for other tools, like Virtual Fly Brain (Osumi-Sutherland et al., 2014). If you work on an emerging research organism and are poised to generate a lot of data, there’s good news here, too. Many research communities are generating anatomy ontologies, notably for Ciona and, more recently, planarians (Hotta et al., 2020; Nowotarski et al., 2021).

What if my research organism doesn’t have an anatomy ontology?

If your research organism does not have an anatomy ontology, consider starting one! Assemble a squad with one or more experts on the anatomy of your organism and at least one person with some coding experience, and you can build an ontology for your data. The tools in the field are easy to use (WebProtégé, GitHub, and Google Sheets) and are becoming increasingly accessible thanks to the Ontology Development Kit (Matentzoglu, 2021), ROBOT (Jackson et al., 2019), and COGS (https://github.com/ontodev/cogs).
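One common workflow keeps terms in a spreadsheet and converts it to OWL with ROBOT's `template` command. ROBOT templates are CSV or TSV tables whose first row holds column names and whose second row holds template strings telling ROBOT how to read each column (for example `ID`, `LABEL`, `SC %` for an IS A parent, and `A IAO:0000115` for a textual definition). The Python sketch below writes such a TSV from a hypothetical term list; the `EX:` identifiers, column names, and placeholder definitions are assumptions for illustration, and the ROBOT documentation should be checked for the options (such as prefix declarations) a real build needs.

```python
import csv
import io

# Hypothetical curated terms, as they might be kept in a shared spreadsheet.
# The EX: IDs and the placeholder definitions are illustrative only.
TERMS = [
    {"id": "EX:0000001", "label": "anatomical structure", "parent": "",
     "definition": "Placeholder definition for the root term."},
    {"id": "EX:0000002", "label": "organ", "parent": "EX:0000001",
     "definition": "Placeholder definition for organ."},
    {"id": "EX:0000003", "label": "intestine", "parent": "EX:0000002",
     "definition": "Placeholder definition for intestine."},
]

# Row 1: human-readable column names. Row 2: ROBOT template strings.
#   ID            -> the term's identifier
#   LABEL         -> the term's label
#   SC %          -> subclass of (IS A) the class in this cell; empty cell = no parent
#   A IAO:0000115 -> annotation using IAO:0000115 (textual definition)
HEADER = ["ID", "Label", "Parent", "Definition"]
TEMPLATE_ROW = ["ID", "LABEL", "SC %", "A IAO:0000115"]


def to_robot_template(terms: list[dict]) -> str:
    """Return the terms as a tab-separated ROBOT template string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerow(TEMPLATE_ROW)
    for term in terms:
        writer.writerow([term["id"], term["label"], term["parent"], term["definition"]])
    return buffer.getvalue()


print(to_robot_template(TERMS))
```

Saved as a `.tsv` file, this output could then be handed to ROBOT's `template` command, so curators keep editing a familiar table while the OWL file is regenerated from it.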

When’s the best time to build an anatomy ontology?

It is never too soon to put frameworks in place to organize and connect big data. For example, a growing number of labs use the planarian flatworm Schmidtea mediterranea as a research organism to model regeneration and stem cell biology, but there are still far fewer of them than labs using Drosophila or C. elegans. Searching PubMed for “Drosophila”, “C.elegans”, and “Planaria” yields 113,316; 35,030; and 1,884 papers, respectively. Going back to our original five-papers-a-day example, it would take one person 62 years to read all of the Drosophila papers, 19 years to read all of the C. elegans papers, and just over a single year to read all of the Planaria papers. For the planarian field, this meant our data and information base was still small enough for a team of curators to capture all of the anatomical terms needed for an ontology. As a rule of thumb, it is a good time to build an ontology for data organization while the published record is still small enough for humans to read and process. That way, we can capture existing data and ensure that future data can be integrated into a unified framework.
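The reading-time arithmetic used here, and in the PubMed example at the top of this post, is a single division. A minimal Python sketch, assuming the same optimistic five papers per day, 365 reading days per year, and no days off:

```python
PAPERS_PER_DAY = 5   # the post's optimistic reading rate
DAYS_PER_YEAR = 365  # assumes reading every single day, no holidays


def years_to_read(n_papers: int, per_day: int = PAPERS_PER_DAY) -> float:
    """Years one reader needs to get through n_papers at per_day papers per day."""
    return n_papers / per_day / DAYS_PER_YEAR


# PubMed hit counts quoted in the text.
COUNTS = {"Drosophila": 113_316, "C. elegans": 35_030, "Planaria": 1_884}

for term, n in COUNTS.items():
    print(f"{term}: {years_to_read(n):.1f} years")
# Drosophila: 62.1 years
# C. elegans: 19.2 years
# Planaria: 1.0 years
```

Raising `per_day` to the combined rate of a small curation team (for example `years_to_read(n, per_day=5 * 3)` for three hypothetical curators) gives a rough sense of whether a literature is still small enough to curate by hand.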

Why we need to use ontologies to organize big data

If you’ll allow a somewhat geeky paraphrase: with big data comes great responsibility. How do we handle big data responsibly? Efficiently? And in a way that is accessible and reusable? Luckily, we already have a framework in the form of FAIR. FAIR data practices insist that data be Findable, Accessible, Interoperable, and Reusable (Wilkinson et al., 2016). When data is acquired and handled according to FAIR practices, everyone wins. Anatomy ontologies are Findable and Accessible when available through the Ontology Lookup Service (Jupp et al., 2015), Interoperable when they use relationships from the Relation Ontology (RO), and Reusable when reported in adherence to the Minimum Information for Reporting an Ontology (MIRO) guidelines (Matentzoglu et al., 2018). Adhering to FAIR practices while annotating anatomical data with an ontology ensures that everyone can access research and data more easily, gives source data an opportunity to gather more citations, and, importantly, gets us all more accessible science for our money.

Anatomy ontologies are the difference between hoarding data in piles and curating biological data into a searchable library. Building an anatomy ontology for a research organism may seem like a big undertaking, but it is a necessary investment in the community, a tool everyone can benefit from. Consider our own experience: if two biologists and someone who scripts could build an anatomy ontology with help from the great community at the Open Biological and Biomedical Ontologies (OBO) Foundry, so can you.

References

Chibucos, M. C., Mungall, C. J., Balakrishnan, R., Christie, K. R., Huntley, R. P., White, O., Blake, J. A., Lewis, S. E. and Giglio, M. (2014). Standardized description of scientific evidence using the Evidence Ontology (ECO). Database 2014.

Conan Doyle, S. A. (1885). The Great Keinplatz Experiment. The New York Times.

Degtyarenko, K., de Matos, P., Ennis, M., Hastings, J., Zbinden, M., McNaught, A., Alcántara, R., Darsow, M., Guedj, M. and Ashburner, M. (2008). ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 36, D344–D350.

Gonçalves, R. S., Horridge, M., Li, R., Liu, Y., Musen, M. A., Nyulas, C. I., Obamos, E., Shrouty, D. and Temple, D. (2019). Use of OWL and Semantic Web Technologies at Pinterest. arXiv [cs.CL].

Haendel, M., Gkoutos, G., Lewis, S. and Mungall, C. (2009). Uberon: towards a comprehensive multi-species anatomy ontology. Nature Precedings 1–1.

Hotta, K., Dauga, D. and Manni, L. (2020). The ontology of the anatomy and development of the solitary ascidian Ciona: the swimming larva and its metamorphosis. Sci. Rep. 10, 17916.

Jackson, R. C., Balhoff, J. P., Douglass, E., Harris, N. L., Mungall, C. J. and Overton, J. A. (2019). ROBOT: A Tool for Automating Ontology Workflows. BMC Bioinformatics 20, 407.

Jupp, S., Burdett, T., Leroy, C. and Parkinson, H. E. (2015). A new Ontology Lookup Service at EMBL-EBI. SWAT4LS 2, 118–119.

Matentzoglu, N. (2021). INCATools/ontology-development-kit: June 2020 release.

Matentzoglu, N., Malone, J., Mungall, C. and Stevens, R. (2018). MIRO: guidelines for minimum information for the reporting of an ontology. J. Biomed. Semantics 9, 6.

Nowotarski, S. H., Davies, E. L., Robb, S. M. C., Ross, E. J., Matentzoglu, N., Doddihal, V., Mir, M., McClain, M. and Sánchez Alvarado, A. (2021). Planarian Anatomy Ontology: a resource to connect data within and across experimental platforms. Development 148.

Osumi-Sutherland, D., Costa, M., Court, R. and O’Kane, C. J. (2014). Virtual Fly Brain – Using OWL to support the mapping and genetic dissection of the Drosophila brain. CEUR Workshop Proc. 1265, 85–96.

The Gene Ontology Consortium (2019). The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 47, D330–D338.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018.