Doing great science depends on teamwork, whether this is within the lab or in collaboration with other labs. However, sometimes the resources that support our work can be overlooked. Our ‘Featured resource’ series aims to shine a light on these unsung heroes of the science world. In our latest article, we hear from Vitor Trovisco (Curator at FlyBase) and others in the team, who describe the work of FlyBase.
FlyBase (flybase.org) is the primary knowledgebase and hub for genomic, genetic and functional data on the fruit fly, Drosophila melanogaster. FlyBase was established in 1992, following funding from the National Center for Human Genome Research of the NIH, USA [Ashburner M, 1994; The FlyBase Consortium, 1994], as an online database for information on the fruit fly’s genes and mutations that had previously been collated in the Red Book [Lindsley and Zimm, 1992]. It has since kept pace with the constant advances in genomics and genetics. Nowadays, FlyBase hosts a comprehensive and ever-growing collection of data curated from sources ranging from large-scale projects to primary research publications. These data include gene models, expression patterns and function, alleles and transgenic constructs, phenotypes, genetic and physical interactions, disease models, gene groups, large datasets, fly stocks and other reagents. Additionally, FlyBase hosts many linkouts to external resources, particularly those from which it draws data (e.g. UniProt, NCBI, FlyAtlas/2) and several that provide reagents and advanced research tools for fly research (e.g. fly stock centres, DNA clones, Drosophila RNAi Screening Center). Find a comprehensive list of external resources here.
People behind FlyBase
FlyBase is an international consortium of biocurators and IT developers based at Harvard University (USA), Indiana University (USA), the University of New Mexico (USA) and the University of Cambridge (UK). Harvard hosts the IT developers in charge of the database infrastructure, and the team of curators responsible for genomic features, gene models, expression patterns, disease models and physical interactions. Indiana hosts the IT developers entrusted with the website and its query tools. Cambridge hosts the team of curators in charge of genetic entities, phenotypes and genetic interactions, functional data (GO), neuronal gene expression patterns (with VFB), single cell expression data, and ontologies. The team at New Mexico contributes to general curation and physical interactions curation. For the full team, see here.
FlyBase also enjoys great support from its external scientific advisory board, which includes Drosophila researchers and representatives of other genomic databases.
FlyBase is part of the Alliance of Genome Resources consortium (the Alliance), together with 5 other model organism genomics databases (Saccharomyces Genome Database, WormBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database) and the Gene Ontology Resource [Alliance of Genome Resources Consortium, 2022]. The Alliance aims to provide better comparative biology data and tools, by bringing together, harmonising and leveraging cross-species genetics and genomics data. As part of the Alliance, FlyBase contributes to and benefits from this improved integration to the advantage of the wider biomedical field.
Virtual Fly Brain
FlyBase is closely intertwined with Virtual Fly Brain (VFB), an interactive web-based tool for neurobiologists. VFB facilitates the study of detailed neuroanatomy, neuron connectivity and expression data of Drosophila melanogaster. VFB aims to make it easier for researchers to find relevant anatomical information and reagents. VFB is a UK-based collaboration between the University of Edinburgh, the University of Cambridge/FlyBase, the MRC Laboratory of Molecular Biology and the EMBL-EBI. FlyBase collaborates in the curation of anatomical entities and transgene expression patterns and provides the transgene expression curation displayed by VFB. In the near future VFB will also provide gene expression summaries derived from single cell data.
Single Cell Expression Atlas
The EMBL-EBI’s Single Cell Expression Atlas initiative re-analyses and standardises publicly available single cell RNA sequencing studies to make them more comparable and easier to interpret. Through its browser, users can easily visualise clusters of cells and their annotations, and search for gene expression patterns. Our collaboration has expedited the curation of fly datasets and their integration into FlyBase, through dataset report pages and cell type scRNAseq expression summary ribbons on the gene report pages. This work is closely coordinated with Virtual Fly Brain.
Since inception, FlyBase has had the extraordinary financial support of the National Human Genome Research Institute at the U.S. National Institutes of Health (NHGRI/NIH, currently U41HG000739), in the form of multi-year grants that assure FlyBase’s core operations: continual curation of published literature, and maintenance and improvement of both the database infrastructure and the website. FlyBase has also benefited from grants from other sources to integrate specific new data types. Currently these come from the US National Science Foundation (DBI-2035515, 2039324), the UK’s Wellcome Trust (PLM13398) and the UK’s Biotechnology and Biological Sciences Research Council (BBSRC, BB/T014008). Additionally, the UK’s Medical Research Council has provided ongoing funding for gene function annotation since 1996 (currently MR/N030117/1). Despite its continual support, NHGRI/NIH has had to impose significant funding cuts in recent years, putting FlyBase and other model organism genomic databases under some financial strain [Bellen, 2021]. In the face of this and in order to continue providing a high standard of service, FlyBase has had to resort to crowd-funding from the Drosophila research community in the form of annual user fees. Researchers around the world have been extremely generous and their contributions have lessened the impact of the cuts.
Resource overview and highlights
Most data in FlyBase is organised into a series of report pages, corresponding to different data classes (e.g. gene, allele, aberration, dataset), each hosting different types of information. For example, the report page for a given gene displays its associated phenotypes, expression patterns, disease models, and functional data (GO) amongst other data. Each type of data is organised as annotation entries, frequently in table format.
Data are available at different scales to cater to all kinds of users, from the occasional user to the power user (see [Larkin, 2021; Gramates, 2022]). For the most frequent piecemeal use case, the ‘Quick search’ and ‘Jump-to-gene’ (J2G) tools allow users to find and navigate to individual report pages (see figure). For higher-level data mining there is an array of query tools to explore, such as Batch Download, QueryBuilder, CytoSearch and Feature Mapper (links under ‘Tools’ in the navigation bar). Power users can explore an array of APIs, download precomputed files with the full dataset of several classes of data, and even get hold of the whole database (links under ‘Downloads’ in the navigation bar). Below are a few recent additions.
Most FlyBase tools return their results as Interactive HitLists, or can convert them into HitLists via an “Export to HitList” option. HitLists allow users to view, analyse and export results (see figure). For example, results can be filtered by species or data type. Selecting a single data class allows conversion between associated data types (e.g. genes to alleles) and analysis of results by type (e.g. aberrations by mutagen type). Processed results can then be exported as a downloaded file, as a new HitList, or to other tools.
‘Gene groups and pathways’ report pages
These recent additions to FlyBase present sets of related genes, connected by their membership of the same signalling pathway (Pathway reports) or macromolecular complex, or by sharing a common molecular function or biological role (Gene Groups) (see figure). The assembly of these gene sets is based on their underlying GO annotations, which were systematically reviewed from a wide range of sources to ensure accuracy and findability. Gene groups are hierarchical. For example, the “ENZYMES” gene group hosts the “OXIDOREDUCTASES”, “TRANSFERASES”, “HYDROLASES”, “LYASES”, “ISOMERASES”, “LIGASES” and “TRANSLOCASES” child groups, and each of these has its own child groups. Pathway members are organised into “core” members, “positive regulators”, “negative regulators” and “ligand production” members. Gene group and pathway report pages also display GO ribbon stacks, which allow for a quick visual comparison of the group members’ function (see figure).
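For readers who work with gene sets programmatically, the hierarchical nature of gene groups can be pictured as a simple tree. The following is only a minimal sketch under stated assumptions: the GeneGroup class, the all_genes function, the gene names and the “HYPOTHETICAL SUBGROUP” entry are placeholders invented for illustration, and do not reflect FlyBase’s actual data model, file formats or APIs. Only the “ENZYMES”, “OXIDOREDUCTASES” and “TRANSFERASES” group names are taken from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class GeneGroup:
    """Illustrative stand-in for a gene group (not FlyBase's real schema)."""
    name: str
    genes: list = field(default_factory=list)      # direct members
    children: list = field(default_factory=list)   # child GeneGroup objects


def all_genes(group):
    """Collect the members of a group and of all its descendant groups."""
    found = set(group.genes)
    for child in group.children:
        found |= all_genes(child)  # recurse into each child group
    return found


# Placeholder gene names; real groups would list actual gene identifiers.
enzymes = GeneGroup(
    "ENZYMES",
    children=[
        GeneGroup("OXIDOREDUCTASES", genes=["gene-A", "gene-B"]),
        GeneGroup(
            "TRANSFERASES",
            genes=["gene-C"],
            children=[GeneGroup("HYPOTHETICAL SUBGROUP", genes=["gene-D"])],
        ),
    ],
)

print(sorted(all_genes(enzymes)))
```

In this sketch a parent group has no direct members of its own, so asking for the genes of “ENZYMES” means walking down through every level of child groups.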
‘Experimental tool’ data was introduced to help users find alleles and transgenes with particular characteristics. We define experimental tools as commonly used sequences with useful properties that are exploited to study the biological function of another gene product or a biological process. Examples of different types of experimental tool include those that enable a gene product to be detected (e.g. the FLAG tag, EGFP, mCherry), target a gene product somewhere specific within a cell (e.g. mitochondrial targeting sequence), drive expression in a binary system (e.g. UAS, GAL4) or are used to modify cellular activity (e.g. to inhibit/activate neurons). As new alleles and transgenes are added to the database, they are also linked to any relevant experimental tools, building up a picture of what they are made of. This allows users to easily browse and search for fly stocks with particular properties (e.g. all EGFP-tagged transgenes of their gene of interest).
FlyBase is rooted in the collaborative spirit of the Drosophila research community, and good communication is crucial to continue providing a high standard of service. To that end, FlyBase sends a couple of surveys a year to the FlyBase Community Advisory Group, which is made up of volunteer users at any career stage, from any biology field, and at any level of expertise on the database resources. Anyone can join by following the link under ‘Community’ in the navigation bar. The surveys try to gauge the level of usage of, and satisfaction with, certain tools and what features could be added or eliminated, and are used to inform the focus of FlyBase resource development.
The query tools and data display are designed to be intuitive, supported by clear help pages. Video tutorials and ‘Tweetorials’ are available for many tools and resources, particularly if new, revamped or heavily used (see full list here).
For more direct interactions with the community, FlyBase tries to be present at major international conferences, such as the US Annual Drosophila Research Conference and the European Drosophila Research Conference. FlyBase also always welcomes suggestions, enquiries and corrections via our Helpmail (link at the bottom of every page). These messages are read by everyone in the team, so that they can be addressed by the most suitable people.
Help from users
The fly research community has always been extremely supportive and can continue to be so at many levels. In addition to the financial support mentioned above, it is highly important and appreciated if users cite FlyBase whenever possible in articles, presentations and funding applications (citation link at the bottom of every webpage). These acknowledgements make FlyBase’s impact on research more tangible; the article citations in particular provide metrics that can be used for funding applications.
‘Gene snapshot’ summaries
FlyBase welcomes expert researchers to contribute ‘Gene Snapshot’ summaries for their favourite genes. These provide a quick overview of the function of a gene’s product, based on key points solicited by FlyBase, and are reviewed by curators.
Help from authors
Authors can also contribute in several ways to simplify the curation of their articles, ultimately allowing their data to be more quickly available on the website.
When you write your paper…
Clear, detailed and accurate descriptions of the experiments and resources minimise the curation effort and reduce the need to contact the authors. Articles should mention official FlyBase identifiers and nomenclature for entities such as genes, alleles, stocks and anatomical structures, and should specify the molecular details of newly created alleles.
Once your paper is published…
When a research or review paper is published, authors should get an email from FlyBase asking for their help by filling in the Fast-Track Your Paper (FTYP) form. The form asks authors to list the genes their article focuses on, which will then be ready for display in the next release, and to give minimal information on the types of experiments performed, which helps triage and prioritise the article for further curation.
Occasionally FlyBase has to send emails with clarification requests. Replying to these queries is greatly appreciated, as it allows for a more complete and accurate capture of the published data and makes it more readily available for display.
Alliance of Genome Resources Consortium. Harmonizing model organism data in the Alliance of Genome Resources. Genetics. 2022 Apr 4;220(4):iyac022.
Ashburner M, Drysdale R. FlyBase–the Drosophila genetic database. Development. 1994 Jul;120(7):2077-9.
Bellen HJ, Hubbard EJA, Lehmann R, Madhani HD, Solnica-Krezel L, Southard-Smith EM. Model organism databases are in jeopardy. Development. 2021 Oct 1;148(19):dev200193.
Gramates LS, Agapite J, Attrill H, Calvi BR, Crosby MA, Dos Santos G, Goodman JL, Goutte-Gattat D, Jenkins VK, Kaufman T, Larkin A, Matthews BB, Millburn G, Strelets VB. FlyBase: a guided tour of highlighted features. Genetics. 2022 Apr 4;220(4):iyac035.
Larkin A, Marygold SJ, Antonazzo G, Attrill H, Dos Santos G, Garapati PV, Goodman JL, Gramates LS, Millburn G, Strelets VB, Tabone CJ, Thurmond J; FlyBase Consortium. FlyBase: updates to the Drosophila melanogaster knowledge base. Nucleic Acids Res. 2021 Jan 8;49(D1):D899-D907.
Lindsley DL, Zimm GG. The Genome of Drosophila melanogaster. Academic Press, 1992.
The FlyBase Consortium. FlyBase–the Drosophila database. Nucleic Acids Res. 1994 Sep;22(17):3456-8.