A postdoctoral position (fully funded for 4 years) is available in the laboratory of Dr. Rashmi Priya at the Francis Crick Institute. Dr. Priya’s laboratory focuses on the mechano-molecular control of organ development during embryogenesis. For a brief overview of the lab, please visit https://www.crick.ac.uk/research/labs/rashmi-priya or get in touch with Dr. Priya.
The Organ Morphodynamics lab is starting at the Francis Crick in January 2021 and will grow to six people over the next 2 years. We have generous core-funding support and access to state-of-the-art facilities and technology platforms, including advanced light microscopy, high-throughput sequencing, bioinformatics and an image analysis help desk. The Francis Crick is a modern, world-class biomedical research institute in central London. The Francis Crick and the participating organizations (UCL, Imperial College London and King’s College London) offer a highly inclusive, collaborative and thriving research community with many career development opportunities.
I am especially looking for candidates who are interested in combining interdisciplinary approaches to gain a systemic understanding of organ morphogenesis using a well-suited model system – the developing zebrafish heart. The project will aim to unravel the underlying mechanical, molecular and geometric interactions that transform a developing heart from a simple epithelium into a highly intricate, patterned organ.
The successful candidate will use advanced microscopic techniques, image analysis, genetic/optical manipulations and biophysical approaches, and will collaborate with theoreticians to understand how morphological and molecular complexity emerges during heart development. Candidates with a strong background in advanced confocal and/or light sheet imaging, image analysis and zebrafish genetics, and a good understanding of the mechanics of tissue morphogenesis and/or heart development, are encouraged to apply. The successful candidate should be keen to pursue collaborative research, should have excellent communication skills and should be a good team player.
For further details about the project and how to apply, please visit the Crick vacancies portal or get in touch – rashmi.priya@crick.ac.uk.
The cytoskeletal filament network within our cells underpins the functionality of virtually all cellular processes. Apart from conferring a structural framework giving cells their unique shapes, the cytoskeleton also regulates a host of dynamic activities ranging from cell division to migration, transport, and polarization. Understanding how the cytoskeleton orchestrates these events with unique spatial and temporal specificity within a developing organism remains one of the most fascinating questions in the field.
During the earliest stages of mammalian life, the cytoskeleton guides the formation of the blastocyst – a cluster of 32 to 64 cells comprising a differentiated outer cell layer known as the trophectoderm that will give rise to placental tissues, and a pluripotent inner cell mass that later forms the foetus itself (White et al., 2018). The early mouse embryo contains all three major cytoskeletal filament classes: actin, microtubules, and intermediate filaments. Interestingly, while many studies have investigated the roles of actin and microtubule filaments in regulating early embryo development, the function of the intermediate filament network during this time has remained entirely unknown. Yet unlike their better-studied counterparts, intermediate filaments encompass a diverse range of proteins including keratin, vimentin, and desmin that are expressed in unique tissue-specific patterns, and can self-assemble into filaments in the absence of cofactors or nucleators.
We initially approached this question by consulting the literature: in 1980, the first papers were published identifying keratins as the first and only cytoplasmic intermediate filaments expressed in the early mouse embryo (Jackson et al., 1980; Paulin et al., 1980). Although there are over 50 keratin subtypes, the predominant ones in the early embryo are K8 and K18, the same subtypes that are characteristic of simple epithelia in mature tissues. A number of studies subsequently investigated keratin expression patterns during these early developmental stages, establishing their restricted localization in trophectoderm cells of the blastocyst and complete absence within the inner cell mass (Chisholm and Houliston, 1987; Duprey et al., 1985; Oshima et al., 1983). Yet their expression prior to blastocyst formation was never firmly established, owing to conflicting findings and differing methodologies. Combined with the fact that keratin knockout embryos survived preimplantation development (Baribault et al., 1993, 1994; Magin et al., 1998) and that the few early studies perturbing keratin functions reported no significant embryo phenotypes (Emerson, 1988), interest in keratin filaments during early embryo development gradually waned around the turn of the millennium.
Revisiting keratin filaments in the early mouse embryo almost three decades later offered us surprising insights. Although keratins are best known for their structural role in hair, skin, and nails, more recent studies have found that keratins within epithelial tissues also have diverse non-structural roles, including cell polarization, apoptosis, and cell cycle regulation (Kirfel et al., 2003; Pan et al., 2013). Armed with this knowledge and the foundation laid by earlier studies, we explored whether keratins in the early embryo – with their unique expression pattern in the outer epithelial layer (trophectoderm) of the blastocyst – could play specific structural or non-structural roles during embryo development like other epithelial keratins.
To investigate keratin functions, we established a combination of immunofluorescence and live-embryo imaging techniques that enabled us to explore keratin patterns with high spatial resolution and evaluate their dynamic changes during development. Apart from these technical improvements offered by newer microscopy tools, we also went beyond the early keratin studies by establishing knockdown and overexpression methods to manipulate keratin filaments within the living embryo, providing a valuable model for assessing keratin functions.
In our paper, we report some of the first functions for keratin filaments in the early mouse embryo. We find that keratin filaments act as asymmetrically inherited fate determinants that specify the first trophectoderm cells of the early embryo. Unlike actin and microtubule filaments that dramatically reorganize during cell division, keratins are stably retained within the apical region of the mitotic cell during the first divisions that segregate cells into inner and outer positions at the 8- to 16-cell stage (Fig. 1). This apical retention of keratins biases their asymmetric inheritance by the outer forming daughter cell (Fig. 2). Apical keratin localization is further mediated by the F-actin-rich apical domain, without which keratin filaments become homogeneously distributed throughout the cell, and no longer segregate unequally between the forming daughter cells. This underscores the importance of keratin-actin interactions in guiding keratin filament dynamics and functions during embryonic development.
Fig. 1. Keratin filaments (labelled by K8 immunofluorescence) are stably retained in both interphase and mitotic cells of the embryo. In contrast, both the apical enrichment of actin (labelled by Phalloidin-Rhodamine) and the microtubule network (labelled by alpha-tubulin immunofluorescence) throughout the cytoplasm are lost when cells enter mitosis.
Fig. 2. Live embryo imaging reveals that keratin filaments (labelled by K18-Emerald) are asymmetrically inherited by outer daughter cells, during the cell divisions segregating inner and outer cells of the embryo.
How do keratins go on to function as fate determinants? Following their asymmetric inheritance by outer cells of the embryo, we find that keratins promote apical polarization and levels of downstream members of the Hippo pathway including Amot and nuclear Yap. This in turn drives the expression of Cdx2, one of the key transcription factors specifying trophectoderm fate in the early embryo. Conversely, outer cells that did not inherit keratin filaments or those with keratin knockdown fail to establish these trophectoderm features, instead displaying levels of Cdx2 comparable to inner cells of the embryo.
At later stages, in line with the established role of keratins in conferring structural support to epithelial tissues, the dense keratin network in the trophectoderm is also important for supporting blastocyst morphogenesis. Keratin knockdown reveals that without this filamentous network, embryos display defective apical and junctional morphologies suggestive of weakened tension, as well as reduced cellular stiffness. Thus, keratins in the embryo regulate both morphogenesis and fate specification to promote blastocyst formation and the specification of the first cell lineages in development.
Finally, our study also led us to uncover a surprising pattern of keratin expression during preimplantation development: keratins assemble a dense filament network extending throughout all cells of the blastocyst trophectoderm, but display a salt-and-pepper pattern during earlier stages (Fig. 3). In both the mouse and human embryo, the first filaments form in a subset of cells of the 8- to 16-cell embryo, and the proportion of keratin-assembling cells increases over time. Importantly, the heterogeneous keratin expression stands in stark contrast to actin filaments and microtubules, neither of which differs significantly in expression from cell to cell. This initial heterogeneous expression of keratins at the 8-cell stage can be further attributed to cell-cell differences in the levels of the BAF chromatin remodelling complex within the 4-cell embryo, with manipulations of BAF levels sufficient to trigger changes in keratin expression patterns.
Fig. 3. Keratin filaments are heterogeneously expressed in the early embryo, beginning first in a subset of cells of the 8- to 16-cell mouse and human embryo. By the blastocyst stage, all trophectoderm cells are covered with a dense keratin filament network, but inner cells remain devoid of filaments.
Together, these findings connect cellular heterogeneities within the early embryo to fate specification pathways at later stages via the regulation of keratin expression. Although keratins have long been utilized as markers of the trophectoderm, our work further identifies keratins as regulators of trophectoderm fate, elucidating one of the first functions for these filaments during early development. With keratins once again placed in the spotlight and more experimental tools at our disposal, our understanding of keratins in the early mammalian embryo is set to expand in the years to come.
Baribault, H., Price, J., Miyai, K., and Oshima, R.G. (1993). Mid-gestational lethality in mice lacking keratin 8. Genes & Development 7, 1191–1202.
Baribault, H., Penner, J., Iozzo, R.V., and Wilson-Heiner, M. (1994). Colorectal hyperplasia and inflammation in keratin 8-deficient FVB/N mice. Genes & Development 8, 2964–2973.
Chisholm, J.C., and Houliston, E. (1987). Cytokeratin filament assembly in the preimplantation mouse embryo. Development 101, 565–582.
Duprey, P., Morello, D., Vasseur, M., Babinet, C., Condamine, H., Brulet, P., and Jacob, F. (1985). Expression of the cytokeratin endo A gene during early mouse embryogenesis. Proceedings of the National Academy of Sciences of the United States of America 82, 8535–8539.
Emerson, J.A. (1988). Disruption of the cytokeratin filament network in the preimplantation mouse embryo. Development 104, 219–234.
Jackson, B.W., Grund, C., Schmid, E., Bürki, K., Franke, W.W., and Illmensee, K. (1980). Formation of Cytoskeletal Elements During Mouse Embryogenesis: Intermediate Filaments of the Cytokeratin Type and Desmosomes in Preimplantation Embryos. Differentiation 17, 161–179.
Kirfel, J., Magin, T.M., and Reichelt, J. (2003). Keratins: a structural scaffold with emerging functions. Cellular and Molecular Life Sciences (CMLS) 60, 56–71.
Magin, T.M., Schröder, R., Leitgeb, S., Wanninger, F., Zatloukal, K., Grund, C., and Melton, D.W. (1998). Lessons from Keratin 18 Knockout Mice: Formation of Novel Keratin Filaments, Secondary Loss of Keratin 7 and Accumulation of Liver-specific Keratin 8-Positive Aggregates. J Cell Biol 140, 1441–1451.
Oshima, R.G., Howe, W.E., Klier, G., Adamson, E.D., and Shevinsky, L.H. (1983). Intermediate Filament Protein Synthesis in Preimplantation Murine Embryos. Developmental Biology 99, 447–455.
Pan, X., Hobbs, R.P., and Coulombe, P.A. (2013). The expanding significance of keratin intermediate filaments in normal and diseased epithelia. Current Opinion in Cell Biology 25, 47–56.
Paulin, D., Babinet, C., Weber, K., and Osborn, M. (1980). Antibodies as probes of cellular differentiation and cytoskeletal organization in the mouse blastocyst. Experimental Cell Research 130, 297–304.
White, M.D., Zenker, J., Bissiere, S., and Plachta, N. (2018). Instructions for Assembling the Early Mammalian Embryo. Developmental Cell 45, 667–679.
Following the initial discovery of the homeobox in the 1980s in invertebrates and then vertebrates, it quickly became clear that homeobox genes come in two flavors – that of the Antennapedia-like HOX cluster genes and that of the many more non-clustered genes with diverse sequence and expression features (Gehring, 1998). One theme that became evident through expression and mutant analysis in a variety of organisms was the selective expression and function of homeobox genes within the nervous system (Gehring, 1998).
When I started to look for postdoctoral positions in the early 1990s, I was particularly intrigued by mutant phenotypes of several fly and worm homeobox genes (Blochlinger et al., 1988; Doe et al., 1988; Finney and Ruvkun, 1990; Way and Chalfie, 1988), but also by the work of the late Tom Jessell, who proposed a LIM homeobox code in the vertebrate spinal cord (Tsuchida et al., 1994). The simplicity and well-characterized nature of the C. elegans nervous system, as well as its genetic amenability, were very appealing to me and, in 1996, I decided to join Gary Ruvkun’s lab. Gary’s lab had not only characterized one of the first C. elegans homeobox genes, unc-86 (Finney and Ruvkun, 1990; Finney et al., 1988); Thomas Bürglin in Gary’s lab had also used library screening with degenerate probes, a method that now, in the post-genome era, seems quite archaic, to discover the abundance of homeobox genes in this simple organism (Burglin et al., 1989).
In Gary’s lab, I set out to study the expression and function of the LIM homeobox subfamily, which was discovered initially by Marty Chalfie (Way and Chalfie, 1988) and implicated further in neuronal identity specification by Tom Jessell’s lab (Tsuchida et al., 1994). Using emerging GFP reporter technology (Chalfie et al., 1994) and mutant analysis, I determined what turned out to be mostly incomplete expression patterns (owing to the shortcomings of “classic” reporter genes, which often contained just fractions of their surrounding gene regulatory regions) and mutant phenotypes that could only be very superficially analyzed (owing to a shortage of markers that allowed for a more in-depth analysis of mutant phenotypes) (Hobert et al., 1998; Hobert et al., 1997; Hobert et al., 1999).
After starting my own lab at Columbia University in 1999, a string of students and postdocs (Zeynep Altun, Adam Wenick, Ephraim Tsalik, Feifan Zhang, Pat Gordon, Vincent Bertrand, Maria Doitsidou, Nuria Flames, Rich Poole, Paschalis Kratsios, Marie Gendrel, Esther Serrano-Saiz, Laura Pereira, among others) continued to work on a small number of specific homeobox genes, digging much deeper into what these genes did in the nervous system. One theme that continued to emerge throughout this analysis was that not only the classic unc-86 and mec-3 genes, studied in impressive depth by Marty Chalfie over the years (Chalfie, 1995), but other homeobox genes as well had a remarkably broad effect on the differentiation of specific neuron types. Rather than regulating only some subset of specific identity features in a neuron, several homeobox genes fulfilled a “master regulatory” role in controlling most, if not all, known identity features of a neuron, through direct initiation and maintenance of terminal differentiation gene batteries. This led me to propose the concept of “terminal selectors” of neuronal identity, a term extended from the Drosophila field, where the term “selector genes” was coined for genes that act earlier in development to specify the identity of developing fields and tissues (Hobert, 2016).
This trajectory finally led to the work of Molly Reilly, a graduate student in my lab, who recently set out to achieve the ambitious goal of describing the expression patterns of the entire homeobox gene family across the entire C. elegans nervous system (Reilly et al., 2020). This tremendous leap forward was, as is so often the case, enabled by novel technology. First, gene expression patterns, or even better, protein expression patterns, can now be identified much more reliably than by extracting some arbitrary small regulatory region adjacent to your gene of interest to drive a reporter gene. Rather, bacterial recombineering technology enables the reporter tagging of genes in the context of very large genomic intervals containing many genes up- and downstream of the gene of interest (Tursun et al., 2009). Moreover, CRISPR/Cas9 technology even allowed for reporter tagging of an entire locus in the endogenous context (Dickinson et al., 2013). But even with good reagents at hand, identifying sites of expression of a reporter gene across the entire nervous system has traditionally been no small feat because neurons in C. elegans are tightly packed and their position can be locally variable. Here is where Eviatar Yemini, a postdoc in my lab, came in to solve the long-standing problem of neuronal cell identification. Using multiple distinct fluorophores (excluding GFP), Eviatar built a multicolor landmark strain, NeuroPAL, which, unlike Brainbow-style technology, assigns neurons a strictly deterministic color code (Yemini et al., 2019). Crossing NeuroPAL with a GFP reporter strain enables unambiguous identification of the sites of gene expression, anywhere in the nervous system (Figure 1).
Figure 1: Examples of homeobox reporter gene expression patterns. The NeuroPAL transgene (left panel) was crossed to these reporters to unambiguously identify sites of homeobox gene expression. Images courtesy of Molly Reilly and Ev Yemini.
Molly exploited these technological advances to (a) tag all but one of the 102 homeobox genes of C. elegans with a fluorescent reporter and (b) identify their sites of expression throughout the entire nervous system. What she found was something I could barely have dreamed of when starting my postdoc in Gary’s lab: Most of the conserved homeobox genes are not only sparsely expressed throughout the nervous system of the worm, but each of the 118 different neuron classes displayed a unique combination of homeobox genes (Figure 2).
Figure 2: Homeobox codes. Shown are all the homeobox gene expression patterns that contribute to neuron class specific expression. Homeobox genes are on top, neuron classes on the left. Reproduced from Reilly et al., 2020.
Homeobox genes are thus a comprehensive “descriptor” of neuronal diversity throughout an entire nervous system – a homeobox code for all neurons! Furthermore, the mapping of these homeobox genes led another graduate student, Cyril Cros, to find that neurons previously not known to express or require a homeobox gene do indeed also require a homeobox gene for their identity specification (Reilly et al., 2020).
This is not the end of the road. The lab remains motivated to test whether indeed every single C. elegans neuron not only expresses, but requires a homeobox gene for its identity specification. Moreover, it remains little explored to what extent we can reprogram the identity of neurons by respecifying their homeobox codes. I am looking forward to seeing whether work in other systems with more complex brains will also uncover the broad employment of homeobox codes. Recent transcriptome analysis in restricted parts of the fly and mouse CNS indeed provides tantalizing hints for similar specificity and selectivity of homeobox gene expression in more complex nervous systems (Allen et al., 2020; Davis et al., 2020; Sugino et al., 2019).
Mechanism of cell polarisation and first lineage segregation in the human embryo
Meng Zhu, Marta N. Shahbazi, Angel Martin, Chuanxin Zhang, Berna Sozen, Mate Borsos, Rachel S. Mandelbaum, Richard J. Paulson, Matteo A. Mole, Marga Esbert, Richard T. Scott, Alison Campbell, Simon Fishel, Viviana Gradinaru, Han Zhao, Keliang Wu, Zijiang Chen, Emre Seli, Maria J. de los Santos, Magdalena Zernicka-Goetz
Hedgehog signaling activates a heterochronic gene regulatory network to control differentiation timing across lineages
Megan Rowton, Carlos Perez-Cervantes, Ariel Rydeen, Suzy Hur, Jessica Jacobs-Li, Nikita Deng, Emery Lu, Alexander Guzzetta, Jeffrey D. Steimle, Andrew Hoffmann, Sonja Lazarevic, Xinan Holly Yang, Chul Kim, Shuhan Yu, Heather Eckart, Sabrina Iddir, Mervenaz Koska, Erika Hanson, Sunny Sun-Kin Chan, Daniel J. Garry, Michael Kyba, Anindita Basu, Kohta Ikegami, Sebastian Pott, Ivan P. Moskowitz
Proneural genes define ground state rules to regulate neurogenic patterning and cortical folding
Sisu Han, Grey A Wilkinson, Satoshi Okawa, Lata Adnani, Rajiv Dixit, Imrul Faisal, Matthew Brooks, Veronique Cortay, Vorapin Chinchalongporn, Dawn Zinyk, Saiqun Li, Jinghua Gao, Faizan Malik, Yacine Touahri, Vladimir Espinosa Angarica, Ana-Maria Oproescu, Eko Raharjo, Yaroslav Ilnytskyy, Jung-Woong Kim, Wei Wu, Waleed Rahmani, Igor Kovalchuk, Jennifer Ai-wen Chan, Deborah Kurrasch, Diogo S. Castro, Colette Dehay, Anand Swaroop, Jeff Biernaskie, Antonio del Sol, Carol Schuurmans
Fine-tuning of Epithelial EGFR signals Supports Coordinated Mammary Gland Development
Alexandr Samocha, Hanna M. Doh, Vaishnavi Sitarama, Quy H. Nguyen, Oghenekevwe Gbenedio, Joshua D. Rudolf, Walter L. Eckalbar, Andrea J. Barczak, Yi Miao, K. Christopher Garcia, Devon Lawson, Zena Werb, Kai Kessenbrock, Philippe Depeille, Jeroen P. Roose
Tissue topography steers migrating Drosophila border cells
Wei Dai, Xiaoran Guo, Yuansheng Cao, James A. Mondo, Joseph P. Campanale, Brandon J. Montell, Haley Burrous, Sebastian Streichan, Nir Gov, Wouter Jan Rappel, Denise J. Montell
A Human Multi-Lineage Hepatic Organoid Model for Liver Fibrosis
Yuan Guan, Annika Enejder, Meiyue Wang, Zhuoqing Fang, Lu Cui, Shih-Yu Chen, Jingxiao Wang, Yalun Tan, Manhong Wu, Xinyu Chen, Patrik K. Johansson, Issra Osman, Koshi Kunimoto, Pierre Russo, Sarah C. Heilshorn, Gary Peltz
Secondary ossification center induces and protects growth plate structure
Meng Xie, Pavel Gol’din, Anna Nele Herdina, Jordi Estefa, Ekaterina V Medvedeva, Lei Li, Phillip T Newton, Svetlana Kotova, Boris Shavkuta, Aditya Saxena, Lauren T Shumate, Brian Metscher, Karl Großschmidt, Shigeki Nishimori, Anastasia Akovantseva, Anna P Usanova, Anastasiia D Kurenkova, Anoop Kumar, Irene Linares Arregui, Paul Tafforeau, Kaj Fried, Mattias Carlström, Andras Simon, Christian Gasser, Henry M Kronenberg, Murat Bastepe, Kimberly L. Cooper, Peter Timashev, Sophie Sanchez, Igor Adameyko, Anders Eriksson, Andrei S Chagin
Visualizing the metazoan proliferation-terminal differentiation decision in vivo
Rebecca C. Adikes, Abraham Q. Kohrman, Michael A. Q. Martinez, Nicholas J. Palmisano, Jayson J. Smith, Taylor N. Medwig-Kinney, Mingwei Min, Maria D. Sallee, Ononnah B. Ahmed, Nuri Kim, Simeiyun Liu, Robert D. Morabito, Nicholas Weeks, Qinyun Zhao, Wan Zhang, Jessica L. Feldman, Michalis Barkoulas, Ariel M. Pani, Sabrina L. Spencer, Benjamin L. Martin, David Q. Matus
Single-cell analysis of chromatin silencing programs in developmental and tumor progression
Steven J. Wu, Scott N. Furlan, Anca B. Mihalas, Hatice Kaya-Okur, Abdullah H. Feroze, Samuel N. Emerson, Ye Zheng, Kalee Carson, Patrick J. Cimino, C. Dirk Keene, Eric C. Holland, Jay F. Sarthy, Raphael Gottardo, Kami Ahmad, Steven Henikoff, Anoop P. Patel
Automated cell tracking using StarDist and TrackMate
Elnaz Fazeli, Nathan H. Roy, Gautier Follain, Romain F. Laine, Lucas von Chamier, Pekka E. Hänninen, John E. Eriksson, Jean-Yves Tinevez, Guillaume Jacquemet
Research practice & education
Preprinting the COVID-19 pandemic
Nicholas Fraser, Liam Brierley, Gautam Dey, Jessica K Polka, Máté Pálfy, Federico Nanni, Jonathon Alexis Coates
Measuring effects of trainee professional development on research productivity: A cross-institutional meta-analysis
Patrick D. Brandt, Susi Sturzenegger Varvayanis, Tracey Baas, Amanda F. Bolgioni, Janet Alder, Kimberly A. Petrie, Isabel Dominguez, Abigail M. Brown, C. Abigail Stayart, Harinder Singh, Audra Van Wart, Christine S. Chow, Ambika Mathur, Barbara M. Schreiber, David A. Fruman, Brent Bowden, Chris E. Holmquist, Daniel Arneman, Joshua D. Hall, Linda E. Hyman, Kathleen L. Gould, Roger Chalkley, Patrick J. Brennwald, Rebekah L. Layton
Optic cup development involves a series of intricate cell and tissue movements, and cells’ interaction with the extracellular matrix (ECM) is known to play an important role. However, the details of how ECM components work in eye development, and where they come from, are still poorly understood, and are the subject of a new Development paper that takes advantage of live imaging in zebrafish embryos. We caught up with first author Chase Bryan and his supervisor Kristen Kwan, Assistant Professor in the Department of Human Genetics at the University of Utah, Salt Lake City, to find out more about the story.
Chase (L) and Kristen (R)
Kristen, can you give us your scientific biography and the questions your lab is trying to answer?
KK Thanks for asking! I am a cell and developmental biologist. I got my start as a biochemist studying membrane trafficking as an undergraduate in Suzanne Pfeffer’s lab at Stanford University. I worked with Marc Kirschner during my PhD, and it was during that time that I began thinking about morphogenesis; I worked toward understanding how developmental signals are integrated with the cytoskeleton and cell adhesion during Xenopus development. Marc gave me a lot of freedom to develop these ideas, and since then I’ve been fascinated by the problem. Knowing that I wanted to do live imaging, I went on to do a postdoc with Chi-Bin Chien, where I began working on eye morphogenesis in zebrafish. Chi-Bin was extremely supportive and helped me start to develop computational approaches to address this problem. My lab is currently working to understand the cellular and molecular mechanisms governing eye morphogenesis. When, where and how do cells move? How is the tissue organized? How does the embryo construct three-dimensional organs in a precise and stereotyped manner? We hope to answer questions like these by combining live imaging, computational methods, genetics and cell biology.
Chase, how did you come to work with Kristen and what drives your research today?
CB I met Kristen briefly prior to starting graduate school at the University of Utah – I was working as a lab technician at the time, and the postdoc I was working with asked if I wanted to go see his friend’s (Kristen’s) job interview seminar. When I saw the movies of zebrafish optic cup development she made during her postdoc and heard the pitch she had for her science (something like ‘you can understand so much of biology simply by watching it happen’), I knew I wanted to work with her and have her teach me those techniques. That same idea drives the work I am doing now as a postdoc.
What did we know about the ECM’s role in optic cup morphogenesis before your work?
CB & KK We know surprisingly little about the function of ECM molecules! Research over many decades has described the presence of ECM proteins around the developing optic cup in many different organisms, but much less has been discovered about ECM function. Work from other groups has demonstrated that the ECM protein fibronectin is important to establish the lens, as is laminin-1. Different subunits of laminin-1 have also been shown to regulate optic cup shape, cell polarity and retinal differentiation, but beyond these proteins not much is known about the functional role of other ECM components.
Can you give us the key results of the paper in a paragraph?
CB & KK Mesenchymal cells have been observed surrounding the developing eye for decades, and studies in mouse have demonstrated that mutants with disruptions to the periocular mesenchyme display optic cup morphology defects, but a specific role of either the mesodermal mesenchyme or the migratory neural crest has not been well established. In this research, we focused on the neural crest, as we had genetic tools ready at hand to try and parse out the role of those cells in optic cup morphogenesis. We found that neural crest mutants in zebrafish displayed optic cup defects, and observed that neural crest cells migrate around the developing eye throughout optic cup morphogenesis. We then found that neural crest helps establish the basement membrane that surrounds the retinal pigment epithelium. These neural crest cells express the ECM protein nidogen, and by disrupting nidogen function we found that this neural crest-derived nidogen is necessary for proper optic cup morphogenesis.
Why do you think optic vesicle cells move faster when the basement membrane is disrupted?
CB & KK We propose in our model that the basement membrane serves as a molecular handbrake for the movement of optic vesicle cells. The optic vesicle develops as a bilaminar epithelial tissue, so movement in one part of the epithelium could push or pull other parts of the tissue forward or backward, like a conveyor belt. The basement membrane could serve as an adhesive layer for the epithelia to stick to, and could thereby regulate the speed at which individual cells or the sheet as a whole get moved along. Without the basement membrane in place to adhere to, those cells could lose their footing, so to speak, and keep getting pushed or pulled along faster than they do in wild-type conditions.
Collage of pseudocoloured micrographs of 24 h post-fertilization zebrafish optic cups surrounded by periocular neural crest cells.
And do you think all of the defects you observe result from problems with cell movements, or might other mechanical or signalling aspects be affected?
CB & KK Other mechanical or signalling aspects could certainly be affected when neural crest or basement membrane assembly gets disrupted. The ECM has many known roles in regulating movement and presentation of signalling molecules, only a couple of which we have directly tested. In terms of mechanical aspects, integrin signalling is one obvious molecular link between the ECM and the cells that adhere to it, which could affect the cytoskeleton and in turn regulate epithelial morphogenesis. There are also a lot of unexplored mechanical and biophysical aspects of morphogenesis, and the ECM may play into many other pathways, such as Hippo or tension receptors.
When doing the research, did you have any particular result or eureka moment that has stuck with you?
CB I’m lucky because this project was full of satisfying little events. Some of the most memorable moments of my graduate work came from this project: taking a time-lapse image where the optic cup stayed in focus throughout the night and getting to watch the neural crest migrate around the eye, or looking at beautiful electron micrographs after spending an entire weekend prepping samples for electron microscopy. Perhaps the biggest moment like this came after I’d spent months engineering and raising the nidogen transgenics. To engineer the transgenes, inject the fish, raise multiple generations, and finally get to perform the heat-shock experiments and find that the transgenes worked just like I’d predicted was immensely gratifying.
And what about the flipside: any moments of frustration or despair?
CB There was definitely some frustration working with a double mutant, and even more so when we started adding in transgenes. Even with zebrafish where you can have hundreds of embryos to work with at any given time, you can only work with a handful before they develop beyond the time point you’re trying to study. We had many experiments where Mendelian genetics worked against us and we didn’t get more than one or two embryos with the correct genotype.
So what next for you after this paper – I hear you’ve moved cross country?
CB That’s right, I recently started a postdoc with Marty Cohn at the University of Florida. I’m still keenly interested in the cell biology underlying morphogenesis – I’ve been developing a mouse organ culture system and am using the live-imaging techniques I learned from Kristen to study the formation of the mammalian urethra and to understand how normal development is altered in a condition called hypospadias.
Where will this work take the Kwan lab?
KK I am truly excited about where this work has led us: Chase’s work indicates that building the basement membrane is a collaborative endeavour between different tissues. Moving forward, we are extremely interested in identifying other ECM molecules provided by extraocular sources. Our preliminary data suggest that there are multiple cell types providing many different ECM molecules to the developing eye. We are excited to determine how these function in an integrated manner to support proper eye development.
Finally, let’s move outside the lab – what do you like to do in your spare time?
CB I enjoy figuring out how things work, and that expands beyond biology – before I was a cell biologist, I spent my free time rebuilding cars and motorcycles. I’m still an avid motorcyclist, but since moving to Florida I’ve also been taking advantage of the sun to get more into bicycling and triathlon training.
The Southwest Zebrafish Meeting 2020 (SWZM20) took place as an online-only meeting organized by the Scholpp lab at the Living Systems Institute, University of Exeter UK, on September 11th 2020. This meeting brought together 90 scientists, mostly from the Southwest of the UK, who work with the zebrafish Danio rerio, and was supported by a Scientific Meeting grant from the Company of Biologists and by the companies Tecniplast and DanioLab. We provide the readers who might have missed this event with a glimpse of the recent scientific advancements and experimental approaches applied by research groups using fish as a model to understand molecular and cellular mechanisms in development, to explore its role as an in vivo reporter in environmental sciences, and to elucidate common principles in organ regeneration.
To kick off the SWZM20, Phil Ingham (NTU Singapore) introduced the International Zebrafish Society (IZFS). Then, the first session started with a lecture from Isaac Bianco (University College London) on using zebrafish to understand how neural circuits control complex behaviour. One of the most complex visually guided behaviours of zebrafish is hunting, which begins at only 5 days post-fertilisation. Isaac and his team showed how the larval brain processes visual inputs to identify prey. The zebrafish extracts specific features of an object, which help to identify potential prey and lead to the initiation of a hunting routine. During prey tracking, the zebrafish larvae coordinate a directed swim behaviour including turns towards the target. His team could show that recent experience modulates the core output. His talk was followed by a further talk on fish behaviour from Min-Kyeung Choi (Ryu lab, LSI, Exeter) on the consequences of early life stress on social behaviour. These talks were followed by a session on signalling. Georgina McDonald (Hammond lab, University of Bristol) reported on the function of SMAD9 signalling in developing bones, and Chengting Zhang (Scholpp lab, LSI Exeter) and Rachel Moore (Clarke lab, King’s College London) highlighted the importance of cell protrusions in signal transport and signal reception. Finally, Robert Kelsh (Department of Biology and Biochemistry, University of Bath) elucidated the formation of the stripe pattern in zebrafish using a lattice-based mathematical model, which helped him to identify the crucial interactions between the different pigment cells in establishing the distinctive pattern of zebrafish.
The SWZM20 used an online virtual bulletin board, where delegates could listen to talks, discuss poster presentations, engage with sponsors, collaborate and meet with colleagues, and enjoy their snacks in the Fish Café.
The behaviour/cell session was followed by the poster session. PhD students and postdocs from our zebrafish community presented and discussed their work in break-out rooms. After the lunch break, we discussed zebrafish as a model for ecotoxicology. Rebecca Boreham (Tyler lab, University of Exeter) and Sophie Cook (Lloyd-Evans lab, University of Cardiff) presented their data on the usage of transgenic zebrafish larvae as in vivo sensors of chemical-induced stress and the toxicity of iron oxide nanoparticles. This session was closed by a lecture from Charles Tyler (School of Biosciences, University of Exeter). In his work, transgenic zebrafish lines are developed that are sensitive to endocrine disruptors such as oestrogen-like chemicals. The transgenic zebrafish lines allow his team to identify where different chemicals interact in the body and in real-time. This is a very powerful tool to investigate the potential for wider health impacts of exposure to environmental oestrogens. Members of his team also work on other chemical contaminants of environmental and human health concern including pesticides and pharmaceuticals and how the zebrafish can help us to detect them and learn more about accumulation in specific organs. This was followed by a talk from Gregory Paull (Aquatic Resource Facility Manager, University of Exeter) on the important balance of husbandry with scientific research.
Delegates that participated in the Southwest Zebrafish Meeting 2020 (SWZM20) organized by the Living Systems Institute, University of Exeter UK, 11th Sep 2020.
In the final session of the day, the importance of zebrafish research in elucidating general principles in organ regeneration was discussed. This session started with four short talks from Daniel Wehner (MPI Erlangen, Germany) on axon regeneration; Paco Lopez-Cuevas (Martin lab, University of Bristol) on reprogramming macrophages and neutrophils; Noemie Hamilton (University of Sheffield) on microglial function and its contribution to the pathology of a childhood white matter disorder; and Rebecca Ryan (Richardson lab, University of Bristol) on the role of Osteopontin in zebrafish cardiac regeneration. The session was closed by a lecture from Catherina Becker (Centre of Discovery Brain Sciences, University of Edinburgh) on the cellular mechanism underlying regeneration of the zebrafish spinal cord. In her talk, she focused on the active spinal cord progenitor cells around the lesion and elucidated lesion-induced neurogenesis.
To end this wonderful day, the best posters and talks received prizes according to a qualified majority voting. Two delegates won prizes for their fantastic talks. Georgina McDonald, from the University of Bristol, gave an impressive talk that showed us how SMAD9 is regulated and expressed in zebrafish skeletal elements. Rachel Moore, from KCL, explained how actin-based protrusions lead microtubules during axon initiation in spinal neurons in vivo. A further two delegates won prizes for their informative posters. Aaron Scott, from the University of Bristol, displayed a colourful story on in vivo characterisation of endogenous and cardiovascular extracellular vesicles in zebrafish. Lastly, Yosuke Ono, from the University of Exeter, created an enlightening poster on post-embryonic development and growth of slow-twitch muscle fibres in zebrafish. All talks and posters presented were deserving of awards, but these were particularly exceptional from a scientific and exhibitive perspective.
The powerful imaging techniques, the emerging tools to study complex vertebrate behaviour, the possibility to study in vivo cell biology within a living organism, and the usage of zebrafish in analysing the effect of chemicals on an organism are only a few of the many reasons why zebrafish is and will remain an excellent research model in the future. And another reason to work with zebrafish became obvious during this day: Working with zebrafish makes you part of a fantastic, noble and lively community. The SWZM20 was an outstanding celebration of this wonderful fellowship. The SWZM20 with delegates from diverse backgrounds further underscored the need for close interaction in the zebrafish research field. Despite some local collaborations in the Southwest of the UK, a stronger network still has to be developed between the four main scientific centres of Exeter, Bristol, Bath and Cardiff. We thereby hope that the SWZM20 could serve as a springboard for even more future interactions from basic biology to translational research.
And this is also a reason why we are looking forward to meeting again next year for the SWZM21 in Bath!
The superplot was recently proposed as a data visualization strategy that improves the communication of experimental results (Lord et al, 2020). To simplify the visualization of data with a superplot, I created a web tool that is named SuperPlotsOfData (Goedhart, 2020). The superplot tutorial for R, the tutorial for Python and the SuperPlotsOfData web app use the tidy data format as input. The tidy data structure is different from the popular spreadsheet format and (in my experience) not intuitive to grasp. Therefore, the conversion of ordinary spreadsheet data into this structure may present a considerable bottleneck for creating superplots. Here, I explain how to convert data from a spreadsheet (using R) into a tidy format to enable plotting with Python, R or SuperPlotsOfData.
In a previous blog the conversion of a simple spreadsheet into the tidy format was explained, covering the basics. Here, I deal with a more complex spreadsheet that holds data of multiple replicates and two experimental conditions. The data is available here as an excel sheet or CSV file. A screenshot of the excel file is shown below. The first row lists the experimental condition and the second row identifies biological replicates. Each column holds the results from a measurement of speed. Since the measured data are distributed over multiple columns, this data is ‘wide’ and not tidy.
Tidying this data means that we have to move all the measured data into a single column. As we will see in the end result, other columns will specify the condition and replicate number.
In R/Rstudio we load two packages that are needed to read excel files and to reformat the data:
> library(tidyverse)
> library(readxl)
To load the data from the excel file and assign it to the dataframe ‘data_spread’ we use:
Note that in the command above, none of the rows is used to assign column names by specifying col_names = FALSE. The reason is that duplicate column names (as present in this file) are not accepted and will be modified. The column names will be assigned later on.
To obtain a vector with the labels for the conditions, the first row is selected, the type is changed to character and converted to a vector:
Next, the labels in the column ‘Condition’ that identify the treatment and the replicate are separated and assigned to individual columns that are named accordingly:
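The steps described above can be sketched as follows. This is a minimal sketch and not necessarily the original commands: it assumes a small stand-in data frame in place of the excel file, the intermediate names (data_values, conditions, replicates) and the toy numbers are hypothetical, and it uses pivot_longer() from tidyr to move the measurements into a single column.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for a sheet read without column names:
# row 1 = condition, row 2 = replicate, remaining rows = measured speeds.
# Every column is text because the label rows are text.
data_spread <- data.frame(
  c1 = c("Control", "Replicate1", "43.7", "41.9"),
  c2 = c("Control", "Replicate2", "50.1", "48.3"),
  c3 = c("Drug",    "Replicate1", "30.2", "33.0"),
  c4 = c("Drug",    "Replicate2", "28.4", "29.9"),
  stringsAsFactors = FALSE
)

# Flatten the two label rows into plain character vectors.
conditions <- unlist(data_spread[1, ], use.names = FALSE)
replicates <- unlist(data_spread[2, ], use.names = FALSE)

# Drop the label rows and use the combined (now unique) labels as column names.
data_values <- data_spread[-(1:2), ]
colnames(data_values) <- paste(conditions, replicates, sep = "_")

# Move all measurements into one column, then split the combined label
# into separate Treatment and Replicate columns.
data_tidy <- data_values %>%
  pivot_longer(cols = everything(), names_to = "Condition", values_to = "Speed") %>%
  separate(Condition, into = c("Treatment", "Replicate"), sep = "_") %>%
  arrange(Treatment, Replicate)

head(data_tidy)
```

In this sketch the Speed column is still of type character; as.numeric() can be applied to it if numbers are needed within R.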
The result is a tidy dataframe, in which all measurements of speed are located in a single column. In addition, we have columns that specify the treatment and the replicate number for each of the measurements:
# A tibble: 6 x 3
  Treatment Replicate  Speed
  <chr>     <chr>      <chr>
1 Control   Replicate1 43.692019999999999
2 Control   Replicate1 41.856639999999999
3 Control   Replicate1 49.117069999999998
4 Control   Replicate1 49.793309999999998
5 Control   Replicate1 41.543010000000002
6 Control   Replicate1 44.042009999999998
The dataframe can be saved as a CSV file:
> write.csv(data_tidy, 'tidy-Grouped-data.csv')
The saved CSV file can be used as input for creating superplots in python, R or with the SuperPlotsOfData app. In fact, the resulting file is exactly the same as the one that is used to generate the plots in supplemental figure S4 and S5 of the original superplots paper.
An R-script (Tidy-replicates.R) that combines all the steps is available here.
Final words
The spreadsheet data that I used here as an example has a specific structure that is designed to make a dedicated data visualization. Nevertheless, the strategy for conversion should be applicable for similar datasets that have two rows that define conditions and replicates. Although the format of the spreadsheet may not be similar to your own data structure, I still hope that this tutorial gets you started with the conversion of spreadsheet data into tidy data.
The primary goal of this tutorial was to prepare the data for creating a superplot. But it also serves as an example of converting spreadsheet data into tidy data. I think it is valuable to have more examples that deal with conversion of experimental data into the tidy format, since it is a powerful concept in data handling. To get a flavour of what can be done with the tidy data format in the context of data visualization, you can check out the ggPlotteR web tool (https://huygens.science.uva.nl/ggPlotteR/). This tool uses tidy data as input (and contains example data) to quickly generate and tweak a wealth of different visualizations.
Faculty of Health and Medical Sciences
University of Copenhagen
We are seeking a highly motivated and self-starting Cell Culture Specialist.
About us
The Novo Nordisk Foundation Center for Stem Cell Biology – DanStem has been established as a result of a series of international recruitments coupled with internationally recognized research groups focused on insulin producing beta cells and cancer research already located at the University of Copenhagen. DanStem addresses basic research questions in stem cell and developmental biology and has activities focused on the translation of promising basic research results into new strategies and targets for the development of new therapies for cancer and chronic diseases such as diabetes and liver failure. Find more information about the Center at https://danstem.ku.dk/.
Job description
The Cell Culture Specialist will manage the Stem Cell Culture Platform (https://danstem.ku.dk/platforms/stem-cell-culture/) at DanStem, a self-contained shared-resource facility dedicated for the maintenance of and experimentation with mouse and human stem cells. Specific responsibilities include:
Work closely with DanStem scientists, platform specialists, laboratory manager and management to maintain and develop the cell culture laboratory facility and services
Introduce, maintain and enforce clear operating guidelines and safety in GMO1 and GMO2 cell culture labs
Establish and coordinate efforts for in-house production of important cytokines and necessary QC strategies to ensure their efficacy and batch quality.
Support users in design of modern cell modification technologies (e.g. CRISPR or similar technologies) and support larger scale activities of individual research groups.
Inform on and support stem cell culture activities for facility users, e.g., introduction to basic cell culture techniques, sterile working, troubleshooting, safety considerations.
Aliquoting and preparing reagents, testing for mycoplasma, karyotyping, maintaining equipment, monitoring and refilling stocks, defrosting freezers, and resolving conflicts between users
Order materials, equipment and furniture, negotiate and follow-up with vendors and service providers, and handle invoices
Establish new service agreements, annual service of equipment, and equipment upgrades
Organize and maintain cell cryostorage and backup storage system
Evaluate operations and coordinate upgrades and repairs and follow-up, including communication with the building operations department and assessment of proposed repairs
Respond to alarms, providing assistance by phone or in person
Substitute for the DanStem Laboratory Manager in case of holidays or illness
Qualifications
Candidates are expected to have at least a Master’s degree in natural or health sciences. A PhD degree is advantageous, but a proven record of similar job profile is equally qualifying. In addition, we are seeking a candidate who has experience in embryonic stem cell and/or iPS cell culture, genetic manipulations and differentiations, and
Has a scientific background in developmental or stem cell biology, experience with embryonic stem cells and/or organoid culture as well as working in GMO1/2 laboratories
Has demonstrated success working in research service facilities; managerial experience in this or another type of organization is a strong advantage
May have had exposure to assay technologies like fluorescence microscopy, flow cytometry or potentially histo-chemistry
Has experience in project management and a strong ability to prioritize and handle multiple tasks and frequent deadlines
Is proactive, innovative, analytical, goal- and solution-oriented while enjoying new challenges
Develops good relations in a multi-cultural research environment, and has excellent oral and written communication skills in English
Works successfully with persons from a variety of organizations and professional levels and in tight collaboration with technology platforms at DanStem and CPR
For further information about the position, please contact Head of Laboratory and Platforms Malte Paulsen, malte.paulsen@sund.ku.dk
Employment conditions
The position as Cell Culture Specialist (senior consultant) is a full-time (37 hours), permanent position. The employment can begin any time from November 2020 or upon agreement with the chosen candidate. The place of work is at DanStem, University of Copenhagen, Blegdamsvej 3B, Copenhagen.
The position, at the University of Copenhagen, will be in accordance with the provisions of the collective agreement between the Danish Government and AC (the Danish Confederation of Professional Associations). To the salary is added a monthly contribution to a pension fund according to the collective agreement, and a supplement can be negotiated, depending on the candidate’s experiences and qualifications.
We offer a stimulating, multifaceted and international environment of high scientific and societal impact; the possibility for continued education and training; collaborative and creative colleagues in the research teams and scientific service platforms; and the opportunity to work with other departments and centers at the University and greater community.
Application
Send your application electronically by clicking “Apply online” below.
The application must include the following documents/attachments:
Motivated letter of application (max 1 page)
Curriculum vitae incl. education, experience, previous employments, language skills and other relevant skills
Certified copy of diplomas/degree certificate(s)
Letter of recommendation and/or contact details of referees
Application deadline: 25 October 2020
The University of Copenhagen wishes to reflect the diversity of society and welcomes applications from all qualified candidates regardless of personal background.
Part of the International Alliance of Research Universities (IARU), and among Europe’s top-ranking universities, the University of Copenhagen promotes research and teaching of the highest international standard. Rich in tradition and modern in outlook, the University gives students and staff the opportunity to cultivate their talent in an ambitious and informal environment. An effective organisation – with good working conditions and a collaborative work culture – creates the ideal framework for a successful academic career.
Info
Application deadline: 25-10-2020
Employment start: 01-12-2020
Working hours: Full time
Department/Location: The Novo Nordisk Foundation Center for Stem Cell Biology – DanStem
The Friedrich Miescher Institute for Biomedical Research (FMI) invites applications for a tenure-track group leader position (assistant professor level). We encourage applications from candidates who have an innovative research program on questions related to mechanisms of development, regeneration and disease. Applications from individuals using quantitative and interdisciplinary approaches are particularly welcome.
The position includes competitive start-up and core funding, as well as access to outstanding in-house core facilities for genomics, proteomics, structure biology, cell sorting, microscopy and image analysis, C. elegans, Hydra and mouse genetics, and computational biology. The Institute also provides a career mentoring program and assistance with research grant applications. Our vibrant, English-speaking scientific environment offers opportunities for multidisciplinary collaboration with FMI research teams in epigenetics, quantitative biology, tissue biology and neurobiology. For further information visit www.fmi.ch.
The FMI is a leading biomedical research institute affiliated with the University of Basel and the Novartis Institutes for Biomedical Research. It hosts around 190 postdoctoral fellows and graduate students recruited worldwide, and has a widely diverse international staff representing over 40 nationalities. We strongly believe that the best science comes out of a work culture that respects and appreciates individual differences and supports everyone to achieve their full potential. For the work-life balance of our employees, we offer flexible working hours and a teleworking program. We provide a supportive and family-friendly environment in which to develop ambitious and original research.
We offer financial support for relocation and settling in. The FMI is located in Basel, the third largest city in Switzerland, in the heart of Europe. Basel is home to a diverse international population and offers a rich cultural life and several multilingual schools.
The FMI is committed to raising the proportion of women in science; we thus strongly encourage female researchers to apply. The Institute is responsive to the needs of dual career couples.
Applications, including a CV, a concise description of research interests and future plans, and contact details for three referees should be submitted at:
A wave of innovations is advancing data-driven computational analysis and machine learning – time for developmental biologists to hop on the surf board! This post, inspired by our recent data-driven work on lateral line morphogenesis, provides a brief primer on key concepts and terms.
written by Jonas Hartmann & Darren Gilmour
From machine translation to self-driving cars and from deep fakes to high-speed trading, data-driven technologies are taking the world by storm, fueled by a torrent of funding from the tech industry. This has led to the rise of data science, a new and loosely defined interdisciplinary field that combines aspects of computer science, statistics and machine learning to extract information or generate predictions from large and otherwise inscrutable datasets.
Meanwhile in biology, it has never been easier to generate large and inscrutable datasets. New omics techniques and new microscopes paired with increased automation provide the means to digitize every aspect of a biological system, from single molecules to whole embryos. Crunching this data deluge and converting it into scientific understanding is simultaneously one of the biggest opportunities and one of the biggest challenges in modern biology – and data science is poised to become a powerful tool in meeting this challenge.
But not all data are created equal. Certain formats, such as counts or sequences, lend themselves more readily to the application of data science techniques. Others, including multi-dimensional images, require several layered steps of processing and analysis to tease apart the rich information encoded within them. Furthermore, much of the recent progress in machine learning is founded on big data, specifically on datasets with thousands or tens of thousands of samples, a scale that is hard if not impossible to achieve for many biological techniques. In short, datasets can be thought to fall onto a spectrum from big data (low information/sample ratio) to rich data (high information/sample ratio) – and when it comes to data-driven inference, big data has taken center stage among data scientists over the past years.
This brings us to developmental biology and more specifically to one of its key data sources: in vivo microscopy. Since high throughput remains a rare luxury in most imaging settings, developmental microscopists very much operate in the realm of rich data. This presents a major challenge for anyone looking to employ data science to disentangle the complexity of embryogenesis. Nevertheless, exciting progress is under way! For instance, many microscopists will be aware of the solutions machine learning is delivering for image analysis problems such as segmentation and tracking [1,2]. But as we learned in our work on the morphogenesis of the zebrafish lateral line primordium [3], segmentation is only the first step on the journey from image to insight (figure 1) – so our focus in this post is on what comes after a decent segmentation has been achieved.
For those not familiar with the lateral line primordium: it is a developmental tissue of a hundred or so cells that migrate as a cohesive group along the flank of the developing embryo, assembling and occasionally depositing rosette-shaped clusters of cells, which go on to form the lateral line sensory organs [4]. The need for tight coordination between collective cell migration and rosette morphogenesis makes the lateral line primordium an ideal system to study the interplay between multiple developmental processes, especially since the transparency of zebrafish embryos greatly facilitates high-quality live imaging. With this context in mind, let’s jump into the data science.
Figure 1: An overview of our strategy for analyzing cellular architecture in the lateral line primordium. Note that single-cell segmentation is only the first step of our analysis. See the paper [3] for further details.
From Images to Numbers
Images, especially those produced by 3-5D microscopy of live embryos, are a prototypical example of rich data, as the obtainable number of samples is limited but each sample contains a wealth of information across multiple scales. By giving us both the (relative) abundance and the location of labeled components, images indirectly encode object counts, object shapes, contact interfaces between objects, textures, gradients, colocalization, and much more – and all of this potentially as dynamical measurements over time! Unfortunately, this information is not directly accessible to most modern data science methods, so exploring it in a data-driven fashion is difficult.
The key problem to solve is thus to go from images that encode all this rich information at once to numbers that encode the same information in a more explicit, disentangled fashion. In data science, such descriptive sets of numbers typically take the form of a feature space, which is a 2D array with a row for each sample and a column for each measure, also called a feature. The “space” in “feature space” reflects the notion that each sample’s feature values can be understood as a vector in an n-dimensional space, where n is the number of features. For microscopy data, features can be extracted to describe an entire image, regions of interest (e.g. different areas of a tissue) or fully segmented objects of interest (e.g. cells, nuclei, vesicles, filaments, etc.).
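To make the data structure concrete, here is a minimal sketch of a feature space in R. The cell names, features and numbers are made up for illustration and are not taken from our data.

```r
# Hypothetical measurements for four segmented cells (made-up numbers).
cells <- data.frame(
  volume       = c(310, 295, 420, 388),
  surface_area = c(260, 251, 330, 305),
  sphericity   = c(0.85, 0.88, 0.71, 0.74),
  row.names    = c("cell_1", "cell_2", "cell_3", "cell_4")
)

# The feature space as a numeric 2D array: one row per sample,
# one column per feature.
feature_space <- as.matrix(cells)
dim(feature_space)  # number of samples, number of features

# Each row is one sample's vector in n-dimensional space (n = 3 here).
feature_space["cell_3", ]
```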
Strategies for feature extraction generally fall into one of two groups: feature engineering is the manual implementation of various measurements that could be of interest (e.g. volume, surface area, aspect ratios, etc.), whereas feature embedding is the use of an automated computational approach to generate features. The latter is also referred to as latent feature extraction, as implementations often look for variations across the entire dataset and then allocate features that capture this variation, thus uncovering “hidden” (latent) features that underlie the variability in the data. The advantage of feature embedding is that it is unbiased (or, more accurately, less biased), so it can capture interesting phenomena independent of the researcher’s prior assumptions. The disadvantage is that latent features may not correspond to obvious biological properties, so they can be harder to interpret.
In our recent paper, we use point clouds and clustering on those point clouds to bring 3D images of segmented cells into a format that allows us to use a simple Principal Component Analysis (PCA) to derive latent features. We use this approach primarily to analyze cell morphology based on the distribution of a membrane marker in 3D space, but it also works for other markers.
Over the past few years, feature engineering has become less common in data science, as deep neural networks can work directly with raw data; they essentially learn their own internal feature embedding strategy. This can also be used explicitly for latent feature extraction via neural network architectures called autoencoders (see this blog post for a short intro). The downside of this approach is that it usually requires big data to work reliably, so classical feature engineering and feature embedding strategies remain important tools for data-driven analysis of rich data.
A key limitation of the feature space data structure is that it doesn’t readily encode different organizational scales. For instance, it is not straightforward to meaningfully combine data on the tissue scale (a tissue x features array) with data at the cellular scale (cells x features) or below (e.g. protrusions x features) in a single analysis. Since exploring data across multiple scales is central to many biological questions, this is an area where technological innovation on the data science side is required to better accommodate biological applications.
From Numbers to Ideas
So what’s the point of all this work to convert raw image data into a tidy feature space? In short, it’s the wealth of computational tools that are available to operate on and visualize feature spaces to discover interesting patterns that lead to new hypotheses.
A common first step is dimensionality reduction, which essentially amounts to combining features that encode the same information and removing features that mostly contain noise (although this is a very pragmatic way of thinking about it). This is most commonly achieved with PCA. The point of dimensionality reduction is twofold: First, the remaining features are likely of high importance; to understand them is to understand the majority of variation in the entire dataset. Second, many of the tools discussed below struggle with very high-dimensional feature spaces, a problem known as the curse of dimensionality. This reflects our earlier discussion on big data versus rich data; most methods need many more samples than features to work well.
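As an illustration, PCA on a small simulated feature space might look like the sketch below. It uses prcomp() from base R on toy data in which two features carry nearly the same information and one is mostly noise. The feature names and all values are assumptions made for this example.

```r
set.seed(1)  # reproducible toy data

# Toy feature space: 50 samples x 4 features (all values simulated).
n <- 50
size <- rnorm(n)
features <- cbind(
  volume       = size + rnorm(n, sd = 0.1),  # nearly the same information
  surface_area = size + rnorm(n, sd = 0.1),  # as volume
  sphericity   = rnorm(n),                   # an independent signal
  noise        = rnorm(n, sd = 0.05)         # low-variance noise
)

# PCA on centred (but not rescaled) features.
pca <- prcomp(features, center = TRUE, scale. = FALSE)

# Fraction of the total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Keep the first two components as a reduced feature space (50 x 2).
reduced <- pca$x[, 1:2]
dim(reduced)
```

In this toy case the first two components capture almost all of the variance, because the redundant features collapse onto one component and the noise feature contributes very little.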
In our study, we performed dimensionality reduction and discovered that one of the most important remaining features relates to cell sphericity. We then mapped this key feature back onto the spatial organization of the lateral line primordium – averaged across all our samples – and found that it shows an unexpected central stripe pattern along the tissue. Given the intrinsic connection between sphericity and surface tension, this led us to the hypothesis that cell surface mechanics might regulate the relative location of cells within the migrating primordium.
Dimensionality reduction and related tools such as clustering (i.e. the identification of distinct subgroups among samples) are so-called unsupervised methods, as they work with a single input dataset. By contrast, supervised methods use both an input feature space and a target feature (also called the ground truth) and aim to learn the relationship between them. These algorithms are trained using a training set, for which both the input features and the target are known, for instance based on manual annotation of a subset of the data. Once trained, they can predict the target feature for new samples, based only on their input features.
A classifier is a supervised algorithm that can be trained to predict which of two or more groups a cell belongs to, given its features. We used this approach to classify cells into specific morphological archetypes, chosen by us to reflect our prior understanding of the tissue’s biology. This helped us make sense of the data from a biological perspective (more on this below).
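The train-then-predict workflow can be sketched with a deliberately simple classifier. The example below uses a nearest-centroid rule written in NumPy, chosen here only for brevity; it is not a claim about which classifier was used in the study. The two archetype names, the synthetic features and the helper names `fit_centroids` and `predict` are all assumptions for illustration.

```python
import numpy as np

def fit_centroids(features, labels):
    """Return the class names and the mean feature vector of each class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(features, classes, centroids):
    """Assign each sample to the class with the nearest centroid."""
    # distances has shape (n_samples, n_classes).
    diffs = features[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return classes[np.argmin(distances, axis=1)]

# Hypothetical two-feature space with two well-separated archetypes.
rng = np.random.default_rng(1)
leaders = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
followers = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(100, 2))
features = np.vstack([leaders, followers])
true_labels = np.array(["leader"] * 100 + ["follower"] * 100)

# Pretend only 10 cells of each archetype were manually annotated.
annotated = np.r_[0:10, 100:110]
unannotated = np.setdiff1d(np.arange(len(features)), annotated)

# Train on the annotated subset, then predict labels for all other cells.
classes, centroids = fit_centroids(features[annotated], true_labels[annotated])
predicted = predict(features[unannotated], classes, centroids)
accuracy = np.mean(predicted == true_labels[unannotated])
print("accuracy on unannotated cells:", accuracy)
```

Real morphological archetypes will rarely separate this cleanly, so held-out annotations should always be used to check how far the predicted labels can be trusted.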
A regressor predicts a continuous measure rather than a binary or categorical group label. We employed regressors to learn the relationship between shape features and features extracted from other channels, such as the distribution of vesicles or mRNA expression levels (measured by smFISH). Interestingly, regressors can be used as a means of integrating multiple different experiments (e.g. different smFISH stainings) based on reference features common to all experiments (e.g. cell morphology). By training a regressor to predict the experiment-specific results from the reference features, those specific results can be overlaid onto any dataset for which the same references are available. This enables the creation of an integrated dataset, often referred to as an atlas. Such atlases can be built across feature spaces but also directly across images, using neural networks as image-to-image regressors. An impressive example of this is the Allen Integrated Cell [5].
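The atlas idea can be illustrated with a small sketch: a regressor is trained on one experiment to predict a measurement from shared reference features, and is then applied to a second experiment in which that measurement was never taken. A linear least-squares model stands in for the regressor purely as a simplifying assumption, and the data are simulated; `fit_linear`, `predict_linear`, `simulate` and `true_weights` are hypothetical names introduced for this example.

```python
import numpy as np

def fit_linear(reference, target):
    """Least-squares fit of target ~ reference @ weights + intercept."""
    design = np.column_stack([reference, np.ones(len(reference))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_linear(reference, coef):
    """Apply fitted coefficients to a new set of reference features."""
    design = np.column_stack([reference, np.ones(len(reference))])
    return design @ coef

rng = np.random.default_rng(2)
true_weights = np.array([1.5, -0.5, 0.0])  # hypothetical shape-expression link

def simulate(n_cells):
    """Simulate shape features and a marker level that depends on them."""
    shape = rng.normal(size=(n_cells, 3))
    marker = shape @ true_weights + 2.0 + 0.1 * rng.normal(size=n_cells)
    return shape, marker

# Experiment A: both shape features and the marker were measured.
shape_a, marker_a = simulate(150)
# Experiment B: only shape was measured; the marker is kept aside here
# solely to check the prediction afterwards.
shape_b, marker_b_hidden = simulate(150)

coef = fit_linear(shape_a, marker_a)
marker_b_predicted = predict_linear(shape_b, coef)  # overlay onto experiment B

correlation = np.corrcoef(marker_b_predicted, marker_b_hidden)[0, 1]
print("correlation between predicted and hidden marker:", round(correlation, 3))
```

Repeating this for several markers, each measured in a different experiment, would yield one table in which every cell carries predicted values for all markers. The quality of such an overlay depends entirely on how much of each marker is actually explained by the reference features.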
Although these data-driven techniques are powerful, unsolved challenges remain. One such challenge is the translation between how humans understand data and how computers understand data. For instance, it is difficult to include expert knowledge into a deep learning algorithm. Conversely, it can be hard to interpret the results produced by an algorithm in a biologically meaningful way. We used our classification of cells into different morphological archetypes to visualize and analyze our entire dataset from a more interpretable perspective, an approach we call context-guided visualization (figure 2).
Figure 2: Manual annotations (A) of contextual knowledge, in this case whether a cell is a leader cell or a peripheral, central or inter-organ follower cell, can be projected onto the entire dataset (B) through a machine learning classifier. This is the basis for context-guided visualization (C), which allows the data to be viewed and analyzed through the lens of prior biological knowledge.
The Future of Data-Driven Developmental Biology
It’s important to stress that data science techniques, although interesting and useful in a multitude of ways, cannot and should not replace “traditional” science. To complete the full cycle of science, the hypotheses extracted from the data need to be tested with specifically designed experiments, which is something we are currently pursuing for the observations we made in our data-driven analysis of the lateral line primordium.
Looking toward the future, we see considerable potential for data science to accelerate developmental biology. Besides solving challenging image analysis problems (such as single-cell segmentation), data-driven methods can serve as “hypothesis generators” that allow us to comb through the complexity and messiness of cell and tissue biology. Eventually, large databases could be constructed that interlink different types of information into “digital tissues” (figure 3) or even “digital embryos”, which can be mined for interesting patterns and relationships. Early examples of such large-scale atlases are already starting to come online [6].
But data-driven developmental biology has a long way to go. As we have pointed out, biology is uniquely complicated in its dynamic and multi-scale nature, which plays to some of the weaknesses of current data-driven tools. Thus, there is a need for technical work to build biology-specific data science platforms and to adapt methods from other data-driven fields to the peculiarities of biology. Perhaps even more importantly, biologists need to receive the training required to understand and employ these techniques in their own work. Fortunately, data literacy training is on the rise in schools and universities, computer programming is easier to learn than ever and the relevant computational tools are being packaged more and more accessibly. Slowly but surely, data science will thus find its way into the developmental biology toolbox.
Figure 3: A digital tissue (or digital embryo) is a multi-modal dataset that enables integrated analyses of the diverse factors that play together to mediate organogenesis.