Book review: An introduction to decoding genomes

Posted by Development Book Reviews, on 29 November 2012

This book review originally appeared in Development. Jennifer Mitchell reviews “Introduction to Genomics” (by Arthur M. Lesk).

Book info:
Introduction to Genomics By Arthur M. Lesk. Oxford University Press (2011) 424 pages ISBN 978-0-19-956435-4 £34.99 (paperback)

The past 20 years have seen a revolution in genomics. Following the completion of the human genome in 2003, which took 13 years, we are now well on our way to achieving the next benchmark goal of having 1000 human genomes sequenced (The 1000 Genomes Project). This endeavour will provide a deep catalogue of human genetic variation. In addition, and as part of The Genome 10K Project (which aims to sequence the genomes of 10,000 vertebrate species), 2012 saw the initial assembly of the genome of the medium ground finch (Geospiza fortis), one of the iconic Galapagos finches described by Charles Darwin. The results of these genome sequencing projects are available through freely accessible public databases, accelerating discoveries in diverse fields of biology. With the advent of next-generation sequencing platforms, the time and cost of sequencing have dropped dramatically, so that sequencing a human genome in a day for less than $1000 is no longer science fiction but rather an event that will happen in the immediate future. The effects of this genomics revolution are widespread, and no field of biology or medicine remains untouched by the changes in sequencing throughput. Furthermore, genomic studies are so commonly highlighted by the media that a working understanding of genomics is increasingly important in undergraduate biology education.

The second edition of Introduction to Genomics by Arthur M. Lesk is a comprehensive introduction to genomics that covers a diverse range of topics, from genome sequencing to the systems biology approaches used for understanding the metabolome, transcriptome and proteome. This new edition strives to highlight the progress made in genomics due to the increased application of high-throughput sequencing techniques. The text is accessible to undergraduate students; it does a thorough job of providing the basic principles before moving on to more in-depth concepts. The author presents an important discussion of the ethical issues surrounding genome sequencing, including the efforts taken to protect individuals who contribute samples to the large-scale human genome sequencing projects that are currently underway. These issues are presented in a well-balanced and unbiased manner. Importantly, the text is a pleasure to read; detailed colour illustrations are provided throughout, as well as helpful analogies that allow the reader to get to grips with difficult concepts. For example, biological networks are compared to the London Underground map, where the stations are the nodes and the tracks that connect them are the edges.

This new edition highlights recent advances in sequencing techniques while still presenting the historical context of genome research. Early in the text, the history of the discovery of DNA structure, the need to understand the ‘language of the genome’, and early progress in sequencing techniques are discussed in a narrative manner. Lesk writes, “the sequence of the bases was like a text everyone wanted to read, not only was the text in an unknown language, but there were not even any examples of the language, because the sequences were unknown”, thus framing the importance of early DNA sequencing efforts in ultimately decoding the human genome. This leads into the development of Sanger sequencing, a method developed by Frederick Sanger, and the sequencing of the 5386 bp ΦX174 bacteriophage genome, the first completed DNA genome sequence. Following on from this is the adaptation of Sanger sequencing to automated DNA sequencing using fluorescent tags and next-generation high-throughput sequencing techniques. As in the rest of the book, colour illustrations are used to great effect to explain the techniques and to provide examples of data output. These examples of data output are increasingly important, as so few students today will ever perform a Sanger sequencing reaction and see how the individual base-terminated chains resolved on the gel are assembled into a sequence. So, although this technique has been replaced by higher-throughput variations, the visual understanding of the sequencing process provided by inspecting a Sanger sequencing gel remains unmatched.

At the end of each chapter, selected additional reading is provided, along with problems that test the concepts discussed. The problems range from those that check a basic understanding of the material to more thought-provoking questions that allow students to deepen that understanding. Of special note are the ‘weblem’ problems, which require the use of online genomics resources. These encourage students to develop proficiency in the use of these resources, many of which are linked to the text through the publisher’s website. A ‘guided tour’ of genomics websites provides a list of websites with short descriptions and links to instructions or tutorials where available; however, this is merely a teaser that will hopefully push young scientists to explore more thoroughly the information that is available online to the scientific community.

Although the second edition is updated with expanded content, the information on data gathered from next-generation sequencing projects is rather limited. However, as the author points out, this is a moving target with advances made weekly, and it is therefore difficult to ensure that the material is up to date in a text of this type. Even with this in mind, I found that the section on deep sequencing of transcriptomes and functional genomics could have been expanded upon; there is a huge wealth of genome-wide functional genomics data for human, mouse, fly and worm genomes generated by the ENCODE and modENCODE projects, which are only briefly mentioned (Gerstein et al., 2010; modENCODE Consortium, 2010; ENCODE Project Consortium, 2011). These data are easily accessible through online browsers (UCSC Genome Browser, modENCODE GBrowse), massively accelerating the discovery of new genes, non-coding RNAs and regulatory elements such as enhancers and insulators. With the focus on students exploring genomics data on the web, these resources could have been better highlighted.

Given the widespread impact that sequenced genomes have on research, medicine and the general public, an introductory text such as this is an important resource. Introduction to Genomics is beautifully illustrated, supported by end-of-chapter and additional online resources, and written in an eloquent and readable style. Beyond focusing on genome sequencing and comparative genomics, a good deal of the text is concerned with transcripts, proteins and proteomics. However, there is minimal mention of transcriptional regulatory regions of the genome. I would have liked to see more emphasis given to transcription factor binding in the genome, epigenetic modifications and chromatin features, which are proving invaluable in identifying intergenic regulatory regions. Given the observation that disease-linked single-nucleotide polymorphisms are more often found in non-coding than in coding regions (Manolio, 2010), understanding how regulatory regions function is crucial in decoding the human genome and understanding predisposition to disease. Nonetheless, Introduction to Genomics is a comprehensive textbook that provides a solid introduction to the study of genomes and will be a great resource for undergraduate students with a background in molecular biology. The text is also a useful resource for graduate students in other fields who want to make use of the growing number of online genomics resources.