MICRB265

Genomics Overview

University of Alberta
Microbiology (Biological Sciences)

Microbial Genomics

OVERVIEW
This chapter covers the cutting edge of microbiology: genomics, proteomics, and metagenomics. A history of genomics is followed by what can be learned from genomics, how genomics is done, and the post-genomics era. Examples of some interesting and exciting discoveries are given along the way.

OBJECTIVES
After reading this chapter and attending lecture, you should be able to:
1. Explain the history of genomics and the battle between government and industry scientists to complete the first genome sequences.
2. Explain how gene content and genome size are related.
3. Explain how complete genome maps are reconstructed from sequences and how major pathways are determined.
4. Explain how the origins of genes are identified from gene sequence analysis, and the limitations of purely in silico studies.
5. Describe how genomic DNA is prepared, cloned, and analyzed for complete sequencing.
6. Understand the difference between Sanger sequencing and next-generation sequencing methods.
7. Understand the concept of comparative genomics and what it can tell us.
8. Describe the process of determining differential gene expression between two cell types or growth conditions using genomic microarrays.
9. Understand the difference between cDNA and proteomic studies and their limitations.
10. Understand the concept of RNA-Seq.
11. Understand the difference between the “core” genome and the “pan” genome and the role of genetic “islands.”
12. Explain the theory of environmental metagenomics and why it is a powerful new approach for describing novel genes and their functions.

CHAPTER OUTLINE
I. History of microbial genomics
Microbial genomics is an offshoot of the Human Genome Project (HGP), which started in 1990 and was the first “big science” project focused on biology. The goal of the HGP was to sequence all ~3 x 10^9 nucleotides of the human genome.
There was a very public race to complete the first genome sequences, and the first human genome sequence, between two groups: the public effort led by the US National Human Genome Research Institute (NHGRI) under Francis Collins, and J. Craig Venter's group at The Institute for Genomic Research (TIGR), a non-profit institute (Venter later led the private human genome effort at Celera Genomics). They took two very different approaches:
1) The public group used “chromosome walking,” a slow, steady process that requires a new sequencing primer to be designed from the end of the previous sequence, extending the sequence information one step at a time.
2) TIGR used “shotgun cloning,” an approach in which random pieces of DNA are sequenced and then “stitched together” into the complete genome using their overlapping regions.

The shotgun approach proved much more cost- and time-efficient, and it was used to produce the first complete genome sequence, that of the bacterium Haemophilus influenzae, in 1995. Because bacterial genomes are much smaller than the human genome, this showed that it was technically feasible to sequence an entire genome. The public group subsequently adopted shotgun sequencing as well; draft human genome sequences from both groups were published in 2001, and the complete human genome sequence was published in 2003, about two years ahead of the original schedule. Since then, the number of published genomes has increased exponentially and now numbers about 4000, nearly all prokaryotic.

Recently, a new approach has also been used: metagenomics, the sequencing of all the DNA from an entire environment (e.g., a water sample, a soil sample, the human gut, or the human mouth). Although individual genomes usually cannot be reconstructed this way, the inventory of genes and potential metabolic pathways in those environments can be determined.

II. Metagenomics: two examples
1. Proteorhodopsin. In a study of ocean water, Beja and colleagues made a metagenomic clone library from the sample. They screened the library for 16S rRNA genes, which would tell them “who” each clone came from.
One of these clones had a 16S rRNA gene from SAR86, an organism that had never been cultured but was known from previous molecular studies to be widespread and abundant. On the same clone as the SAR86 16S rRNA gene, they found a gene related to the rhodopsins of halophilic archaea. In the halophilic archaea, these light-responsive molecules carry out several different functions: Cl⁻ transport, light-driven proton pumping for energy (phototrophy), and phototaxis. They named the new gene “proteorhodopsin,” cloned it into E. coli, and showed that it was capable of light-dependent proton pumping. Its role in the environment is therefore to enable photoheterotrophy in the cells that carry it. Further studies showed that this gene is found in many uncultivated groups of marine bacteria and that photoheterotrophy is a very common metabolism in the ocean, something we had no idea about before the metagenomic study.

2. Human gut microbiome study. Based on 16S rRNA genes, the two main groups of bacteria in the human (and mouse) gut are the Bacteroidetes and the Firmicutes. In mice, a mutation in the leptin gene makes them obese, even when they eat the same amount as lean wild-type mice. In a study of the gut microbiota (the microbes living in the mouse gut), Turnbaugh and colleagues found that obese mice had significantly more Firmicutes and significantly fewer Bacteroidetes than wild-type mice. In a metagenomic study of these mice, the gut microbiota of the obese mice contained more genes for breaking down otherwise indigestible polysaccharides and for metabolizing the resulting sugars. The feces of the obese mice contained significantly less energy than the feces of the lean (wild-type) mice, indicating that the gut microbiota of the obese mice were more efficient at extracting energy from food.
Thus, obesity may be due in part to the gut microbiota and their efficiency at extracting energy from food.

III. What are some things we can learn from studying genomes?
A. ORFs
The first step after obtaining a genome sequence is to identify the open reading frames (ORFs): segments of DNA with recognizable gene structure, namely a promoter (the -10 and -35 regions upstream of the transcriptional start site), a ribosome-binding site (Shine-Dalgarno sequence), a translation start codon (ATG) spaced appropriately downstream of the Shine-Dalgarno sequence, and at least 300 base pairs (100 codons) before an in-frame stop codon is reached. Some genes certainly have fewer codons, but most polypeptide-encoding sequences have more than 100. ORFs generally correspond to genes, but to call them genes you would have to show that they are actually transcribed and translated and that they play a role in the cell.

ORF content is proportional to genome size in prokaryotes, but this relationship breaks down in eukaryotes. Some organisms have undergone genome reduction, the loss of genes (ORFs) that are not needed in the environments where they live. For example, the genomes of free-living bacteria range from ~1.5 to 10 million nucleotides, with a median of about 4.1 million base pairs; obligate parasites range from 800,000 to ~7,000,000 nucleotides, with a median of 2.4 million; and obligate symbionts range from 400,000 to about 2 million nucleotides, with a median of about 900,000. Interestingly, the genomes of some of these obligate symbionts are smaller than those of some large viruses.

Computer algorithms have been created that search through the reams of genomic DNA sequence not only to identify ORFs but also to predict their functions, based on DNA and amino-acid sequence similarity to known genes.
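As a rough illustration of the ORF search described above, here is a minimal Python sketch. It applies only the start-codon, length, and stop-codon criteria from these notes (ATG, at least 100 codons, then an in-frame TAA, TAG, or TGA) on both strands, and it deliberately ignores promoters and Shine-Dalgarno sequences. The function names are hypothetical; real gene-prediction programs use statistical models of coding sequence and are considerably more sophisticated.

```python
# Toy ORF finder (illustrative only): ATG start, >= min_codons codons,
# then an in-frame stop codon, searched in all six reading frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for each ORF on both strands.

    Coordinates are 0-based and end-exclusive on that strand's own 5'->3'
    sequence; `end` includes the stop codon. `min_codons` counts the ATG but
    not the stop, matching "at least 100 codons before an in-frame stop".
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG in the current open stretch, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG after this stop
    return orfs


# Example: a 100-codon ORF (ATG + 99 GCT codons) followed by TAA.
example = "ATG" + "GCT" * 99 + "TAA"
print(find_orfs(example))  # [('+', 0, 303)]
```

Keeping the first ATG in each open stretch reports the longest possible ORF; a real annotator would also weigh alternative start codons and the presence of a ribosome-binding site.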
After the computer has had its pass, researchers must go through the results by hand to check whether the predictions are correct. This process is called “annotation” and is extremely labor-intensive. Because annotation is based on homology to previously characterized gene sequences, and similar sequences do not always share the same function, a gene's actual function can be quite different from the predicted one. Genomics can therefore only provide hypotheses about an organism's functions, hypotheses that must be tested experimentally.

B. Genome maps
Once a genome sequence is annotated, several maps representing its information content can be generated. One of the simplest represents the genome as a series of concentric, color-coded circles showing features such as the size of the genome, the origin of replication, the numbers and positions of tRNA genes and rRNA operons, the numbers and positions of ORFs on both DNA strands (remember that genes can be encoded on either strand of the double helix), transposons or insertion sequence elements (indicating transposition of genes), and the G+C content of the genome. This basic information allows the general characteristics of multiple bacteria to be compared in a single image. Such information has been provided for hundreds of bacteria to date by two main US sources: the non-profit The Institute for Genomic Research (TIGR), which is now defunct, and the Joint Genome Institute (JGI), run by the US Department of Energy. Some genome sequences have also been completed by individual researchers through Genome Canada and by the French government organization Genoscope. Soon, thousands of complete bacterial genome sequences will become available through the activities of several more companies internationally, and because bacterial genome sequences are being added as “filler” to large eukaryotic genome sequencing initiatives.
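The G+C content track on a circular genome map is usually plotted as values computed in sliding windows along the chromosome. A minimal Python sketch, assuming the chromosome is supplied as a plain A/C/G/T string; the function names and the default window and step sizes are arbitrary illustrative choices, not values from these notes:

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows_circular(seq: str, window: int = 1000, step: int = 500) -> list[tuple[int, float]]:
    """(window_start, G+C fraction) for sliding windows around a circular chromosome.

    Windows that run past the last base wrap around to the first base, as on a
    circular genome map. The default window/step sizes are arbitrary.
    """
    seq = seq.upper()
    if not 1 <= window <= len(seq) or step < 1:
        raise ValueError("need 1 <= window <= len(seq) and step >= 1")
    wrapped = seq + seq[:window - 1]  # append the start so the last windows can wrap
    return [(start, gc_content(wrapped[start:start + window]))
            for start in range(0, len(seq), step)]


# Example: 2 bp windows, stepping 1 bp, around a 4 bp "chromosome".
print(gc_windows_circular("GGAA", window=2, step=1))
# [(0, 1.0), (1, 0.5), (2, 0.0), (3, 0.5)]
```

The genome-wide G+C value shown at the centre of many maps is simply `gc_content` applied to the whole sequence; the windowed values show how it varies around the chromosome.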
Synteny, the conservation of gene order between genomes, can tell us about the evolution of different organisms' genomes and how often recombination has occurred. Such evolutionary-history studies are one major goal of comparative genomics, in which the genome sequences of closely related strains and species are compared with each other. Comparative genomics shows which genes or ORFs of unknown function (also known as URFs) are critical (shared among all strains) and which are ancillary (found in only a few strains). This has led to the concept of the core genome, consisting of the genes shared by all strains of a species, and the pangenome, which includes both the core genome and the ancillary genome (genes found in only one or a few strains of the species). Comparative genomics has also led to the recognition of chromosomal islands: regions of the genome that appear to have come from other organisms. These regions are often flanked by inverted repeats (a sign that they may have come from transposons or viruses) and differ from the rest of the genome in G+C content or codon usage. Many chromosomal islands carry genes important for virulence and are called pathogenicity islands. Transfer of a pathogenicity island can turn a normally
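The core-genome and pangenome definitions above amount to set operations over gene content. A minimal Python sketch, assuming each strain has already been reduced to a set of gene-family labels (in practice, genes are first grouped into families by sequence similarity, a step not shown here); the function names, strain names, and gene labels are all made up for illustration:

```python
from collections import Counter


def core_accessory_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Split gene families into core (in every strain), accessory, and pan sets."""
    if not genomes:
        raise ValueError("need at least one genome")
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # shared by all strains
    pan = set.union(*gene_sets)          # found in at least one strain
    return core, pan - core, pan         # accessory = pan minus core


def strain_specific(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Gene families found in exactly one strain, keyed by that strain."""
    counts = Counter(g for genes in genomes.values() for g in genes)
    return {name: {g for g in genes if counts[g] == 1}
            for name, genes in genomes.items()}


# Hypothetical presence/absence data for three strains of one species.
genomes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g3", "g5", "g6"},
}
core, accessory, pan = core_accessory_pan(genomes)
print(sorted(core), sorted(accessory), sorted(pan))
```

Here g1 to g3 form the core genome, g4 to g6 the ancillary genome, and all six together the pangenome. Some analyses relax the “present in every strain” rule slightly because a gene can be missed in a draft assembly; the strict definition above matches the one used in these notes.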