Biology (Sci)
BIOL 200
Thomas Bureau

LECTURE 14: GATTACA: Genomics and Next-Generation Sequencing

This lecture presents new methods of DNA sequencing that replace the original dideoxy sequencing method (Sanger sequencing), the method originally used to sequence the human genome.

Genomics
• Original definition/goal: Determining the DNA sequence of the genome
o to identify the location of genes (the process of genome annotation).
• Accuracy level: 70-80% (not terribly good, but we’re getting better at it)
• Modern days: new initiatives (referred to as the -omics era)
o Genomics (still present)
o Functional Genomics: Find the functions of the genes (globally, with a holistic approach rather than by evaluating the function of one gene at a time).
o Proteomics: Find the functions of the protein that is encoded by each gene.
▪ Functional Genomics and Proteomics: more to come in Roy’s section.
o Evolutionary Genomics: Achieved by completely sequencing multiple genomes and comparing them to deduce how they evolved over time.
o Transcriptomics: Understanding the population of transcripts.
o Phenomics: Analysis of the entire phenotype [observable characteristics of an organism], and its evolution [how it changes through developmental time].
o Spliceomics: Observing splice elements/variants.
o Others: glycomics, metabolomics, lipidomics, predictomics
o Eventually “omicsomics”…?
▪ Popular theme: Looking at -omics in a more holistic way.

Organism genomes that were sequenced:
• One of the first genomes to be sequenced: Epstein-Barr virus (about 172 kbp), in 1984.
o Considerable effort back then, considering the primitive technology available.
• Since then, thousands of viral, organellar, prokaryotic and eukaryotic genomes have been sequenced.
• Focus on disease-causing viruses and bacteria
o Influenza virus and Yersinia pestis, for example
• Eukaryotic organisms: the genomes of certain model organisms were sequenced first and are more frequently used in labs.
o Yeast: One of the first, b/c its genome is very small (about 12 Mbp).
o Roundworm (Caenorhabditis elegans): Common model system, b/c it is easy to sequence and we know its exact cellular composition [we can fate map every cell in the organism, we know exactly how many there are, etc.].
o Fruit fly (Drosophila melanogaster): Often used in molecular genomics.
o Zebrafish
o Mice: One of the last to be sequenced, because their genome is relatively large.
o Plant (Arabidopsis thaliana): Common weed, often found in the cracks of sidewalks and other places. This fast-growing plant is easy to sequence, because it has a small genome that is easy to manipulate. It is a model organism despite its lack of agricultural benefit.
• Now: There are over 2000 complete/ongoing eukaryotic Genome Projects [sequencing projects]
o Chimpanzee
o Fugu fish: Model system for vertebrates. Easy to sequence, because it has a very streamlined genome with very few transposable elements to get in the way of understanding what the genes do.
o Sea Squirt: Model for vertebrate development (similar to the zebrafish). It has a notochord (a backbone-like structure) in early stages, but loses it in the adult phase.
o Mosquito: Can carry malaria (and other diseases)
o Rice: One of the early genomes to be sequenced, because it is an important part of the human diet (a main source of calories for a large share of the world’s population).
• Thanks to modern, more efficient technologies, many more genomes have been sequenced recently:
o Others: silk worm, armadillo, elephant, the insect that carries Chagas disease, banana, chicken, several fungi, tomato, radish, lettuce, spider mite, etc.

Sequencing method: Shotgun Sequencing
Main obstacle to DNA sequencing: we can’t sequence a long chromosome directly (DNA polymerase can only read sequences of at most ~1000 base pairs, which is the limitation). We had to break down these very large molecules/chromosomes, extracted very carefully, using the following technique.
Solution: Fragmentation of chromosomes

First step of Shotgun Sequencing - Fragmentation:
• We use many exact copies of the chromosome (for example, there are 5 in the image below, on the left)
• Break down the chromosome into small fragments of different lengths (done by mechanical shearing or enzymatic breakdown of the DNA)
• The reference dots on the image indicate that the copies of the chromosome are identical to one another. We can see that each segment containing these dots differs in length after fragmentation of the chromosome.
• The idea is to capture the sequence in every one of those bits and reassemble all the information back together. This is what the next step is about.

Second step of Shotgun Sequencing - Assembly [VERY COMPLEX]
• Explanation of the above diagram: The reference sequence is the Original Strand. We typically don’t know it; it is what we are trying to determine. The blue region indicates the overlap between the two fragment sequences.
• DNA polymerase helps determine the sequence of each small fragment. This information is then combined to form a full chromosomal DNA sequence.
• To form the original segment, we need fragments from different copies of the chromosome that contain common/overlapping regions. The common regions of consecutive fragments are aligned, creating multiple tiling paths, and eventually the small segments are recombined into a longer DNA fragment. (See image on the right)
• Complicated, because we have to faithfully recombine millions/billions of similar segments of DNA (more pieces than the world’s largest jigsaw puzzle: 24,000 pieces, 1.5 x 4.2 meters).
o Contiguous sequences (contigs): Reconstructed sequence of segments with overlapping regions. Shown in red on the diagram below. Various contigs are separated by gaps. [Wiki: Sets of overlapping clones that form a contiguous stretch of DNA are called contigs].
o Gaps: Regions that separate the contigs and contain no sequence information (so we can basically bridge those contigs together).
o Tiling path: Combination of contigs and gaps. [Wiki: the minimum number of clones that form a contig covering the entire chromosome comprise the tiling path that is used for sequencing].
o Scaffold: Result of the assembly of the tiling path (original DNA sequence). Region that represents a series of contigs in their appropriate order. There are still gaps (so it’s not a complete sequence). [Wiki: consists of overlapping contigs separated by gaps of known length].

Genome sequencing status [where most of the projects are at the moment]:
• The grid on the right indicates the number of genomes that have been sequenced for viruses, prokaryotes, and eukaryotes (can be more than one per species)
• For viruses, there are no contigs or scaffolds currently deposited in databases, because a viral genome is so short that it is easy to sequence completely (one contig usually covers the entire genome).
• Some sequencing projects are left incomplete (they keep contigs, scaffolds or raw reads) because further sequencing is unnecessary for that particular project, not because the scientists are inept. This is especially the case for prokaryotes.
• Genomes that are 100% sequenced are rare, but there are a few. They’re usually missing a few bits. Eukaryotic genomes, for example, are almost 100% sequenced.
• Raw reads: Bits of genome that haven’t been assembled yet (no project yet).
• Most viral and prokaryotic genomes have been sequenced [their sequencing status is relatively stable since we hit all the major disease-causing bacteria and viruses], while there are still many eukaryotic genomes left to sequence (or individuals of the same species to re-sequence).
o Some genomes have been sequenced several times. (For humans: 1000s of times, since the genome was first sequenced around 2000).
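
To make the two shotgun steps (fragmentation, then assembly into contigs) more concrete, here is a minimal Python sketch on a made-up 18-base sequence. It is only an illustration under stated assumptions: the reference string, the cut points and the helper names (fragment, overlap, drop_contained, greedy_assemble) are all hypothetical, and the greedy "merge the largest overlap first" rule is a simplification chosen for readability, not the method used in the lecture or by real assemblers. It also ignores sequencing errors, repeats and the reverse strand.

```python
def fragment(sequence, cut_points):
    """Cut one copy of the sequence at the given positions (toy stand-in for shearing)."""
    bounds = [0, *sorted(cut_points), len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and any sequence that lies entirely inside another one."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the largest overlap.

    Returns the contigs that remain once no overlap of at least `min_len` is left.
    More than one contig in the result means there is a gap between them.
    """
    contigs = drop_contained(reads)
    while True:
        best = (0, None, None)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    k = overlap(left, right, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs
        merged = contigs[i] + contigs[j][k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs = drop_contained(rest + [merged])


# Hypothetical "chromosome": in practice this is the unknown we want to recover.
reference = "ATGCGTACCTGAAGTCCA"

# Two identical copies, broken at different positions.
reads = fragment(reference, [8]) + fragment(reference, [5, 14])
print(reads)                   # ['ATGCGTAC', 'CTGAAGTCCA', 'ATGCG', 'TACCTGAAG', 'TCCA']
print(greedy_assemble(reads))  # ['ATGCGTACCTGAAGTCCA']

# Without the bridging read 'TACCTGAAG', two contigs remain, separated by a gap.
reads_with_gap = [r for r in reads if r != "TACCTGAAG"]
print(greedy_assemble(reads_with_gap))  # ['ATGCGTAC', 'CTGAAGTCCA']
```

The second call shows why fragments from differently broken copies are needed: the two contigs only join when a read from another copy spans the break point.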
Initial sequencing method, until about 2008 or so: Dideoxy DNA Sequencing
• Requires a lot of workers, machines and money
• Works very well. The quality of the output is great.
• Now, there is a great push to speed up DNA sequencing.

New sequencing methods (also known as Next/2nd Generation Sequencing) include several platforms, of which 2 persisted:
1) Roche 454 pyrosequencing (also known as 454 Sequencing or Pyrosequencing)
2) Illumina/Solexa sequencing (also known as Illumina)

Next Generation Sequencing (also known as 2nd Generation Sequencing):
• High-throughput (the method is very efficient): (with pyrosequencing)
o Massively-parallel: Millions of strands are sequenced at the same time, as opposed to Dideoxy DNA Sequencing, which sequences only about 100 fragments at a time [50-100 fragments could be loaded on one gel, for instance].
o Done with specific techniques (technology improvements):
▪ Microfluidics: moving very small volumes of liquid
▪ Fixed synthesis: The DNA strands being sequenced are not transported in a solution, but are fixed on a matrix
▪ High-resolution microscopy: It allows us to actually visualize the process of DNA synthesis with a more detailed image
• Read length: The maximum length of a DNA sequence that can be read and synthesized from the primer sequence by DNA polymerase.
o Dideoxy DNA Sequencing: approximately 1000 nucleotides.
o Next Generation Sequencing: shorter length, but the high-throughput/massive s