Brandl Lecture 10 Notes 11/30/2012
Sequencing Genomes
• Involves:
  o Creating a genomic DNA library
  o Running many independent sequencing reactions
  o Aligning the independent sequences into a continuous sequence
Genomic DNA Library
• Collection of cloned DNA fragments that represent all of the DNA in an organism’s genome
• Each cloned DNA is like a different book in a library
• Together, all the clones represent all the DNA in the genome
• Constructing a genomic library
  o Human genomic DNA is partially cleaved with a restriction nuclease to obtain millions of genomic DNA fragments
  o DNA fragments are then inserted into plasmids using ligase and introduced into bacteria
Sequencing of the H. influenzae Genome:
• Gram-negative bacterium
• Genome – about 2 million base pairs
• Q: The H. influenzae genome is 2 million base pairs. How many independent clones must be in the genomic library to cover the genome at least once, assuming an average insert size of 2,000 base pairs?
  A: 1,000 clones (2,000,000 base pairs / 2,000 base pairs per clone) – this would be the absolute minimum
• Creating the genomic library:
  o Start with millions of cells
  o Extract DNA
  o Sonicate to obtain DNA fragments of various sizes
  o Separate the fragments by agarose gel electrophoresis
  o Purify the DNA fragments of about 2,000 base pairs
  o Prepare clone library
  o 20,000 clones – each represents an independent fragment of the genome
• Sequence the ends of the genomic clones
  o For one clone – isolate plasmid DNA, anneal primer, and obtain dideoxy sequence
• Sequence the ends of all 20,000 genomic clones
  o Obtain end-sequences of DNA inserts
  o 25,000 sequence runs from 20,000 clones
  o Resulting in a total of 12 million base pairs of sequence (about sixfold coverage of the 2-million-base-pair genome)
• The difficult part is putting all 25,000 sequences together in the correct order
• Align the 25,000 sequences into contigs
  o First, a computer searches for overlaps between the 25,000 sequence runs
  o Overlapping sequences are arranged in what are called contigs
    ▪ Sequence contig – a contiguous DNA sequence representing a
portion of the genome
  o Not a physical entity; an abstract entity strung together by a computer
• If you had one contig, the sequence would be complete
  o With 140 contigs you will have 140 gaps in a circular genome
  o We need to fill in the gaps to order the contigs and to complete the sequence
• There are two types of gaps:
  o Sequence gaps – regions that are represented in the library but have not been sequenced
    ▪ Can be closed by completing the sequence of clones in the library
    ▪ The computer scans for original clones that are found in 2 different contigs
    ▪ Look for clones that have sequence from one contig at one end and sequence from another contig at the other end
    ▪ Once such a clone is found, its sequence can be completed and the gap will be filled
  o Physical gaps – regions that are not represented by clones in the original library
    ▪ Some sequences do not clone well in E. coli because they are toxic to the bacteria, so they are not represented in the library
    ▪ Filled in using PCR, with genomic DNA as the template
Brandl Lecture 11 Notes 12/03/2012
Annotating the Genome
• What are the functions of the 2 million base pairs of sequence?
• What types of information does the genome contain?
  o Protein-encoding genes
  o tRNAs
  o rRNAs
  o Other functional RNAs
  o Small regulatory RNAs
  o Regulatory sequences: promoters, terminators
  o Origin of replication
  o Telomeres (in some organisms)
• How are protein-encoding sequences identified?
  o Open reading frames (ORFs) – a series of codons starting with an initiation codon and ending with a termination codon
    ▪ Lots of ORFs in the genome – some code for proteins, some don’t
  o A computer program scans the genomic sequence, searching for ORFs that begin with
