Biochemistry Lecture No. 36: Genome Analysis
Friday, November 30, 2012
Q) Assuming a genome size of 2 million base pairs, how many independent clones would need to be
constructed and sequenced if the average insert size is 2,000 base pairs? An absolute minimum of 1,000
clones would be needed (2,000,000 / 2,000 = 1,000). However, to be sure you have full coverage, you want
a library with an excess of several fold (about 20-fold), which means roughly 20,000 clones.
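
The arithmetic behind this answer can be checked with a short sketch. The variable names are illustrative only, and the calculation assumes the idealized case in which inserts tile the genome end to end with no overlap:

```python
import math

genome_size_bp = 2_000_000  # genome size given in the question
insert_size_bp = 2_000      # average insert size given in the question
fold_excess = 20            # approximate excess quoted in the notes

# Minimum clones if inserts covered the genome end to end with no overlap.
min_clones = math.ceil(genome_size_bp / insert_size_bp)

# Library size once the fold excess is applied.
library_clones = min_clones * fold_excess

print(min_clones)      # 1000
print(library_clones)  # 20000
```

Because fragments are generated at random, the minimum is never enough in practice, which is why the excess is needed.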
Creating A Genomic DNA Library For H. influenzae:
-Starting with millions of cells, you extract the DNA from H. influenzae and sonicate it (instead of
cutting it with restriction enzymes) to produce random DNA fragments of various sizes. You then purify
the fragments by gel electrophoresis and select those around 2,000 base pairs in size. Cloning these
2,000 base pair fragments yields 20,000 clones (a 20-fold excess), where each clone represents an
independent fragment of the genome. Next, you isolate the plasmids from the 20,000 different clones and
sequence them by dideoxynucleotide sequencing. The result is 25,000 sequences from 20,000 clones and
approximately 12 million base pairs of sequence altogether (so there is a lot of redundancy). Each
sequencing run reads one end of a DNA insert and is around 500 base pairs in length. The computer
searches for overlaps among all of these end-sequences and uses them to construct sequence contigs (of
which there were 140 in total). If the genome analysis were complete, only one contig would be present.
The fact that there are 140 contigs means that there are 140 gaps (these gaps need to be filled in to
know the order of the sequence).
-A sequence contig is a contiguous DNA sequence representing a portion of the genome after alignment.
Sequence assembly is first done by computer, which searches for overlaps between sequences (contigs are
not physical entities per se, but something that the computer organizes).
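
To make the idea of overlap-based assembly concrete, here is a toy sketch in Python. It is not the method used for the actual genome project: the function names, the short example reads, and the greedy merging strategy are all assumptions chosen for illustration. It also ignores sequencing errors, reverse-complement reads, and reads that lie entirely inside another read.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best_len:
                    best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs  # no overlaps remain: these are the contigs
        # Join the two sequences, writing the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads: the first three overlap each other, as do the last two.
reads = ["ATGCGT", "CGTTAC", "TACGGA", "GGGCCC", "CCCTTT"]
contigs = greedy_assemble(reads)
print(sorted(contigs))  # ['ATGCGTTACGGA', 'GGGCCCTTT']
```

The five reads collapse into two contigs because no read links them. That missing link is the kind of gap that remained between the 140 contigs described above.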
Filling In The Gaps:
-There are two kinds of gaps: sequence gaps and physical gaps (each is resolved in a different way).
Sequence gaps are the ones that can be closed by sequencing clones already present in the library (they
are fairly easy to fill in). The computer scans for original clones that span two different contigs (among
the 5,000 clones that were sequenced on both ends). If you can get a clone that spans two contigs, that means