- Huge impact on medicine and research; driving biology in many ways
- “The inside story of how these bitter rivals mapped our DNA, the historic feat
that changes medicine forever.”
DNA and the revolution in personalized medicine: pretty soon, when you go to
the doctor, they will look at your genome sequence to prescribe the right treatment for you.
Every cancer differs dramatically. How the cancers differ will determine how
you will be treated.
Why Sequence Genomes
1. Identifies all of the genes that characterize an organism: the
blueprint of life. (Gives us all of the parts of the puzzle.)
2. Allows comparative analyses between organisms. (confirmed
the unity of life; helps to define function.)
3. Done within any individual species, it may identify differences that result in
disease. Sequencing the human genome has identified, and will continue to identify, the genetic basis for
many diseases. ($1000 genome)
4. Identifies potential drug targets. (e.g., parasites may have unique
genes and gene products that can be inhibited)
Sequencing a genome involves:
1. Creating a Genomic DNA Library.
2. Doing many independent sequencing reactions.
3. Aligning the independent sequences into a continuous genomic sequence, and
then trying to annotate the genome.
The features involved are no more complicated than what we talked about before, but
the scale is enormous.
A Genomic DNA library
Defn: A collection of cloned DNA fragments representing all of the DNA in an organism.
If we clone all the DNA from an organism into plasmid vectors, the result is
a genomic library for that organism: together, the clones represent all of the DNA in the
genome. Assume we start with the whole human genome. We cleave it with a restriction enzyme (RE), in this case
by partial digest, to get millions of genomic fragments. Then we insert
all of those fragments into plasmid vectors using DNA ligase, so each fragment has representatives
in the population. That population of clones is then transformed
into E. coli, which makes up our genomic library.
Human DNA -> millions of genomic fragments
Once you have the fragments you clone them into the plasmid vectors and you will
get recombinant DNA molecules.
Sequencing the H. influenzae genome:
Gram-negative bacterium
Genome approx. 2 million base pairs.
To get the coverage of a full library:
2 million base pairs ÷ 2000 bp per clone
= 1000 clones minimum
To be sure you have full coverage you would want an excess of several fold
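The arithmetic above can be checked with a short sketch. The estimate of how much of the genome is missed at a given fold coverage, e^(-c) for c-fold coverage, is a standard idealization that assumes clones land uniformly at random on the genome. It is not part of the lecture, and the function names are illustrative only.

```python
import math

def minimum_clones(genome_bp: int, insert_bp: int) -> int:
    """Fewest clones that could tile the genome end to end with no overlap."""
    return math.ceil(genome_bp / insert_bp)

def expected_uncovered_fraction(fold_coverage: float) -> float:
    """Assumption: clones sample the genome uniformly at random (Poisson model)."""
    return math.exp(-fold_coverage)

genome_bp = 2_000_000   # approximate H. influenzae genome size from the notes
insert_bp = 2_000       # average insert size per clone from the notes

n_min = minimum_clones(genome_bp, insert_bp)
print(n_min)  # 1000

# Show why a several-fold excess of clones is wanted.
for fold in (1, 5, 10, 20):
    clones = n_min * fold
    missed = expected_uncovered_fraction(fold)
    print(f"{fold:>2}-fold: {clones} clones, ~{missed:.2e} of genome expected uncovered")
```

Under this simplified model, 1000 randomly chosen clones would still leave roughly a third of the genome unrepresented, which is one way to see why the minimum number is never enough in practice.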
Creating the Genomic Library
Here we have the bug, and we extract the DNA from it. Sometimes we don't want
fragments cut at specific sites, so we can sonicate the DNA, which breaks it at random, instead of using an RE.
We can purify those fragments by gel electrophoresis and then select those that are, for
example, 2000 bp in size. So the fragments are cut at random but are
sorted according to size. We then take that DNA and clone it. With the DNA purified, we prepare our
library. In this case 20 000 clones are used (as in the example above: the fragments
average 2000 bp, so the minimum is 1000 clones, but we use a 10- to 20-fold excess, here 20 000).
These are spotted onto grids in an automated fashion.
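Random breakage followed by size selection can be imitated with a toy simulation. This is only a sketch under simplifying assumptions: the genome is treated as a linear molecule of 2 million positions, breaks fall uniformly at random, and the 1600 to 2400 bp window standing in for the gel slice is a hypothetical choice, not a figure from the lecture.

```python
import random

def shear(genome_length: int, n_breaks: int, rng: random.Random):
    """Break a linear molecule at random positions; return (start, end) fragments."""
    cuts = sorted(rng.sample(range(1, genome_length), n_breaks))
    edges = [0] + cuts + [genome_length]
    return list(zip(edges[:-1], edges[1:]))

def size_select(fragments, min_bp: int, max_bp: int):
    """Keep fragments whose length falls in the window cut out of the gel."""
    return [(s, e) for s, e in fragments if min_bp <= e - s <= max_bp]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
fragments = shear(2_000_000, 1_000, rng)          # mean fragment length near 2000 bp
selected = size_select(fragments, 1_600, 2_400)   # hypothetical window around 2000 bp

print(len(fragments), "fragments,", len(selected), "inside the size window")
```

Only a fraction of the randomly generated fragments end up in the window, which mirrors the point that the cuts are random while the cloned inserts are uniform in size.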
Then we randomly sequence the clones we have. We get 25 000 independent
sequences (about 12 million bp in total, each sequence roughly 400 bp).
Each of the 20 000 clones represents an independent part of the genome.
Sequence the ends of the genomic clones
One clone Isolate plasmid DNA - Anneal primer -
They got 25 000 runs from 20 000 clones, for a total of 12 million base pairs
of DNA sequence (approx. 6-fold excess over the 2 million bp genome).
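As a quick check of these figures, the sketch below recomputes the fold excess and the average run length from the totals given in the notes. The totals imply a mean of 480 bp per run, so the per-sequence length quoted in these notes is best read as a round number.

```python
genome_bp = 2_000_000        # approximate genome size from the notes
n_reads = 25_000             # number of sequencing runs
total_read_bp = 12_000_000   # total sequence obtained

mean_read_bp = total_read_bp / n_reads      # average length of one run
fold_coverage = total_read_bp / genome_bp   # sequence excess over the genome

print(mean_read_bp)   # 480.0
print(fold_coverage)  # 6.0
```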
Align the 25 000 sequences into contigs
First a computer searches for overlaps between the 25 000 sequence runs.
Overlapping sequences are arranged in contigs.
Aligning the independent sequences into a continuous genomic sequence
Sequence CONTIG:
A contiguous DNA sequence representing a portion of the genome.
Sequence assembly is first done by computer searching, looking for overlaps
Now we get to the part where we have to put everything back together. Once we
have all the sequence fragments, we have to align them into a continuous genomic sequence.
This is first done by computer, looking for sequence overlap. It finds 2 sequences that
overlap and sticks them together to form part of a contig, shown in the next slide.
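The overlap-and-merge idea can be illustrated with a toy greedy assembler. This is a minimal sketch, not the software used for the actual project: it assumes error-free reads from a single strand, ignores repeats and reads contained inside other reads, and the example reads and the minimum overlap of 3 bases are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

# Three overlapping reads from one region plus one unrelated read (made-up data).
reads = ["ATGGCGTAC", "GTACGTTAG", "GTTAGCCA", "TTCAAAGG"]
contigs = greedy_assemble(reads)
print(contigs)
```

The three overlapping reads collapse into a single contig, while the unrelated read stays on its own as a second contig, leaving a gap between the two that overlap searching alone cannot close.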
From 20 000 clones (25 000 sequences representing 12 million bp), they could align the sequences
into 140 contigs.
With 140 contigs you will have 140 gaps in a circular genome. We need to fill in the
gaps to order the contigs and to complete the sequence.
2 types of gaps: sequence gaps and physical gaps.
“Sequence gaps” are ones that can be closed by sequencing clones already present in
the library. The computer looks for a cloned DNA for which you have sequence matching 2 different contigs.
The slide shows 2 contigs that were each put together simply through alignment. They know
there are clones in the pool with one sequenced end matching one contig and the other end matching the other contig. They know that this clone spans
the gap between these contigs.
- The sequence in the middle of that clone will complete the sequence in that gap.
- They had to sequence the clones from both sides to find the one that will fill in the gap.
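The search for a clone that bridges a sequence gap can be sketched as a simple lookup. The sketch assumes each clone's two end sequences have already been placed in contigs; the clone names, contig names, and the `end_hits` table are hypothetical example data, not results from the project.

```python
from collections import defaultdict

def bridging_clones(end_hits):
    """Group clones by the pair of contigs their two end sequences fall in.

    end_hits maps clone id -> (contig hit by one end, contig hit by the other end);
    None means that end has not been sequenced or could not be placed.
    """
    links = defaultdict(list)
    for clone, (left, right) in end_hits.items():
        if left is not None and right is not None and left != right:
            links[frozenset((left, right))].append(clone)
    return dict(links)

# Hypothetical example data, invented for illustration.
end_hits = {
    "clone_017": ("contig_A", "contig_A"),   # both ends inside one contig
    "clone_102": ("contig_A", "contig_B"),   # spans the A/B gap
    "clone_233": ("contig_B", None),         # only one end placed
    "clone_310": ("contig_B", "contig_A"),   # also spans the A/B gap
}

for pair, clones in bridging_clones(end_hits).items():
    print(sorted(pair), "linked by", clones)
```

Any clone reported here is a candidate for sequencing through its middle, since its insert must cover the stretch between the two contigs it links.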
There can be major challenges in fitting together the independent sequences.
What about the gaps?
The other type of gap is the physical gap. Some sequences will not be represented in the original
library, resulting in gaps in the sequence, which means we often have to construct
another library using a slightly different protocol in order to get those clones.
Two strategies are often used: one is based on probing by hybridization,
and the other is based on PCR.
These generally represent sequences that