Lecture 16 - A Brief Review of the Human Genome Landscape

University of Toronto Mississauga
Esteban Parra

The Human Genome Project
- The Human Genome Project was officially completed in 2003. In this lecture we will review what we have learned about our genome. There are some interesting surprises!

Some Facts About the Human Genome
- The human genome has around 3,200 million nucleotide bases (3,200 Mb).
  o This compares with:
    - Escherichia coli: ~4.6 Mb
    - Saccharomyces cerevisiae (yeast): ~12.5 Mb
    - Caenorhabditis elegans (worm): ~95.5 Mb
    - Drosophila melanogaster (fruit fly): ~122 Mb
    - Fugu rubripes (puffer fish): ~365 Mb
    - Oryza sativa (rice): ~389 Mb
    - Mus musculus (mouse): ~3,000 Mb
    - Allium cepa (onion): ~15,000 Mb
- The number of genes in the human genome is approximately 21,000, according to a recent estimate. This is much lower than previous estimates of 80,000-140,000.
- This compares with:
  o Escherichia coli: ~4,400
  o Saccharomyces cerevisiae (yeast): ~5,700
  o Caenorhabditis elegans (worm): ~19,800
  o Drosophila melanogaster (fruit fly): ~13,500
  o Fugu rubripes (puffer fish): similar to human
  o Mus musculus (mouse): similar to human
  o Oryza sativa (rice): ~40,000-60,000
- Less than 3% of the genome corresponds to protein-coding genes (even less if one considers only protein-coding exons: 1.2%).
  o The genome contains gene-rich regions, which typically have a relatively high GC content, and gene-poor regions, which are richer in A and T bases.
- Repeated sequences that do not code for proteins make up at least 50% of the human genome.
- In spite of having fewer genes than initially expected, the human (vertebrate) proteome is more complex than that of other animals (worm or fly). The main reasons are:
  o More transcripts per gene, due to alternative splicing.
  o More complex protein architecture, in terms of the number and arrangement of protein domains (regions within proteins with a well-defined set of properties or characteristics).
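The comparison above can be made concrete by turning the listed figures into genes per Mb. The following is a minimal sketch that does nothing more than arithmetic on the approximate values given in these notes. The names `GENOMES` and `gene_density` are illustrative and not part of the lecture, and species for which the notes give no single gene count (puffer fish, mouse, rice, onion) are left out.

```python
# Approximate genome sizes (Mb) and gene counts, copied from the lists above.
# Only species with both a size and a single gene-count figure are included.
GENOMES = {
    "Escherichia coli": (4.6, 4_400),
    "Saccharomyces cerevisiae": (12.5, 5_700),
    "Caenorhabditis elegans": (95.5, 19_800),
    "Drosophila melanogaster": (122.0, 13_500),
    "Homo sapiens": (3_200.0, 21_000),
}


def gene_density(size_mb: float, n_genes: int) -> float:
    """Return the average number of genes per Mb of genome."""
    return n_genes / size_mb


if __name__ == "__main__":
    # Print species from the most to the least gene-dense genome.
    ranked = sorted(
        GENOMES.items(),
        key=lambda item: gene_density(*item[1]),
        reverse=True,
    )
    for species, (size_mb, n_genes) in ranked:
        print(f"{species:<26} {gene_density(size_mb, n_genes):8.1f} genes/Mb")
```

On these figures E. coli carries on the order of a thousand genes per Mb, while the human genome carries fewer than ten, which is consistent with the small protein-coding fraction noted above.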
- Other interesting facts, based on the Human Genome Project and recent data from the 1000 Genomes Project:
- The mutation rate is higher in males than in females (2:1 ratio), at least in part because sperm formation requires more cell divisions than egg formation.
- From sequencing individual genomes, the 1000 Genomes Project reported that the mean number of variant SNP sites per individual ranges between 2.8 and 3.4 million (depending on population), and the mean number of variant indel sites per individual between 350,000 and 385,000.
- Putative functional variants
  o An individual typically differs from the reference human genome sequence at:
    - 10,000-11,000 non-synonymous sites
    - 10,000-12,000 synonymous sites (do not change the amino acid)
    - 190-210 in-frame indels
    - 80-100 premature stop codons
    - 40-50 splice-site-disrupting variants
    - 220-250 deletions that shift the reading frame

Navigating the Genomic Landscape
- In the remaining part of the lecture, we will review the human genome landscape in more detail, highlighting the most interesting findings and surprises.
- We will also compare the human genome with the genomes of other species, such as the yeast, the worm, the fly and the mouse, when relevant.

Broad Genomic Landscape
- Recombination in the human genome:
  o As with GC content, recombination rates vary considerably across the human genome.
  o Long chromosome arms have a lower recombination rate than short chromosome arms.
  o Recombination tends to be suppressed near the centromeres, and is higher in the distal portions of the chromosomes.
- Repeat content of the human genome
  o Repeat sequences account for 47% of the human genome (probably more, because it is not possible to recognize the oldest repeat sequences). In contrast, protein-coding sequences are less than 2%!
  o The portion of the human genome accounted for by repeat sequences (47% or more) is somewhat higher than in the mouse (37.5%), and much higher than in invertebrates such as the worm (7%) or fly (3%), or in some plants (mustard weed, 11%).
- Main classes of repeat sequences: 1. Transposon-derived repeats
  o Long interspersed elements (LINEs): 21% of the genome
  o Short interspersed elements (SINEs): 13% of the genome
  o LTR transposons (LTRs; LTR stands for long terminal repeat): 8% of the genome
  o DNA transposons: 3% of the genome
- Main classes of repeat sequences: 2. Simple sequence repeats (SSRs)
  o Simple sequence repeats (SSRs) are perfect or slightly imperfect tandem repeats of a particular core sequence. They comprise about 3% of the human genome.
  o SSRs with short repeat units (1-13 bases) are also known as microsatellites. They are often used in evolutionary and disease mapping studies.
  o SSRs with long repeat units (14-500 bases) are known as minisatellites.
- Main classes of repeat sequences: 3. Segmental duplications
  o Segmental duplications involve the transfer of large blocks of the human genome (1-200 kb) from one genomic region to other locations in the human genome. They comprise around 3% of the human genome. There are two main classes:
  o Interchromosomal duplications (the duplicated region is found on a different chromosome from the original sequence)
  o Intrachromosomal duplications (the duplicated region is on the same chromosome as the original sequence)
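The definitions above amount to a few simple rules, which can be written out as a short sketch. This is only an illustration of the thresholds and percentages stated in these notes. The function names, the `TRANSPOSON_CLASSES` dictionary and the chromosome labels are hypothetical, and real repeat annotation relies on dedicated tools rather than rules this simple.

```python
# Share of the genome (percent) for each transposon-derived class, as listed above.
TRANSPOSON_CLASSES = {
    "LINEs": 21,
    "SINEs": 13,
    "LTR transposons": 8,
    "DNA transposons": 3,
}


def classify_ssr(unit_length: int) -> str:
    """Classify a simple sequence repeat by the length of its repeat unit (bases)."""
    if unit_length < 1:
        raise ValueError("repeat unit length must be at least 1 base")
    if unit_length <= 13:
        return "microsatellite"  # 1-13 bases, per the notes
    if unit_length <= 500:
        return "minisatellite"  # 14-500 bases, per the notes
    return "outside the SSR ranges given in the notes"


def classify_segmental_duplication(source_chrom: str, copy_chrom: str) -> str:
    """Label a duplication by whether the copy sits on the original chromosome."""
    if source_chrom == copy_chrom:
        return "intrachromosomal"
    return "interchromosomal"


if __name__ == "__main__":
    print(classify_ssr(2))     # a dinucleotide repeat unit
    print(classify_ssr(30))    # a 30-base repeat unit
    print(classify_segmental_duplication("chr7", "chr7"))
    print(classify_segmental_duplication("chr7", "chr12"))
    print(sum(TRANSPOSON_CLASSES.values()), "% of the genome is transposon-derived")
```

The four transposon-derived classes listed above sum to 45% of the genome, so they account for most of the 47% repeat content quoted earlier.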