People differ in gene copy number: different numbers of duplicates
If two base pairs are only 50 bp apart when they should be 5 kb apart = deletion
If two base pairs are farther apart than they should be = duplication
1. Amount of diversity:
Only forces acting on genetic variation and evolution are mutation and genetic drift = no
Negative/purifying selection may act on deleterious mutations, but it is strong enough that
they do not affect diversity or evolution. All deleterious mutations are
Positive selection happens so rarely
Standard neutral model: constant population size and random mating
Average pairwise differences: pi (average difference between each pair)
S: number of segregating sites (all differences can be in one individual)
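The two summary statistics defined above can be sketched in a few lines of Python. This is a minimal illustration, assuming the sample is a list of equal-length aligned sequences; the function names and the toy alignment are hypothetical and not from the lecture.

```python
from itertools import combinations

def segregating_sites(seqs):
    """S: number of alignment columns where at least two sequences differ."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def pairwise_diversity(seqs):
    """Pi: number of differences averaged over every pair of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

# Hypothetical toy alignment: every difference sits in the last individual.
sample = ["AAAAAA", "AAAAAA", "AAAAAA", "TATATA"]
s = segregating_sites(sample)
pi = pairwise_diversity(sample)
```

In this toy sample S counts all three variable sites even though they are carried by one individual, while Pi is pulled down because most pairs are identical.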
Clicker 1: A
Average pairwise difference (Pi): 4Neu. u is the neutral mutation rate
Polymorphism informs us about mutation and Ne
E(Pi) = 4Neu
Number of segregating sites
Total branch length times mutation rate = polymorphism
E(S) = 4Neu (1 + 1/2 + 1/3 + ... + 1/(n-1)), where n is the sample size
As the sample size increases, each added sample contributes fewer new segregating sites, so S levels off
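A short sketch of how E(S) grows with sample size, assuming the standard neutral expectation E(S) = theta (1 + 1/2 + ... + 1/(n-1)) with theta = 4Neu. The function names and the value of theta are hypothetical, chosen only for illustration.

```python
def harmonic_sum(n):
    """1 + 1/2 + ... + 1/(n-1) for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))

def expected_segregating_sites(theta, n):
    """Expected S under the standard neutral model, with theta = 4*Ne*u."""
    return theta * harmonic_sum(n)

theta = 10.0  # hypothetical value of 4*Ne*u for the whole region
for n in (2, 10, 100, 1000):
    print(n, round(expected_segregating_sites(theta, n), 1))
```

Going from 10 to 11 samples adds theta/10 expected sites, while going from 100 to 101 adds only theta/100, which is the diminishing return described above.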
At a sample size of 12 variance in Pi is really low
Humans have a low Ne and we have lower genetic diversity than chimpanzees
Rosenberg: Strong negative correlation between distance from East Africa and genetic diversity;
Africans have the highest genetic diversity
Ramachandran: The further you go from Peru the higher the genetic diversity. South America is
the end point of migration
We have our origins in East Africa; each colonization of a new place causes a bottleneck effect
Middle East is the estimated exit point in migration out of Africa. The serial bottlenecks cause
the reduction in diversity
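The effect of serial bottlenecks on diversity can be sketched with the textbook drift result that expected heterozygosity shrinks by a factor (1 - 1/(2N)) per generation in a population of N diploids. This is an illustrative assumption, not the model used in the cited studies; the founder size, number of generations and number of steps are hypothetical.

```python
def heterozygosity_after_bottlenecks(h0, founder_size, generations, steps):
    """Expected heterozygosity after each successive founder event.

    Each step is a bottleneck of `founder_size` diploids lasting
    `generations` generations (hypothetical parameters).
    """
    h = h0
    trajectory = [h]
    for _ in range(steps):
        h *= (1.0 - 1.0 / (2.0 * founder_size)) ** generations
        trajectory.append(h)
    return trajectory

# Source population first, then three successive colonization events.
trajectory = heterozygosity_after_bottlenecks(1.0, 50, 10, 3)
```

Each colonization step multiplies diversity by the same factor, so populations further along the migration route are expected to carry less diversity than the source.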
Africa has the highest diversity due to: source population (others have bottleneck effect), gene
2. Allele frequencies
How many polymorphisms are found once, twice, thrice etc.
Neutral theory: How many rare polymorphisms are found once, twice, thrice etc? Majority of
mutations are only expected to be found once = not fixed! Some common variants but a lot of
rare ones due to purifying selection eliminating the common variants
Vast majority of the minor alleles were found in one individual: expect 40% of alleles to be
found in one individual, however almost all are found in one individual. Too many rare variants to
conform to the expectations
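A minimal sketch of the neutral expectation for the frequency spectrum, assuming the standard neutral model result that the expected number of sites whose derived allele appears i times in a sample of n is proportional to 1/i. It uses the unfolded (derived-allele) spectrum rather than minor alleles, and the function name and sample size are hypothetical.

```python
def expected_sfs_proportions(n):
    """Expected share of segregating sites with the derived allele seen
    i times, for i = 1 .. n-1, under the standard neutral model."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical sample of 8 chromosomes.
props = expected_sfs_proportions(8)
singleton_share = props[0]
```

For a sample of 8 chromosomes the singleton share comes out at about 39%, so an observed spectrum in which almost every variant is a singleton has far more rare variants than this expectation.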
Ascertainment bias: We only look at polymorphisms in a subset of genomes = uniform
distribution. HapMap project.
Yoruba (African) have more rare variants than East Asians and Europeans and fewer common
variants than East Asians and Europeans. Strong negative gradient = more rare and fewer
common variants. Perhaps selection is not strong enough in the East Asians and Europeans to
completely eliminate the common variants = variation!
Population bottleneck: In the European sample, less than 10% of polymorphisms are found at rare
frequencies; most polymorphisms are found at intermediate or high frequencies. Perhaps 1)
the rare alleles are lost and thus 2) the other alleles become more common
Population expansion: higher frequency of rare alleles, larger sample
Purifying selection also distorts the frequency spectrum: more conserved sites have a higher ratio of
rare variants; non-synonymous (NS) X and NS autosomal sites have a greater ratio of rare variants
than synonymous (S) X and S autosomal sites respectively. More purifying selection on X chromosomes because
deleterious alleles cannot hide in males (only one X copy); more rare variants in non-synonymous sites
NS sites have more rare alleles than S sites; S sites have even fewer alleles than neutral sites.
Violation of Neutral model: population size changes, weak purifying selection can skew the
Kimura holds that variation exists only at neutral sites and is determined only
by drift and mutation, since purifying selection is strong enough to eliminate all
the variation at deleterious sites. However, purifying selection can be weak and thus will
not completely eliminate all the deleterious mutations, so there is variation at both
deleterious and neutral sites.
The more conserved regions undergo stronger constraint and purifying selection, thus there will
be more RARE ALLELES than common variants since purifying selection has already
eliminated the common variants; non-synonymous sites, which are under stronger purifying
selection, will likewise have more rare alleles than common alleles.
Just because a site has more rare alleles does not mean that it is more polymorphic;
polymorphism has nothing to do with the ratio of rare:common alleles!
Population size changes affect all sites equally, whilst purifying selection affects only the selected sites.
3. Linkage equilibrium
Genes on different chromosomes are unlinked = independent assortment
Genes on different chromosomes have different coalescent histories
Genes further apart on the same chromosome will experience more recombination, less linkage
Genes closer together will be more linked with each other
Linked genes have correlated evolutionary history
Genetic drift is a noisy process: different genes have very different coalescent histories,
just by random sampling
The letters represent samples of diversity: the more recombination events the more different
the coalescence history, sites that are further apart have the highest coalescence time due to
A single gene may depart from the average pattern of diversity; Africans may not have the highest
diversity in that gene
Coalescent times are highly variable across genes. Autosomes have the highest coalescent times
due to effective population size. Haploid loci have the lowest coalescence times
High linkage disequilibrium: how correlated is variation across sites?
If high LD then TA and AG are always together in that sequence.
If low LD then it is not clear which genotype you are: TA or TT or AA etc.
D= frequency of AG – frequency of A times frequency of G
D’ standardizes by the maximum possible value of D
R squared = D squared divided by the product of the allele frequencies (all four, at both sites)
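The three LD statistics above can be sketched as follows. This assumes two biallelic sites with the frequency of the AG haplotype and the frequencies of allele A (site 1) and allele G (site 2) already estimated; the function and parameter names are hypothetical, and the maximum of D is taken from the usual bounds imposed by the allele frequencies.

```python
def ld_statistics(p_ag, p_a, p_g):
    """Return (D, D', r^2) for two biallelic sites.

    p_ag: frequency of the A-G haplotype
    p_a:  frequency of allele A at the first site
    p_g:  frequency of allele G at the second site
    """
    d = p_ag - p_a * p_g
    # Largest |D| the allele frequencies allow, depending on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_g), (1 - p_a) * p_g)
    else:
        d_max = min(p_a * p_g, (1 - p_a) * (1 - p_g))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_g * (1 - p_g)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

complete_ld = ld_statistics(0.5, 0.5, 0.5)   # A always travels with G
equilibrium = ld_statistics(0.25, 0.5, 0.5)  # alleles combine at random
```

When A always travels with G, D' and r squared both reach 1; when the haplotype frequency equals the product of the allele frequencies, all three statistics are 0.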
Linkage disequilibrium is a function of the inverse of Ne and c (per gamete per
generation rate of recombination)
Large population size = more recombination events = lower linkage disequilibrium
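The dependence of LD on Ne and c can be illustrated with a commonly used approximation, E(r squared) ≈ 1 / (1 + 4Nec). This formula is an assumption brought in for illustration and is not stated in the notes; it ignores mutation and finite sample size, and the parameter values below are hypothetical.

```python
def expected_r2(ne, c):
    """Approximate expected r^2 between two sites at drift-recombination
    balance: 1 / (1 + 4 * Ne * c). Illustrative approximation only."""
    return 1.0 / (1.0 + 4.0 * ne * c)

# Hypothetical values: same recombination rate, two population sizes.
small_pop = expected_r2(1_000, 1e-4)
large_pop = expected_r2(100_000, 1e-4)
# Same population, sites further apart (higher c).
far_sites = expected_r2(1_000, 1e-2)
```

Under this approximation both a larger Ne and a larger c push expected LD toward zero, and with c = 0 the expected value stays at 1.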
If no recombination then drift will affect the whole chromosome equally instead of
As genetic markers gets further away, the linka