Biology 2581B Final: Final Comprehensive Notes

Western University
Biology 2581B
L.Graham Smith

Lecture 13: Chromosomal Rearrangements
- If we compare the mouse and human genomes at the sequence level, there are many similarities
- Same types of genes, and genes with similar sequences
- We can use dyes to label chromosomes, creating a staining pattern
- The staining patterns of the mouse and human genomes are quite different
  o 20 mouse vs. 23 human chromosome pairs
- How can we have similar genes and similar gene sequences, but such different genomes?
- Through genome sequencing, we found similar pieces in both genomes, but we would have to rearrange the sequences to line them up
- Each colour here represents a different chromosome
- Over evolutionary time, rearrangements occurred which account for these differences
- Each gene on the chromosomes was colour coded according to a key that comes from the mouse genome
  o We can clearly see that fragments of the mouse genome can be aligned with the human genome
  o There are exceptions (e.g. the X chromosome is basically the same)
- Most chromosomes need rearrangements in order to account for gene content
- Think of chromosomes as packages of a certain size, with a certain content

Grass genomes
- In the map shown here, all the chromosomes from one species are lined up as a circle
- The next circle has the chromosomes of another species, aligned as well as possible with the rice chromosomes
  o There are very few gaps; in general, this alignment is very possible
- Collinearity: we can co-align all these different genomes because they have huge fragments with the same gene order and gene identity
- Still, there have to be rearrangements within the chromosomes in order to get everything to line up

Are the rearrangements always deleterious?
- Not always, but we’ve already seen that in human genomes, even small rearrangements/deletions can be fatal
- We tend to think of rearrangements as having negative impacts on survival, but this isn’t always true
  o E.g.
generation of antibodies: how can a vastly larger number of different antibodies be encoded by only 2-3 × 10^4 genes?
  o How is the immune system able to respond to new challenges?

Rearrangements of chromosome sequences
- V-segments, D-segments etc. are particular stretches of nucleotides in chromosomes
- They’re all grouped together initially by their type
- Recombination events then bring some D-segments close to J-segments
- Another event joins these to a V-segment, followed by the constant region
- The result is that in a particular cell that makes an antibody, we have a coding unit that’s made of all four areas
- The variability comes from the huge number of different recombinations that can occur

Lack of diversity
- In healthy individuals, there’s a huge array of different antibodies
- In people with certain diseases (e.g. lymphomas), the patterns can look quite different
  o Many peaks are reduced and there are fewer peaks overall (loss of diversity)
- Making a diverse array of molecules is critical for our health and survivability

Deletions
- Loss of sequences: small deletions can affect a single gene
- Large deletions can lead to the loss of 10s or 100s of genes
- Can be caused by X-rays or other chromosome-damaging agents that break the DNA backbone

Detection of deletions
- Compare the WT PCR product with the deletion PCR product
- If the deletion is large, we can use dyes to stain our chromosomes, and then analyze them through karyotyping
- Most homozygous deletions are lethal
  o Even most heterozygous deletions are lethal
- But there are exceptions
  o In humans, Wolf-Hirschhorn syndrome is a deletion from the short arm of chr 4
  o Cri du chat syndrome: deletion from the short arm of chr 5
- Deletions typically lead to severe phenotypes: humans can’t survive if more than 3% of their genome is deleted
- Most deletions have their effect through gene dosage

Deletions: effect on recombination
- Deletion loop: the sequences which aren’t present anymore can’t align between the two
homologs when they come together
  o No recombination can occur within the deletion loop because there’s no matching sequence in the mutant chromosome
- This is why we can’t use deletion heterozygotes when estimating map distance: the deletion loop suppresses recombination
- Genetic distance between loci on either side will be underestimated

Using deletions to map mutations
- Example: polytene chromosomes in the salivary glands of Drosophila
- During gland development, the chromosomes replicate, but the cells don’t follow replication with mitosis
  o Replicated chromosomes aren’t separated into daughter cells
  o We end up generating many copies of each chromosome all in the same cell
  o We end up with huge chromosomes that all stay together, each consisting of 1024 double helices
- When we stain these chromosomes, we see a very visible staining pattern
- The small square on the left shows a normal chromosome in a diploid Drosophila cell
- We also see banding patterns because similar sequences are next to each other, interacting with the dye in similar ways
- We can also use fluorescent staining to visualize chromosomes
- If a deletion happens during formation of polytene chromosomes, we end up with some chromosomes that are WT, and others that have the deletion
  o These will keep replicating
- Here, deletions are indicated by a red bar
- We can map mutations onto a very specific area of the chromosome in order to determine exactly which gene is being affected by the mutation

Duplications
- Tandem duplications: the copies lie next to each other
- Non-tandem duplications: the copies are not next to each other
- Duplications are caused by chromosome breaks
  o X-rays can break one chromosome in two places, or break a homologous chromosome in one place

Formation of duplication loops
- The duplicated area can’t line up with the normal chromosome, so it will form a loop
- Either the first or second copy on the duplicated chromosome can form the loop
- A double loop can form where the duplicated sections
line up with each other (on the same chromosome)
  o This causes the normal chromosome to also form a loop in that particular region because it now has no region on the duplicated chromosome to align with
  o This can be observed in polytene chromosomes

Phenotypic effects of duplication
- Often no obvious phenotypic consequences
- But effects are possible: duplications increase the copy number of particular gene(s), and can alter expression by placing genes in a different chromosomal environment
  o Changing copy number changes dosage
- In humans, heterozygosity for duplications covering more than 5% of the haploid genome is most often lethal
- Example: the Triplolethal gene from Drosophila
  o Here, any change in copy number is lethal
  o In the bottom example, we have a duplication and a deletion, which cancel each other out and produce a living fly

Duplication and unequal crossover
- Here, we have two homologs, both of which carry a duplication of a region called 16A
- When they undergo meiosis, they can align perfectly with each other
- Sometimes we have imperfect alignment (e.g.
the first copy of the duplication on the first homolog aligns with the second copy on the second homolog)
- The first homolog has its backbone broken between the two 16As, and one 16A is recombined onto the second homolog, creating one chromosome with three copies of 16A and the other with only one
- With imperfect alignment, the repeated region can increase or decrease in copy number
- Example: the Bar mutation has negative effects on eyesight
- Progeny carrying the Bar mutation can segregate into WT and Double-Bar
  o Now imagine what can happen in the progeny of Double-Bar mutants…

Inversions
- Here, the BCD area is now inverted relative to the outside segments
- Two different types of inversion patterns
  o Involving the centromere: pericentric
  o Outside the centromere: paracentric

Intrachromosomal inversions
- Inversions often occur in the presence of repeated sequences
- Here, we see an inverted repeat
- Chromosomes are quite flexible, so when the repeats align, the segment between them can flip over because the sequences are identical, just inverted
- Every time you have identical sequences, always think of recombination events which can use the repeats as their base

Inversions and gene function
- Most inversions do not result in an abnormal phenotype, but they do rearrange the order of genes
- They don’t add/delete sequences; we’re just changing the order of the genes relative to the genes on the outside, so usually they don’t affect gene function
  o Unless the break points are in the middle of a gene, or the change in chromosomal environment impacts the regulation of gene expression
- Here we see that part of the y gene has been moved away from the rest of the gene
  o Kind of like we cut the gene in half, so this will definitely affect function

Inversion loops
- These occur in individuals which are heterozygous for the inversion when homologous chromosomes align during meiosis
- One chromosome has to form a loop in order to align with the other chromosome

Inversions affect recombination
- Why does pericentric vs. paracentric matter?
- For a pericentric inversion, if the normal and inverted chromosomes align and a crossover happens within the inversion, certain products come out (shown on the right)
- We have normal gametes and inversion gametes, but we also lose some gene information in the recombinant gametes
- You don’t have to be able to draw this, but think about the outcomes of these kinds of events
- For a paracentric inversion, we can get chromosomes with no centromere, or chromosomes with more than one centromere
- We get one WT chromosome, one inversion chromosome, and then, depending on the breakage point, broken products of a dicentric chromosome

Translocation
- Translocation: moving something to a different area
- Most translocations don’t result in an abnormal phenotype, but they do rearrange the order of genes
- They don’t add/delete sequences, so they usually don’t affect the function of genes
  o Unless the break point is in the middle of a gene, or the change in chromosomal environment impacts the regulation of gene expression
- We can detect translocations using PCR
- Design a primer which recognizes the junction between one of the genes and the break point
- If the translocation has occurred, a PCR product will form because the primer can bind at the break point

Lecture 14: Genes & Gene Expression 1

What is a gene?
- The basic unit of biological information
- Segment of DNA encoding a protein
- Discrete region of a chromosome encoding a protein/RNA
- These definitions are all WRONG!
Example: look at the cox1 gene (in humans)
- REMEMBER: the human nuclear genome is about 3 billion bp, the human mitochondrial genome is about 16 kb
- The Cox1 gene is about 1500 bp long
- RNA polymerase transcribes the double-stranded DNA into single-stranded mRNA
- The mRNA gets translated into a protein that is 513 amino acids long
- This gives us the protein called cytochrome c oxidase subunit 1
- If the mRNA = 900 nt, we would get 299 amino acids
  o NOT 300, because the stop codon doesn’t code for an amino acid!! (tricky)
- This protein is involved in the ETC and houses the heme a3-CuB binuclear centre
- Cox1 is part of a polycistronic RNA transcript
- When you express Cox1, it shows up in a long RNA transcript that then gets processed
  o It essentially gets cleaved into individual RNAs, some of which give you a protein (e.g. Cox1), while others give you tRNAs or rRNAs
- Not everything on the transcript encodes a protein!!
- The cox1 gene itself is intact: the whole gene is there in one piece
  o Part of a polycistronic transcript
  o Encodes a protein
  o Continuous: the coding sequence runs uninterrupted from start to stop
- Let’s look at the same gene, but in a different species

Cox1 in mushrooms
- The Cox1 gene in mushrooms is about 20 times bigger than ours
  o Why is it so much bigger? It’s full of introns (19 introns, 20 exons!)
- Those introns show up in the first RNA that gets transcribed
- Then the introns get spliced out, so you end up with a mature spliced mRNA that is more similar in length to the human Cox1 gene
- In mitochondrial genomes, introns are slightly different from those in nuclear genomes
  o Splicing of mitochondrial introns is much simpler: the intron forms an intricate secondary structure with loops and folds which allows it to be spliced out easily
- The protein product is still the same
- So the gene is 95% intronic!
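The length arithmetic in these notes (900 nt giving 299 amino acids, and a mushroom gene that is roughly 95% intron) can be checked with a short sketch. The function names are made up for illustration, and the sketch assumes the whole mRNA is coding sequence with the stop codon included (no untranslated regions). The mushroom numbers are round figures: 20 × 1500 bp for the gene, and a coding sequence of about 1,692 bp, which is an assumed value for a protein a few dozen amino acids longer than the human one.

```python
def protein_length(coding_nt: int) -> int:
    """Amino acids encoded by a coding sequence that includes its stop codon."""
    if coding_nt < 3 or coding_nt % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    # Every codon is 3 nt; the final (stop) codon is read but adds no amino acid.
    return coding_nt // 3 - 1


def intron_fraction(gene_bp: int, coding_bp: int) -> float:
    """Fraction of a gene that is not coding sequence (treated here as intron)."""
    return (gene_bp - coding_bp) / gene_bp


print(protein_length(900))    # 299, not 300
print(protein_length(1542))   # 513, the human Cox1 protein length
# Assumed round numbers for the mushroom gene: ~30,000 bp gene, ~1,692 bp coding.
print(round(intron_fraction(30_000, 1_692), 3))  # 0.944
```

With these rough inputs the intron share comes out at about 94%, in line with the "95% intronic" figure.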
- The protein itself is only about 50 amino acids longer than ours

Cox1 in Diplonema
- This organism is unicellular
- The Diplonema lineage represents the most abundant predator on our planet
  o Its mitochondrial genome is made of tons and tons of little circles
- The Cox1 gene is not found on a single chromosome; the sequence for Cox1 is actually found on 9 different chromosomes!
- When you want to express Cox1, you need to make 9 different transcripts, and then they all have to be stitched together in the right order at the RNA level to get the intact, mature mRNA
  o Then you can translate it into the functional protein
- This is really weird: 9 different chromosomes each have a small piece of the Cox1 gene
- The way the gene pieces get stitched together is called trans-splicing

Trans-splicing
- This is when exons located in distant regions, on different strands, or on different chromosomes are transcribed separately and then joined together
- Shown in Diplonema
- This process is pretty messy and not very well understood
- We think that each of the exons carries a little piece of an intron, and the two exons find each other while floating around
  o The intron pieces bind together by folding into their secondary structure, and that allows the two gene pieces to be spliced together

Ribosomal slippage
- Perkinsus is not an oyster, BUT it can parasitize oysters
- The Cox1 gene in this organism is really weird
- If you were to go along the DNA sequence in Perkinsus, you would find that it doesn’t stay in frame
- The gene starts in frame 1, and frameshifts move it to frame 2 and frame 3
  o With so many frameshift mutations, how can the gene still work?
- Sure enough, those frameshifts are conserved in the mature mRNA
  o We find 10 different frameshifts in the mRNA
  o This should make the gene non-functional
- We then noticed that all the frameshifts happen at certain motifs (e.g.
AGGY or CCCCU)
- People figured out that as the ribosome is reading the mRNA, it can actually skip/jump and correct the frame when it finds these motifs
- So the frameshifts occur at very conserved sequence motifs, and the ribosome has evolved to recognize these motifs so that it is able to jump
- In the out-of-frame sequence, we would hit a stop codon
- But in Perkinsus, the ribosome has evolved to be able to skip over the stop codon and keep translating

Cox1 in Magnusiomyces capitatus
- This is a parasitic fungus
- In this case, the mRNA for cox1 shows the coding sequence in purple, with some black gaps in the middle that represent non-coding regions
- We would think the black regions are introns
- The only problem is that the black bits don’t look like introns (they lack the sequence motifs that are the classic signature of introns)
- We also couldn’t figure out how the “introns” were being spliced
- It was found that the non-coding region in the mRNA can form a loop
  o The ribosome moves along the sequence, hits the loop, jumps, and lands back where the coding sequence begins again
  o The ribosome stays in coding sequence constantly: every time it hits a black region, it jumps over it completely and lands on the next coding region
- This is super weird, and not completely understood

Cox1 in Dictyostelium discoideum
- This organism is an amoeba, non-photosynthetic
- Its Cox1 gene is in two bits: cox1-a and cox1-b
  o Both bits are intact genes: both have start/stop codons, and each gives a mature mRNA
  o One of them is encoded in the mitochondrial genome (cox1-a), and the other is encoded in the nuclear genome
- The two amino acid sequences come together to form the mature Cox1 protein
- If we went back in time, the complete gene would’ve been found in the mitochondrion, but some weird event happened so that part of the gene is now found in the nuclear genome
- The Cox1 gene is now made up of two genes, each part in a different
genome
- The product of the nuclear gene gets shipped to the mitochondrion so the two parts can interact to give the complete protein product

Think about this: in every case we just saw, we have the same gene and the same protein product, but there are so many different ways this gene can be expressed
- Knowing all of this, the definitions given previously for “gene” seem inadequate
- What is a basic unit of biological information?
- We already saw DNA in different segments that together code for the same gene
- “Discrete regions” also isn’t that accurate
- To redefine “gene”: a continuous, discontinuous, or fragmented nucleotide sequence encoding a biologically functional molecule, such as a protein, tRNA, or rRNA
  o The main idea is that it encodes something functional

How are genes organized on chromosomes?
1) In tight-knit groups
- In our mitochondrial genome, genes are organized in tight-knit groups
- Sometimes genes are so tightly packed together that they overlap
  o Where one gene ends, the other gene begins on top of it
  o E.g. we see the ATG that overlaps between the orange and blue genes
  o Overlap isn’t easy to deal with during gene expression
2) As lone wolves
- Genes can also be on their own with nothing else around them
- E.g. in our nuclear genome, we could go thousands of base pairs without finding a gene
- Gene deserts, where we rarely see any functional genes, often occur in nuclear genomes
3) On different strands
- If the genes are encoded on different strands of DNA, the RNA transcripts will come from different strands
- It’s cool that sometimes one strand of DNA is being used, and sometimes the other strand is being used
4) In every which way!
- E.g.
Selaginella is a seedless vascular plant
- In its mitochondrial genome, we have geneA, and it encodes a protein
  o Within that gene, we have an intron, and within the intron we have another gene (geneB) which encodes a protein
  o The gene in the intron can be transcribed and translated to make an amino acid sequence
  o The intron in the outer gene can still fold into its structure even with the coding sequence for the other gene inside it
  o When the intron splices out, it does so with the gene inside of it, and then the pieces of geneA are linked together, giving you the mature geneA mRNA
  o When you transcribe geneA, the immature transcript contains the geneB sequence; sometimes geneB gets translated, but sometimes the intron gets spliced out so quickly that geneB isn’t translated
- The other case is in the alga Euglena
  o It has twintrons: an intron within an intron
  o The brown (inner) intron folds into its secondary structure and splices out, which allows the outer intron to come together, fold and splice out
  o So splicing happens in two steps: first the brown bit folds and splices out, and then that allows the grey bits to fold and splice out

Ribosomal RNAs
- When we think of rRNA, we may think of a folded secondary structure
- We can find a 5’ end and a 3’ end
  o If we denature the structure, we’d end up with a single piece of RNA
- Sometimes rRNA can be fragmented and even scrambled, so that the pieces of the gene aren’t in order anymore
  o Note: 9 fragments was arbitrarily chosen in the image
- Recall: the Plasmodium mitochondrial genome is found in concatenated head-to-tail structures
- Normally we’d just have two genes: one for the large subunit and one for the small
  o Here we have 27 pieces
- The RNAs are all transcribed and folded into the mature rRNA
- These 27 pieces are never joined together through trans-splicing like we talked about previously
- The pieces come together through base pairing of the secondary structure
- Instead of an intact piece of rRNA we have a lot of pieces joined together
through base pairing
- The pieces find each other and form the secondary structure, which is really cool

Lecture 15: Genes & Gene Expression 2
- Consider two woolly mammoth genomes, one from a sample about 45 000 years old, the other from a sample only about 4300 years old
  o The 45 000-year-old sample is from a population on the mainland, the other from a population on a small island
- The island population was much smaller than the mainland population
- When mutations enter a big population, they behave differently than when mutations enter a small population
- Mutations are rarely beneficial, so in this situation let’s assume that they’re bad mutations
- In a big population, if the mutation doesn’t kill the individual, it makes the individual less fit
  o It’s hard for you to survive because there’s a lot of competition around you (you’re not cool but everyone else is)
- In a small population, if the mutation doesn’t kill you, you’re still less fit, but there’s less competition because there aren’t that many other individuals around you
  o So you could persist even with your mutation
  o You’re not cool, but there aren’t that many other individuals around, so it’s okay to not be cool
- When we sequenced the genome from the mainland population, the DNA looked normal
- When we sequenced the genome from the island population (4300 years ago), we saw all kinds of mutations in the genome that looked deleterious
  o Frameshift mutations in protein-coding regions that should’ve knocked out the gene
  o They obviously weren’t lethal, but are presumed to not be good either
- We saw that the woolly mammoth wasn’t woolly anymore; it was silky instead
- There were mutations in some urinary proteins that interfered with mating
- This is an example of genomic breakdown
- The mainland population probably disappeared due to human hunting
- The island population probably disappeared due to its deleterious mutations
- Gene expression is unconventional, and occurs every which way (we saw this in the previous lecture)
- In malaria,
mitochondrial rRNA is fragmented and scrambled
  o You still get the secondary structure, but it’s really fragmented
  o You get many different 5’ and 3’ ends

More on rRNAs
- This is a region of a human chromosome where we encode the LSU and SSU rRNA genes (large and small subunit)
- Our rRNA genes are intact, but they sit in long tandem repeats
- In total, there are 300-400 copies of the repeat, located on 5 different chromosomes
  o That’s a lot of rRNA; why would we want to have this overload?
- rRNA is important for gene expression, so it makes sense to want mass production from these genes

Euglena
- This is an alga that has normal nuclear chromosomes: linear, millions of letters long, similar to ours
- There are no rRNA genes on its nuclear chromosomes
- Euglena has shoved all its rRNA genes onto a circular chromosome
  o Entirely devoted to rRNA
  o The small subunit rRNA is intact, but the large subunit rRNA is fragmented
- This circle is present in the nucleus about 600 times
- Its nuclear system is haploid (each chromosome present only once), with the exception of the little rRNA chromosome, which is polyploid with 600 copies
- Euglena breaks the “linear and fragmented” rule for nuclear chromosomes because it has a circular chromosome for rRNA
- It also combines monoploid and polyploid chromosomes in its nuclear system
- There are many genes in the nuclear genome
  o In many systems, genes aren’t just individual bits

Trypanosomes
- Cause sleeping sickness, Chagas disease, leishmaniasis
- Trypanosomes have regular linear chromosomes
- If you look closely, they’re bacteria-like in the way they organize their genes
- They’ve shoved their genes into massive polycistronic transcription units
- You can have dozens/hundreds/thousands of genes all on polycistronic units
- The genes that make this organism work are found in these grey bits
- There’s another region on another section of the nuclear chromosome that has identical monocistronic genes (yellow)
  o Not expressed in the long transcript; they’re individually expressed
  o We call these 5’ caps
- Polycistronic units give long polycistronic transcripts with many genes on them
- Monocistronic units give small transcripts
- Now we have the long red bits and the little yellow pieces
- The little yellow pieces attack the long polycistronic transcript, and that produces our individual mature transcripts
  o Occurs through trans-splicing
- Essentially, the little yellow pieces of RNA snuggle up to the polycistronic piece and bind to it, which causes a reaction that splices them together
- The genes are these long sausages
- The 5’ caps invade the sausage string, producing the mature transcripts
  o Called spliced-leader trans-splicing
- Called trans because we have two distinct bits coming together
- This is good because it occurs very quickly
  o But it’s harder at the regulation level
- Not super common, but it does occur in trypanosomes

Oxytricha (ciliate)
- This is not an alga, but it has eaten a bunch of algae/cyanobacteria
- Has 2 different nuclei, 2 different nuclear genomes
  o Different from the nucleomorph example (chlorarachniophyte)
- Oxytricha and chlorarachniophytes are found on two very different branches of the tree of life
- Oxytricha is about 50 micrometres in length
- Has a macronucleus and a micronucleus (big and small), also called MAC and MIC
  o Think of the A in MAC as “active”, and the I in MIC as “inactive”
- After Oxytricha undergoes meiosis, you get two copies of the MIC (normal), but the MAC gets degraded and one of the MICs gets turned into the MAC
- The big mystery is that the two different nuclei don’t have different evolutionary origins
  o They’re essentially the same nucleus, existing in parallel in the cell
- The MAC, the active nucleus (where genes get expressed), is funny because instead of long chromosomes, you have short gene-sized chromosomes
  o This means there are thousands of unique chromosomes; each chromosome is present hundreds of times (hugely polyploid)
  o Very little non-coding DNA
- The MIC
has “normal” chromosomes, but the coding information isn’t coherent
  o It’s all mixed around, in pieces that are almost like exons
  o Sometimes the pieces are not even on the same strand of DNA
  o It’s also haploid, and there are huge amounts of non-coding DNA
- The MIC and MAC are almost complete opposites of each other
- So to turn the MIC into the MAC, we need something that will cut out the non-coding DNA, flip everything around the right way, and put it in the right order

How many genes does an organism need?
- Paramecium is a ciliate like Oxytricha (so it has a MAC and MIC as well)
- The numbers of genes here usually represent nuclear genes only, not counting duplicates (so they don’t include rRNA or tRNA)
- In terms of living things, the lowest number of genes we can find is in Mycoplasma
- Do complex organisms need lots of genes?
  o YES: but don’t underestimate the complexity of seemingly “simple” organisms
- Does gene number scale with genome size?
  o Not really
- Does non-coding DNA (intergenic/intronic DNA) have a function?
  o “Function” here means crucial to the survival (“fitness”) of the organism
  o Removing it causes major loss of fitness or death
  o In other words, the sequence is essential, so some non-coding DNA does have a function

Regulatory elements
- Non-coding DNA sequences that play an essential role in genome replication and/or gene expression
  o There are many regulatory elements in a genome
- Regulatory elements are usually quite small, ~10-100 bp
  o Even if you have thousands of regulatory elements, they usually make up a small proportion of your genome
- Most eukaryotic nuclear genomes are mostly made of non-coding DNA
  o Are there hidden regulatory functions in much of this non-coding DNA, or is it junk?
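The claim that even thousands of regulatory elements take up only a small share of a genome is easy to sanity-check. The sketch below is illustrative only: the element counts are hypothetical ("thousands" is all these notes say), the 100 bp element size is the upper end of the ~10-100 bp range above, and the genome size is the ~3 billion bp human nuclear genome quoted elsewhere in these notes.

```python
HUMAN_GENOME_BP = 3_000_000_000  # ~3 billion bp, as quoted in these notes


def regulatory_fraction(n_elements: int, element_bp: int, genome_bp: int) -> float:
    """Share of the genome occupied by n_elements elements of element_bp each."""
    return n_elements * element_bp / genome_bp


# Hypothetical element counts, each element at the 100 bp upper bound.
for n in (1_000, 10_000, 100_000):
    share = regulatory_fraction(n, 100, HUMAN_GENOME_BP)
    print(f"{n:>7} elements x 100 bp = {share:.4%} of the genome")
```

Even at 100,000 elements of 100 bp each, the total stays well under 1% of the genome, which is why regulatory elements alone can’t account for most non-coding DNA.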
- Most non-coding DNA is not made of regulatory elements

The ENCODE Project
- Started in September 2003 by the US National Human Genome Research Institute, and it’s still ongoing today
- Goal: to uncover all functional elements in the human genome, particularly those outside of genes
- In 2000, there was a lot of hype about the huge amount of non-coding DNA
- 98% of the human nuclear genome is non-coding
- Recall: the human nuclear genome is ~3 billion bp, ~19 000 genes

Lecture 16: More on ENCODE & Genetic Modifications
- So does non-coding DNA have a function or not?
- The HGP sequenced the human genome, and it also annotated all the genes
- ENCODE’s goal was not to look at the genes, but to look at all the functional bits found outside the coding regions
- They found a lot of regulatory bits: transcription factor binding sites, enhancers, promoters
- The interesting finding was that 75% of the human genome is transcriptionally active
  o This means that if you look at all cell types across all tissues and across your life, you’ll find that 75% of the genome is turned into RNA
- This seems counterintuitive because only 2% of the human genome is coding
- ENCODE found that 80% of the human genome is involved in at least one biochemical function
  o In short, 80% of the human genome is functional!
  o This goes against what we said previously about 90% of the genome being non-functional
- There was a debate over the definition of the word “function”
  o ENCODE described “function” as: anything that’s transcribed
- Genomes often have a lot of transcriptional noise: things get transcribed all the time, even if they don’t code for anything
  o The sequence turns into RNA, then probably gets degraded
  o Genetics isn’t clean
- So most of the 80% of “functional” sequence is the random stuff that gets transcribed, including non-coding DNA
- Do most of these transcribed regions have a function?
ENCODE argued that 80% have at least one function
  o BUT, recall that 100% of the genome has a “reproductive” biochemical function: it replicates!
- Let’s consider two plants that have different genome sizes
- Imagine that we did the ENCODE project on these plants and found that in both plants, 75% of the genome is transcriptionally active
  o This means 45 million bp of one genome is transcriptionally active, and 112.5 billion bp of the other genome
  o If we were the ENCODE people, we would be saying that all of these bp are crucial to function
  o Does that make sense? Why would one organism need so much more regulatory DNA than the other?
- So how much of the human genome has a function?
  o Smith argues ~5%; it could be as high as 25%, but the hype people say >50%
- This debate about junk vs. functional DNA was good because it advanced the way we think about genetics
- ENCODE also found that about 5% of the human genome is methylated
  o Recall: that’s 5% when you look across all tissues and cell types as a whole, not individually
  o You can’t just grab one cell and see that 5% of it is methylated

DNA methylation
- Methylation: when you add a methyl group to a cytosine base
- Other bases can be methylated, but generally speaking, when you talk about DNA methylation, you’re talking mostly about cytosine
- Methylation tends to happen at CpG sites
  o A CpG site is a C followed by a G on the same strand (the p is the phosphate linking them)
- Remember that CG is written in the 5’-3’ direction!
  o So the opposite strand reads GC in the 3’-5’ direction (and CG again when read 5’-3’)
- There are other sites of methylation as well, but we won’t be talking about those
- Even if you shove a methyl group onto the strands, you’re not changing the primary sequence of the DNA

How much CpG methylation in nuclear genomes?
- Common in most eukaryotes, and there’s lots of it!
- ENCODE said that every CpG site in our genome is methylated (again, looking across all tissues and cell types), which is a really cool finding
- This means that about 5% of the human genome is made up of CpG sites
- Methylated or not, what’s the big deal?
  o Epigenetics: methylation impacts gene regulation and expression
  o Genomic methylation patterns can be heritable
- Methylation impacts gene regulation & expression
  o When you methylate a genomic region, you essentially silence it: it’s an off switch for genes
- If you only want to express certain regions of the genome, this can be a good thing
- If you want TEs to stop jumping, you can use methylation
- Huge chunks of chromosomes can be silenced through methylation
- Methylation can also be bad: when methylation patterns go wonky (e.g. methylation disappears, or appears in the wrong place), it can lead to certain diseases
  o Can contribute to aging and mental health issues

Genomic methylation patterns can be heritable
- It’s never clear how many generations a pattern will get passed down for
- Some studies show that it goes on almost indefinitely; in others, the patterns are shown to change
- Heritability is not unique to humans; it applies to other organisms too
- When we see phenotypic differences, we’re trained to think that the difference is due to mutation and differences in the underlying DNA sequences
- The trick with methylation, and why it was so neat when it was discovered, is that you can have identical genomes in two different individuals, but very different phenotypes depending on their methylation patterns

Environment can influence methylation patterns
- Traditionally we think of phenotypic differences as coming from mutations in gene sequence
- Now we have to consider alterations such as methylation

Origins of DNA methylation
- The whole discussion above was about eukaryotic nuclear genomes
- However, methylation is also present in bacteria and archaea
- Enzymes that are involved in methylation are found in all branches of the tree of life

Bacterial DNA methylation
- Bacteria have restriction enzymes that chop up DNA at certain sequences
- The bacterium wants to cut up foreign DNA only, not its own DNA
- To prevent cutting of
their own DNA, bacteria can methylate their DNA, which will allow only the foreign DNA to be cut by restriction enzymes
- The idea of methylation falls under a bigger umbrella called genetic modifications
Genetic modifications
- Significant genetic alterations that are often not apparent or obvious given the primary DNA sequence alone
- How do we decipher DNA? Through the universal genetic code
- It appears that everything in this world uses the same universal genetic code
  o Must’ve evolved very early on and given rise to all the diverse branches in the tree of life
- As time went on, we’ve discovered that the genetic code is not so universal: in many lineages, the code has changed
  o Typically called “alternative” or “nonstandard” genetic codes
- Nonstandard genetic codes are often found in organelle genomes
  o Most commonly found in mitochondrial and chloroplast genomes
- In vertebrate mitochondria, these four codons don’t specify the same amino acids (or stops) as they would under the standard genetic code
- If you applied the standard code to the mitochondrial genome of vertebrates, you would end up dying because you’d insert stop codons all over the place
- This means that in our own cells, we have two different genetic codes being used: a standard code and a nonstandard code
- This means we have two different sets of tRNAs (because tRNAs are responsible for deciphering codons and bringing in their corresponding amino acids)
- In the organelle genomes of other lineages (the nematode C. elegans, starfish, Lepidodinium), the nonstandard code is not the same as the one found in vertebrate mitochondrial genomes
  o Even if there’s only one change, you still can’t apply the standard code (e.g.
in the chloroplast genome of the dinoflagellate Lepidodinium, only the AUA codon is changed, from isoleucine to methionine)
- Yeasts have changed the code, and on top of that they’ve also forsaken some codons
  o If you look at the mitochondrial genome of yeasts, you won’t find the CGA or CGC codons in protein-coding DNA
  o You might find them in non-coding regions, but they will never code for anything
- Diplonema has taken these stop codons and turned them into amino acid coding sites
- This is a case where you have two different nonstandard codes in two different genetic compartments in a single cell
- How does lateral gene transfer happen here then?
  o The difference in codes serves as a big barrier that prevents endosymbiotic gene transfer
  o Most evolutionary transfer happened before the new nonstandard codes evolved
- In many cases, you can have the same gene expressed in different ways, in systems that have very different genetic codes
- The goal of this is for us to think about the fact that each gene pictured here gives us the same protein
  o In different systems, the way that this protein is expressed is dramatically different
- Nonstandard genetic codes have evolved dozens of times independently throughout the tree of life
  o Organelle genomes in particular are hotspots for nonstandard codes
Lecture 17: Mechanisms of Genome Evolution
- What do we talk about when we talk about genome evolution?
  o Differences in genome architecture: size, structure, content, modifications
    ▪ There’s huge diversity in genome architecture: within cells, within lineages, and between different lineages – how do we explain this diversity?
  o Forces responsible for differences
- Think about genome evolution at a molecular level: the level at which mutations occur in genes
- The only way to get different genome architecture is through mutation
  o Different levels of mutations: type, context, frequency, bias
1) Type
- Includes point mutations: a T-A site changes to a C-G site
- Deletion, insertion, large insertions/deletions, duplications, rearrangements, fragmentation, fusion, conversion (one sequence basically copies itself onto another sequence)
  o Large insertions can occur through HGT, mobile elements, endosymbiotic gene transfer
2) Context: where is it occurring? Is it occurring in non-coding DNA? If so, is it regulatory?
- Is it happening to a region, or to the whole genome?
- Is it impacting the whole cell?
3) Frequency: does it happen often or rarely?
- Often you get a system where you have many point mutations and very few fragmentation events, or vice versa
4) Bias
- One type of point mutation could be more frequent than another type: the mutational spectrum
Mutations alter genomes
- Mutations are a reflection of the environment the cell lives in
- The actual cellular machinery that we have could alter genomes as well
- Some organisms have really good DNA maintenance machinery – this is pretty rare
  o This means you hardly get any mutations (e.g. plant mitochondrial genomes)
- Sometimes you have really crappy DNA maintenance machinery – more common
  o Doesn’t work well, always inserting mutations (e.g. animal mitochondrial genomes)
Thinking about evolution
- Evolution is a population-level process
- The little white things in the image are algae
- Let’s add a mutation into the population, so we’ve changed the genome of one of these organisms
- When this happens, we could get one of two outcomes
- Over time, this change will either get fixed in the population, or lost
- What determines whether a mutation will get fixed or lost?
- First question: is the mutation beneficial, deleterious, or neutral?
  o If beneficial, you’d want to have it fixed
- Next question: is this population effectively large or small?
  o “Effectively” because you can have huge populations that behave like small populations
  o What determines this is the probability that a member can pass on its genes to its offspring
  o If the population is effectively large, natural selection is efficient
- If you have many competitors, a little mutation that’s beneficial can give you an advantage
- In tiny populations, natural selection sucks
  o E.g. remember the woolly mammoth example, where the island population accumulated many deleterious mutations
  o You get a lot of genetic drift
  o Neutral and slightly deleterious mutations have a chance of getting fixed
- Mutations come and alter genome architecture, and then population size, random drift, and natural selection determine whether that mutated gene is fixed or lost
Hypotheses on genome evolution
- Hypotheses fall under two categories: adaptive and non-adaptive
1) Adaptive: most of the genomes we’re observing had mutations that were fixed through natural selection
2) Non-adaptive: most of the weird genome architecture is caused by mutations that were fixed through genetic drift
- When we observe genomes, we have to ask ourselves: is this a result of genetic drift or natural selection?
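The interaction between selection and effective population size described above can be sketched with a small simulation. This is a minimal Wright-Fisher-style model, not something taken from the lecture: the function names (`simulate_allele`, `fixation_rate`), the haploid assumption, and the selection coefficient `s` are all illustrative choices.

```python
import random


def simulate_allele(pop_size, s=0.0, start_copies=1, rng=None,
                    max_generations=100_000):
    """Follow one new mutation in a haploid Wright-Fisher population.

    pop_size: number of individuals (treated here as the effective size)
    s: selection coefficient (0 neutral, >0 beneficial, <0 deleterious)
    Returns True if the mutation is fixed, False if it is lost
    (or still segregating after max_generations).
    """
    if rng is None:
        rng = random.Random()
    copies = start_copies
    for _ in range(max_generations):
        if copies <= 0:
            return False  # lost from the population
        if copies >= pop_size:
            return True   # fixed in the population
        freq = copies / pop_size
        # Selection shifts the expected frequency of the mutation...
        weighted = freq * (1 + s)
        expected = weighted / (weighted + (1 - freq))
        # ...and drift is the random sampling of the next generation.
        copies = sum(1 for _ in range(pop_size) if rng.random() < expected)
    return False


def fixation_rate(pop_size, s, trials=500, seed=1):
    """Fraction of independent new mutations that end up fixed."""
    rng = random.Random(seed)
    fixed = sum(simulate_allele(pop_size, s, rng=rng) for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    for n in (10, 200):
        print(n, "neutral:", fixation_rate(n, 0.0),
              "slightly deleterious:", fixation_rate(n, -0.05))
```

Under these assumptions a neutral mutation should fix in roughly 1/N of runs, and a slightly deleterious one should still fix now and then when N is tiny but essentially never when N is large, which is the sense in which drift dominates small populations.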
- Recall: genetic drift is variation in the relative frequency of different genotypes in a small population, owing to the chance disappearance of particular genes as individuals die or do not reproduce
- Here, we see that one mutation linearized the system, another mutation fragmented the system
Adaptive: Skeletal DNA hypothesis
- Hypothesis says that as genome size goes up, so do nuclear volume and cell size
- Mutations that cause this increase/decrease in genome size are being fixed through natural selection
- Natural selection favours a bigger cell, or vice versa
Non-adaptive: “Selfish” DNA hypothesis
- Mobile elements aren’t necessarily providing any benefit to the organism
- Result of genetic drift
Mutational hazard hypothesis
- Buddhist philosophy: the more possessions you have, the greater the chance of losing those possessions
- We can think of this on a genomic level
  o If you’re a genome and you start accumulating baggage (e.g. introns, non-coding DNA), this essentially increases the mutational target
  o The more stuff you have, the greater the chance you have of being hit with a mutation
- The higher the mutation rate, the greater the burden of carrying all this excess baggage
- If the mutation rate is really high, the excess baggage is a liability to you because there’s a good chance you’re going to get hit with a mutation
- The higher the mutation rate, the more you want to stay compact
- If the burden of carrying excess DNA is really high, what kind of populations would be good at purging that excess DNA?
  o The larger the population size, the more efficient natural selection is, and the better it is at purging excess DNA
- Lynch’s hypothesis says that when mutation rate is high and population size is large, you get really streamlined genomes
  o Burden of carrying excess stuff is high, and natural selection is really good at purging the excess
Summary
- What do we talk about when we talk about genome evolution?
Genome architecture
- What do we know about genome architectural diversity? It’s hugely diverse
- On what levels do we think about genome evolution? Molecular and population
- How can we explain the diversity in non-coding DNA content? Using adaptive/non-adaptive hypotheses
Lecture 18: More on Modifications & Some Bioinformatics
Genetic modifications: Significant genetic alterations that are often not apparent or obvious given the primary DNA sequence alone
Non-standard codes
- The amino acid sequence was derived using a non-standard code
- If we didn’t know that the mitochondrial genome for cox1 used a non-standard code, something very different would’ve been derived: we would’ve predicted the wrong product
RNA editing
- You can have the DNA sequence, but if you don’t understand the modifications (editing), you’d be lost
- We expect that if there’s a gene sequence, it gets transcribed and gives you an mRNA
- We expect the mRNA to look exactly like the DNA sequence (identical, apart from U in place of T)
  o But this isn’t always true; it can be completely wrong
Example: Spikemoss (Selaginella)
- Let’s look at a gene in the mitochondrial genome of this organism, call it gene A
- Here, the Ts all became Us (not that weird), but something else came along and changed many of the Cs to Us as well
  o Call this post-transcriptional editing (RNA editing)
- Changing Cs to Us has a huge impact on the amino acids that this gene encodes
- It can introduce a stop codon and change many of the amino acids
- If we look at the cox1 gene for Selaginella, we see that in the RNA sequence, most Cs get turned to Us, but not all of them
  o So a huge chunk of this gene has been edited (>200 C-U editing sites)
- So if someone gave you only the DNA sequence of this gene, you couldn’t really say anything about the gene product
- In this way, the DNA doesn’t reflect the RNA, and vice versa
- When these genes are translated, the standard code is used
- If we tried to translate the cox1 sequence directly from the DNA, we’d get a different amino acid sequence than from the edited RNA
- For every C that
gets edited to U at the RNA level, it involves proteins that bind to the RNA sequence
  o Protein complex called the editosome
- Protein complex binds to the RNA and edits the C to a U
- The same complex can’t edit every C
- You need a different protein complex for every editing site!
  o If you have 100 Cs that need to be edited, you have 100 protein complexes
- Land plants and green algae have various degrees of RNA editing in mitochondrial and chloroplast genomes
- RNA editing has evolved many times independently throughout the tree of life
Trypanosomes
- Recall their mitochondrial DNA (kinetoplast DNA), with the genes found on maxicircles
- In this case, we have RNA editing, but instead of changing a C to a U, we’re inserting/deleting Us from the system
  o Not substitutional editing; it’s insertion/deletion editing
- Hundreds of Us can be inserted and deleted from a single gene
- This is even more extreme than the C-U editing in the previous example
  o You wouldn’t even be able to tell what gene this was just by looking at the DNA
- So how does this happen?
The minicircles give tiny transcripts (guide RNAs) that are much shorter than gene-length transcripts
- Gene transcripts are the ones that get edited, but the tiny transcripts guide the editing
- The tiny transcripts bind the mRNA, which allows the protein complex to know where to insert and delete Us
Diplonema
- Mitochondrial genome found on more than one chromosome: fragmented
- To get your large ribosomal subunit rRNA, you need to transcribe the genes on two chromosomes
- A machine comes along and adds all kinds of Us to one transcript, and a bunch of polyA tails to the other transcript
  o They’re then spliced together
- The 25 added Us are then found in the functional rRNA
- Recall: in the earlier cases where rRNA was fragmented, the pieces never got spliced together
  o In this case, they’re actually spliced together into a single piece of RNA
  o In this single piece, you have 25 Us sticking out and staying inside the functional RNA
  o This is a type of RNA editing because you’re inserting Us
- Does it make sense to want to tweak your DNA sequence?
How could genetic modifications impact:
- Endosymbiotic gene transfer?
  o Does this create a barrier towards endosymbiosis?
- Lateral gene transfer?
  o If you want to move genes across different mitochondrial genomes, how would modifications impact this?
- The outcome/consequence of DNA mutations?
  o If a C that normally gets edited were mutated to a T at the DNA level, would the C-U editing even make a difference?
  o Editing could even act as a mutational buffering process
- The “fitness” of an organism?
Bioinformatics
Bioinformatics: the science of using computational methods to decipher the biological meaning of information contained within organismal systems.
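As one small computational illustration, the C-to-U editing from the Selaginella example can be sketched in a few lines. The 12-nt sequence and the editing positions below are made up for illustration, and the helper names (`transcribe`, `edit_c_to_u`, `translate`) are hypothetical; only the standard codon table itself is real.

```python
from itertools import product

# Standard genetic code for RNA codons, in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Coding-strand DNA to RNA: every T becomes U."""
    return dna.upper().replace("T", "U")


def edit_c_to_u(rna, sites):
    """Apply C-to-U edits at the given 0-based positions."""
    bases = list(rna)
    for i in sites:
        if bases[i] != "C":
            raise ValueError(f"position {i} is {bases[i]}, not C")
        bases[i] = "U"
    return "".join(bases)


def translate(rna):
    """Translate with the standard code, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = STANDARD[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("ATGCCTCGAGGT")           # hypothetical gene fragment
    print(translate(rna))                       # unedited transcript
    print(translate(edit_c_to_u(rna, [4])))     # one edit changes an amino acid
    print(translate(edit_c_to_u(rna, [6])))     # one edit creates a stop codon
```

A single edit at position 4 turns CCU (proline) into CUU (leucine), and an edit at position 6 turns CGA (arginine) into the stop codon UGA, so a prediction made from the DNA alone would get the protein wrong.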
- Command-line driven software keeps bioinformatics in the hands of experts
- User-friendly GUI brings bioinformatics to the masses
- Publish our findings in online sequence repositories such as GenBank, DDBJ, and EBI
Lecture 19: Bioinformatics & The Future of Genetics
2 main topics in bioinformatics: assembling and searching
- Currently, it’s very easy to generate sequence data
- Hard to put the DNA pieces from a sequencing machine together
Sequencing reads
- Look for overlap between different reads, then assemble them into contigs
- Here, the contig is a total of 275 nucleotides long
- Could this similarity be a result of chance?
- The chance that one random site is identical to another is 25% (4 bases)
- The chance that two sites in a row are identical is 0.25 x 0.25 = 0.0625
- Chance that 25 sites in a row are identical = 0.25^25 (about 8.9 x 10^-16)
- Problem: in genomes, it’s very common to have repeat elements
- Imagine you have an identical repeat in this read, and it’s present 4 times
  o You could overlap it in many places and still think it’s identical
- If you’re lucky, there will be another read that covers the entire repeat sequence
- In this case, we know that there are 8 copies of the repeat, so we know where the sequences fit together
- Sometimes these repeats go on for very long stretches and you’ll never have a read with the whole repeat contained in it
- Brand-new technologies can give us longer reads, but this is very very recent (last 12 months)
- Today’s bioinformatics programs can assemble millions of sequencing reads in hours to days to weeks, depending on the algorithm and computer power
Assembly algorithms
- We’re able to assemble sequences because there are very efficient algorithms that are good at comparing reads with each other
- Arms race between computer technologies and sequencing/assembly technologies
- Hugely dependent on the computing infrastructure: more powerful computers = better assemblies
Search: BLAST
- Search methods are called BLAST
- You can search
blastn (nucleotide sequence search against a database of known nucleotide sequences)
- tblastx is when you translate your nucleotide sequence and search it against a database of translated nucleotide sequences
- blastp is searching a protein sequence against a database of protein sequences
tblastx
- Take your unknown sequence and translate it in all 6 reading frames: you end up with 6 amino acid sequences
- Search each sequence against a database where you’ve done the same thing for everything in the database
- Obviously more computationally expensive because we’ve turned each sequence into 6 sequences
- You’ve amplified everything by 6
blastp
- Compare an amino acid sequence against an amino acid database
Let’s say we have a 1000 bp long sequence
- We can put it into tblastx, and we get a hit in GenBank
- We will get a bunch of matches
- Coverage is high: the 1000 nucleotides we searched are covered completely by sequences in the database
- Score is high: in BLAST, there are various scores
- We can probably agree that the unknown is probably a cox1 gene from an ape
Let’s say we have a new unknown, and we conduct tblastx, get a hit
- Now the hits don’t look so good
- Coverage is low: the search sequence didn’t get covered that much by the database
- Score is low: lots of mismatches
- Hits seem to be all over the place
- I would say this is probably meaningless: we don’t know what the unknown is
Let’s do a search with blastn
- Everything looks good!
- The type of BLAST you use can affect the type of hits you get
- When we took the unknown and used tblastx, we didn’t get hits because rRNA genes aren’t conserved at the amino acid level (they don’t code for proteins)
  o We needed to search at the nucleotide level
- Usually people just do both
- Many sequences will never have hits in the database
Imagine you’re working with your own mitochondrial genome, and we use tblastx
- What does the human mitochondrial genome have that’s special? It uses a non-standard genetic code!
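To see why the choice of code matters for a translated search, here is a minimal sketch that translates the same DNA under the standard code and under the vertebrate mitochondrial code (NCBI lists these as translation tables 1 and 2), and also produces the six reading frames that a tblastx-style search works with. The 15-nt test sequence and the function names are hypothetical; this is not how BLAST itself is implemented.

```python
from itertools import product

# Standard genetic code for DNA codons, in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Vertebrate mitochondrial code: the four codons that differ from the standard.
VERT_MITO = dict(STANDARD, TGA="W", ATA="M", AGA="*", AGG="*")


def translate(dna, code):
    """Translate every complete codon; '*' marks a stop."""
    dna = dna.upper()
    return "".join(code[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna, code):
    """Three forward frames followed by three reverse-complement frames."""
    strands = (dna.upper(), reverse_complement(dna))
    return [translate(s[offset:], code) for s in strands for offset in range(3)]


if __name__ == "__main__":
    seq = "ATGTGAATAGGCAGA"  # made-up sequence containing all four codons
    print(translate(seq, STANDARD))
    print(translate(seq, VERT_MITO))
    print(six_frames(seq, VERT_MITO))
```

With the standard code the second codon (TGA) reads as a stop, while the mitochondrial code reads it as tryptophan, so translating a mitochondrial gene with the wrong table litters it with false stops and wrong amino acids.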
- So we have to tell BLAST to use the non-standard code
Let’s say we’re BLASTing the mitochondrial genome of a trypanosome: would you have better results with DNA or RNA?
- Using the final RNA product is better because this is where all the Us are inserted/deleted
- This is where you’ll get hits in the database
Lecture 20: Synthetic Biology: Parts, Modules, and Systems
- Regulation: transcriptional, translational, post-translational
- Sensing: chemical, light, force
- Communication: small molecules, proteins, viruses
- Physical: motility, growth, transport
Synthetic Biology
Definition: Connect various elements (regulatory, sensory etc.) in a reliable, predictable way and program a cell to function as a system
- In synthetic biology, we’re trying to avoid unpredictable mutations
- Goal: to have a predictable end product
Hierarchy and modular organization
- Even the smallest mutation would result in a sensible phenotypic change
- We can take advantage of the fact that everything is hierarchically organized
- If we introduce this hierarchy into a synthetic organism, we’d have to start with our first physical layer (analogous to proteins and genes)
  o Take these individual parts and create gates (if I press switch A or B, this is what should happen…) that can control what each part can do
    ▪ Equivalent to biochemical reactions in cells
We can apply engineering principles to the design and alteration of natural systems or de novo construction of artificial biological devices and systems that exhibit predictable behaviours
Synthetic biology circuits
- To achieve hierarchy, look at all the individual proteins, mRNAs, and microRNAs
  o All of these respond to external stimuli, sensing signals from the outside
- If we can organize these parts as a module, we’ll end up with a regulatory circuit
Let’s say microRNA A regulates expression of gene B, and gene B regulates expression of another gene
- We can define how we want this circuit to happen
Synthetic biology: start with parts,
make the cell react to a stimulus and observe the effects
- We have to ligate different strands of DNA together
The Design Cycle
- The first 3 processes start with a computer
- Construction of the actual plasmid
- Probing, testing, and validation come last
Tools for design cycle
1. Engineering principles for design: decoupling, abstraction, standardization
   a. Abstraction: how do I stitch pieces together to form a good device?
2. Components for parts selection: cis-elements, promoters, exons, protein domains, ORFs, terminators, initiation sites (biobricks/phytobricks)
   a. Challenges: transcriptional regulation and precise control of expression in synthetic circuits
   b. You have to select parts which are already available
   c. We already know which parts we’ll need
3. Computational tools for design and modelling
   a. Component design & synthesis
   b. Topology and network design
   c. Behaviour prediction and simulation
The Lac Operon Concept
- Genes are organized in operons in prokaryotic cells
- The product of the regulator gene (the repressor) binds the operator site
- If the regulator gene is expressed, the repressor sits on the operator and blocks RNA polymerase
  o No lactose present in this situation
- If lactose is present, the repressor can’t bind to the operator site because lactose prevents it
- RNA polymerase can then easily bind and transcribe through, and downstream genes are expressed
Logic gates
- Start with conceptualizing
- Here, you want your system to react to water (environmental stimulus), age and temperature
- You want your system to react in a predictable way, following a particular circuit
- So you organize your genes in a way that will follow a set of reactions and produce protein A under the conditions you want A produced for
- You can organize genes, create devices, and define what these devices produce based on your initial stimulus
- From the DNA POV, you'd see your cis-elements that will express a downstream gene, which potentially acts as a repressor for another gene downstream
  o This happens until you get
your end product based on your stimulus
Parts: The Repressilator
- One classic part for synthetic biology is the repressilator
- How it works: essentially a plasmid that's constructed with parts in the form of a device
- Genes that constitute these parts are essentially repressors of downstream genes
- Genes repress each other in an oscillating manner
- If R2 is present, it binds to P1, preventing expression
What other kinds of circuits can be built?
- Various sensors: light, dark, heat, cold
- More switches, logic gates: more repressors, activators
- Parts for intercellular communication
  o Helpful if cells could tell each other what condition they’re in (quorum sensing)
- Parts for signalling the output of circuits
  o Fluorescent and luminescent proteins
Example of a synthetic system
- Compound A activates a gene that activates a downstream gene that then produces an AHL compound
- Based on the amount of AHL produced, we're asking downstream genes to repress/activate a few more downstream genes, introducing our signalling chromophore (which we can track)
- Over time, the amount of AHL depletes
- With lower AHL levels, you're only activating part of the system but not the entire system
  o Now you will see a signal
- But if there's no AHL present, nothing is activated
- You're designing your logic gate in a way that's based on your external stimulus
- If a lot of AHL is produced, you will not see a GFP signal
- If a little AHL is produced, you will see a GFP signal
- If there is no AHL at all, you will not see a signal
- There's only a signal if a certain concentration of AHL is present
What makes an efficient part?
- Should not have multiple functions
  o E.g. one subunit of T7 phage DNA polymerase is actually E.
coli thioredoxin
- Each part should perform a specific function
- If you're choosing a part from T7 bacteriophage DNA polymerase, the same subunit can act as part of the polymerase in one system but as a thioredoxin in another
  o Make sure you're only using the part for its intended purpose
- Different parts should be compatible
- Parts should work in different contexts
Next step: constructing a minimal cell with a minimal genome
- Limiting factor: knowledge about the minimum gene set required for a minimal genome
Lecture 21: Transposable Elements and Variations in Chromosome Numbers
Transposable elements = jumping genes
- Discovered by Barbara McClintock in maize
Classification
1. Retrotransposons – Retroposons
   a. Transpose via reverse transcription of an RNA intermediate
2. Transposons
   a. Transpose DNA directly without an RNA intermediate
- TEs come with their own moving information: self-perpetuating
- Almost 45% of the human genome is made of TEs
LINEs: long interspersed elements, SINEs: short interspersed elements
- LINEs are longer, SINEs are shorter
Copia from Drosophila
- Polytene chromosomes
- In situ hybridization with a probe for copia sequences
- Two different strains from different geographic locations
- 30-50 copies each, but often in different positions
- Position of TE has shifted relative to genes
- Polytene chromosomes are found in the giant salivary glands of Drosophila
- We can use these for in situ hybridization
- Here they used a probe against copia sequences
- Same number of copies, but in different positions
- Different geographic locations = not physically close
- There is lots of time for this TE to jump around
- We can use TEs to estimate how closely related particular individuals from the same species are
  o If quite different, they probably haven't seen each other for a considerable amount of time
Retrotransposons
Two types: non-LTRs and LTRs (long terminal repeats)
- On the top is non-LTRs
  o Classical characteristic is that they have a poly-A tail at the end of their coding sequence
  o Look very similar to mRNA because they have similar elements
  o They have a reverse transcriptase gene in them that allows them to jump
- Other class is LTR: also has a reverse transcriptase gene
  o Surrounded by long terminal repeats (LTRs)
  o Depending on the TE, the jumping conditions are different
- Reverse transcriptase means i