Biology 2581B Final: Genetics 2581 Final Notes

Western University
Biology 2581B
David R Smith

Lecture 13
● Chromosomal rearrangements can occur on a large or small scale and include:
  ○ Deletions
  ○ Insertions
  ○ Inversions
  ○ Duplications
  ○ Translocations
  ○ Reciprocal translocations
  ○ Whole genome duplication
● Mouse versus human genome: at the sequence level they're very similar; they have the same types of genes and their genes have similar sequences
  ○ However, at the chromosome level, comparing the staining patterns of the chromosomes suggested no conservation
    ■ But each mouse chromosome could be pieced together from segments of different human chromosomes
● Syntenic segments share:
  ○ Identity of genes
  ○ Order of genes
  ○ Orientation of transcription → almost the same: the segments have the same genes, in the same order and with roughly the same direction of transcription, but they sit in different locations
● You can take genomes from different grasses, line up the chromosomes as circles and see how they align - there are very few gaps
  ○ You can co-align them, but rearrangements must also have occurred
  ○ Those occur over long stretches of time, and only the ones that are not lethal are represented, since only those rearrangements are sustainable
    ■ They have different time frames and different restrictions
● Collinearity
● Rearrangements occur all the time and they're not always deleterious - they actually lead to the generation of antibodies
● Many antibodies are generated
● These rearrangements occur on chromosome 14 and are involved in B cell development
● In those rearrangements, you bring segments together, but in doing so you lose the segments in the middle
● If there is a lack of diversity in the making of these antibodies, you can have a shift in the formation of T and B cell clones
  ○ Observed in lymphomas
  ○ You're not generating the variants you need - making a diverse array of molecules is critical for a person's health

Deletions:
● Loss of sequences
● Small deletions affect a single gene
● 
Large deletions lead to the loss of 10s or 100s of genes
● Can be caused by X-rays or other chromosome-damaging agents that break the DNA backbone (X-rays break both strands of DNA)
  ○ Need two breaks, and then you can lose the intervening material
● Most deletions are lethal but there are exceptions
● Wolf-Hirschhorn syndrome
  ○ Deletion from the short arm of chr4
● Cri du chat syndrome
  ○ Deletion from the short arm of chr5
● Humans cannot survive if more than 3% of their genome is deleted
● Most deletions are also dosage dependent - having more or less of a particular product matters
  ○ Some genes need to be present at the correct dose in order to have no associated phenotype
    ■ Eg. Drosophila: a deletion of the Notch gene leads to notches in the wing; even heterozygotes show a phenotypic effect on the wing pattern, but they can still fly
● No recombination can occur within a deletion loop since the genes in the loop cannot be separated → the genetic distance between loci on either side will be underestimated
  ○ If you're heterozygous for a deletion, the two homologues cannot align across it, so the undeleted region sticks out as a loop
  ○ So even if you try to use recombination to figure out the distances between the loci, no recombination event can occur inside the deletion loop
    ■ If you do mapping experiments, your estimates of the map distance will be wrong

Detection of Deletions through PCR:
● If you have sequence information, you can place PCR primers upstream and downstream of the deletion; you get a longer wild-type PCR product and a shorter deletion PCR product
  ○ By running the sample on a gel, you can look at the size differences
  ○ Can easily tell whether an individual is homozygous or heterozygous for the alleles

Detection of Deletions through Karyotyping:
● For large deletions - can use dyes to see which parts of the chromosomes are missing

Using Deletions to Map Mutations
● Drosophila
  ○ Polytene chromosomes in their salivary glands
  ○ Giant chromosomes
    ■ You make 
a huge array of chromosomes and the homologues stay together and they make giant chromosomes
  ○ Undergo 10 rounds of replication but no mitosis → each chromosome consists of 2^10 = 1024 double helices
  ○ So you're increasing the number of chromosome copies but they're not separated into different cells
  ○ You can easily stain and visualize the chromosomes
  ○ Polytene chromosomes are oversized chromosomes which have developed from standard chromosomes and are commonly found in the salivary glands of Drosophila melanogaster
  ○ Similar sequences lie side by side and take up the dye in a similar manner
  ○ You can see the deletion loops when you stain with these dyes
  ○ Some of these deletions are so big you can see which bands and which loci are affected
  ○ There are deletions that lead to changes in eye morphology
  ○ These loci were known to be relatively close together on the X chromosome, but it was not clear how far apart they are, so deletions were used to map them
  ○ Mutants and deletions are collected and can be identified very easily on polytene chromosomes → you can keep many lines that carry different deletions in that particular area of the chromosome, and build a huge database of which pieces (which band patterns) are missing
    ■ Then, if you have an idea of where the gene is, you can order different deletion strains and you'll know which band patterns are missing in each
    ■ The gene cannot lie within a deletion if that strain still shows the wild type phenotype; by comparing deletion mutants and their phenotypes you can decide which areas correlate with particular mutations

Seb's notes for this slide:
● All 3 loci are in close proximity to each other on the X chromosome in Drosophila
● In order to use deletion strains to perform mapping, first collect strains of each mutant
● There is a large database that characterizes different types of mutants, showing exactly what piece of the chromosome is missing in that particular deletion 
strain
● The deletion is indicated by the red box and you can see which banding patterns are missing in the mutant strains
● Since you have an idea of where the deletions are, you can order deletion strains that all have a deletion in the same region
● You then grow these strains, look at their morphological differences and search each strain for the mutation of interest
● Start with white: two strains are WT for the white mutation, so the gene is not found within their deletions
  ○ Next look at where the mutation is visible, which is the other three strains
  ○ The gene must be in the smallest portion that overlaps between those strains - this is the area highlighted in green

Duplications
● If they're in the same order, they're direct repeats
● If they're in reverse order, they're inverted repeats
● Tandem duplications: close together
● Nontandem (dispersed) duplications: far apart

Cause for Duplications
● X-rays break one chromosome in two places and a homologous chromosome in one place, so the excised piece can move into the other chromosome
● Duplication loops can be observed in polytene chromosomes, since the duplicated part cannot line up and so forms a loop
● Either one of the copies can form a loop, or they can both loop out in a double loop
● These loops can be detected by karyotyping
● There are often no obvious phenotypic consequences of duplication, but in humans, heterozygosity for duplications covering more than 5% of the haploid genome is most often lethal
  ○ Duplications increase the copy number of a particular gene and can alter expression by placing genes in a different chromosomal environment
  ○ Duplications are key in cases where genes are dosage dependent
    ■ Eg. 
Drosophila: three copies of the Notch gene can lead to aberrant wing veins (while one copy leads to notches in the wings) → Strangely enough, if you have a duplication on one chromosome but a deletion on the other, you still have 2 copies of the gene, so you have the wild type phenotype
* so for dosage dependent genes, only the copy number matters *

Duplications and Unequal Crossovers
● Recombination can lead to an increase of copy number on one chromosome but a decrease on the other
● Drosophila eye development - double Bar mutation → a recombination event in which the chromosomes are not lined up properly
  ○ Wild type has one copy on a chromosome
  ○ The Bar mutation is two copies
  ○ The double Bar mutation is three copies

Inversions
● 180° rotations of a sequence - can occur when you have repeat sequences
  ○ The repeat sequences align and then a recombination event occurs that leads to the inversion
● Pericentric inversions - include the centromere
● Paracentric inversions - do not include the centromere
* follow the green line to see how the recombination occurs *
● Most inversions do not result in an abnormal phenotype, but they do rearrange the order of the genes (not changing the sequence, just the orientation)
● Usually do not affect the function of the genes and they do not add or delete sequences
  ○ Unless the breakpoint is in the middle of a gene → disruption of a gene can lead to a gene knockout (null mutation)
● Inversions are reverse complements (you can't simply flip the sequence, since the 5' and 3' ends would then point the wrong way and it would not fit into the DNA)
● Inversion loops occur in individuals that are heterozygous for the inversion, when homologs align during meiosis
● The loops occur because it is the same sequence, just in a different orientation, so the chromosome has to twist in order to align
● Can be seen in polytene chromosomes

Inversions and Recombination
● Form an inversion loop when attempting to align
● You have crossovers and you are losing some of the 
gene information
● Outcome: normal chromosome and inverted chromosome
● The recombination event is that you switch the arrows between the chromosomes and you actually lose a section
● In a paracentric inversion, a crossover can give chromosomes that have more than one centromere and ones that don't have any at all (more than one leads to chromosomal breaks)
  ○ Leaves you with one wild type, one inversion and, depending on the breakage point, the extra centromeres can be lost

Translocations
● Things are moving from one chromosome to another
● May not have any phenotypic effects since you're not losing anything in the movement, but it depends on the break points - if the breakage is in the middle of a gene, it will cause a knockout or other phenotypic changes
  ○ Or the change of chromosomal environment can impact the regulation of gene expression
● You can design primers that recognize the genes at the break points: one primer binds the blue sequence and the other binds the red
  ○ You can only detect chromosomes carrying the break point, since you won't be able to amplify a PCR product with only one primer binding

Textbook notes on Figure 9.20: How a reciprocal translocation helps cause one kind of leukaemia. A reciprocal translocation between chromosomes 9 and 22 contributes to chronic myelogenous leukaemia. This rearrangement makes an abnormal hybrid gene composed of part of the c-abl gene and part of the bcr gene. The hybrid gene encodes an abnormal fused protein that disrupts controls on cell division. Black arrows indicate PCR primers that will generate a PCR product only from DNA containing the hybrid gene. 
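The primer logic in the figure notes can be sketched in a few lines of Python. This is a toy model only: the sequences below are made up for illustration and are not real bcr or c-abl sequence, and `pcr_product` is a hypothetical helper that just looks for exact primer sites on a single strand. It shows why a primer pair taken from opposite sides of the breakpoint gives a product only from the hybrid molecule.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))


def pcr_product(template: str, fwd_primer: str, rev_primer: str):
    """Return the amplified region of `template`, or None if a primer has no site.

    The forward primer matches the top strand directly; the reverse primer
    anneals where its reverse complement appears on the top strand,
    downstream of the forward primer.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Hypothetical toy sequences (assumptions for illustration, not real genes).
BCR_5P, BCR_3P = "ATGCCGTTAGCAAGGTC", "CTTAGGACCTGAATCGG"  # chromosome 22 side
ABL_5P, ABL_3P = "GGATTCACGTTGCAACT", "TCCGATAGGCTTACGAA"  # chromosome 9 side

NORMAL_CHR22 = BCR_5P + BCR_3P
NORMAL_CHR9 = ABL_5P + ABL_3P
HYBRID = BCR_5P + ABL_3P  # translocation joins the bcr start to the c-abl end

FWD_PRIMER = BCR_5P[:8]                       # derived from the chromosome 22 side
REV_PRIMER = reverse_complement(ABL_3P[-8:])  # derived from the chromosome 9 side

for name, dna in [("normal chr22", NORMAL_CHR22),
                  ("normal chr9", NORMAL_CHR9),
                  ("hybrid", HYBRID)]:
    result = pcr_product(dna, FWD_PRIMER, REV_PRIMER)
    print(name, "->", "no product" if result is None else f"{len(result)} bp product")
```

On either normal chromosome only one of the two primers finds a site, so nothing is amplified; only the hybrid carries both sites on the same molecule.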
To confirm a diagnosis of myelogenous leukaemia, for example, clinicians first obtain a blood sample from the patient and then use a pair of PCR primers derived from opposite sides of the breakpoint (one synthesized from the appropriate part of chromosome 22, the other from chromosome 9) to carry out a PCR on DNA from the blood cells. The PCR will amplify the region between the primers only if the DNA sample contains the translocation.

Lecture 14
● What is a gene?
  ○ The basic unit of biological information
  ○ A segment of DNA encoding a protein
  ○ A discrete region of a chromosome encoding a protein or RNA
● Gloeomargarita - a newly discovered species of cyanobacteria
  ○ Closest known relative of the chloroplast
  ○ Found in fresh water
● Suggests that eukaryotic photosynthesis evolved on land, in fresh water (a pond)

Humans
● The human nuclear genome is 3 billion bp long and the mitochondrial genome is about 16 kb
● Cox1 - 1542 base pairs of DNA, 1542 nucleotides of ss-mRNA
● It is a gene on the mitochondrial chromosome, so the gene is made up of double stranded DNA
● You transcribe that gene and get a single stranded piece of RNA that is the same length as the gene
● You then translate that using a ribosome and you get a protein that is 513 amino acids long (1542/3 = 514 codons, then subtract one for the stop codon)
● That protein is called cytochrome c oxidase 1
● How many amino acids are found in a piece of mRNA? 
  ○ You have to account for the stop codon
● The gene gets you a protein, and that protein is found in the cytochrome c oxidase complex (ETC)
● Central dogma of genetics: DNA → RNA → protein
● The mRNA sequence is identical to the coding strand, except that the Ts are replaced by Us
● That mRNA has the start codon AUG
● The stop codon could be UAA or any of the other ones
● The actual coding sequence is broken down into triplets
● When you think of codons that code for the amino acids you think of mRNA
● mRNA is usually thought of in terms of U and not T
  ○ But there is a difference between T and U
● Part of a polycistronic transcript (describing a type of messenger RNA that can encode more than one polypeptide separately within the same RNA molecule; bacterial messenger RNA is generally polycistronic)
● When the gene is expressed in your mitochondria, it is part of the polycistronic transcript, and that is processed into smaller mRNAs
● They're not all protein coding transcripts
  ○ tRNA, rRNA, etc
● Overall, COX1 is intact and continuous, part of a polycistronic transcript, and encodes a protein
● The coding sequence is not broken up in any way
● You stick with the same gene, COX1, and you look at it in different organisms
● The point is to show how different genes can be, even when you are dealing with the same gene

Mushrooms
● The COX1 gene in mushrooms is longer than ours
● So why is that gene so much bigger than our copy? 
● In mushrooms, that gene is full of introns - 19 introns, 20 exons
● In the nucleus, you have introns that have to get spliced out by a complicated machinery called the spliceosome
● In these organelle systems, the introns are a bit different: when you get your mRNA it has all these introns, and each one folds into a complicated secondary structure
● That secondary structure is what allows the intron to self-splice

Diplonema
  ○ Most abundant predator on earth
  ○ Its mitochondrial genome is broken up into circles
  ○ And the COX1 gene is found on 9 different chromosomes, so it is fragmented
  ○ All these chromosomes contain 1 exon each
  ○ When you want to express COX1, you need to transcribe each one of these exons
  ○ And then you have to splice them together in order and make an intact transcript
    ■ Mature transcript
  ○ And then you can make your protein
● So ultimately you have these distinct exons that have to be joined together - trans-splicing
  ○ Trans-splicing: exons located in distant regions, or even on different strands or chromosomes, are transcribed separately and then joined together
● Each one of those exons has a fragment of an intron
● When you have two exons floating around, each with a little piece of an intron, the two intron pieces can meet and fold into a secondary structure, and as the intron gets spliced out, it links the exons together
● What is mediating the trans-splicing would be the introns
● Same gene, different organism

Perkinsus
● Infects oysters
● In a normal gene, you have a start codon, a stop codon, and everything in between should be in frame
● There shouldn't be any mutations that shift the frame without it being potentially lethal
● As they moved along this organism's sequence, they found a lot of frameshift mutations
● How can this work? 
You should not get a functional protein if your coding sequence is not in frame
● The frameshifts were all happening at certain motifs
● Either they were happening at AGGY (Y is the symbol for T or C) or at CCCCU, and it turns out that the ribosome cruises along the mRNA reading the triplets and, when it hits the frameshift, it knows to jump either 1 or 2 nucleotides, restoring the frame as the ribosome reads it (the sequence itself is not changed)
● At the mRNA level, it looks like in-frame coding sequence, then what looks to be noncoding sequence, and then more coding sequence
● If you saw it, you would think they're introns - pieces of noncoding DNA interrupting the coding sequence - except these sequences didn't look like introns and they never got spliced out of the mRNA --> they were always there
● As the ribosome moves along the coding sequence and approaches the noncoding stretch, the noncoding stretch forms a stem loop, and the ribosome flies off and then comes back on at the next coding section

Magnusiomyces capitatus
● COX1 in these amoebae is broken into two pieces; both of them give you an mRNA, each one of those gives you a polypeptide, and the two polypeptides come together to form a functional protein - essentially a fragmented protein, made up of two distinct amino acid sequences

Dictyostelium discoideum
● COX1a is encoded in the mitochondrial genome but COX1b is encoded in the nuclear genome and its product is post-translationally targeted back into the mitochondria
● So a chunk of it got moved through endosymbiotic gene transfer

David's definition of a gene: a continuous, discontinuous or fragmented nucleotide sequence encoding a biologically functional molecule, such as a protein, tRNA, rRNA, etc.
● What is a functional piece of DNA? 
Who knows
● Sometimes genes are so tightly packed that they overlap one another: where one gene ends with a stop codon, the next gene's start codon overlaps that stop codon
● Sometimes genes even overlap by dozens of nucleotides
● Genes can be packed together tightly but they can also be all by themselves
● Gene deserts - empty stretches of chromosome (eg. like in cucumbers)
** he said look at heterochromatin in textbook **

Heterochromatin:
● Google definition: chromosome material of different density from normal (usually greater), in which the activity of the genes is modified or suppressed
● Most of the heterochromatin (dark staining) in highly condensed chromosomes is found in regions flanking the centromere, but it can form in other regions
  ○ In Drosophila, the entire Y chromosome, and in humans, most of the Y chromosome, is heterochromatic
● Constitutive heterochromatin: chromosomal regions that remain condensed
  ○ This is in contrast to facultative heterochromatin: regions of chromosomes (or even whole chromosomes) that are heterochromatic in some cells and euchromatic in other cells of the same organism
● Active genes (i.e., genes producing RNA copies that will eventually be used to produce a specific protein) are present almost exclusively in regions of euchromatin
  ○ Heterochromatin appears to be inactive for the most part, probably because it is so tightly packaged that the enzymes required for the production of RNA cannot access the correct DNA sequences
  ○ The formation of Barr bodies in mammalian females illustrates the correlation between heterochromatin formation and a loss of gene activity
● Also, the genes aren't always on the same strand
● They can get transcribed from different strands
● The actual mRNA is getting copied from different strands of DNA
● Selaginella has Gene A, and in Gene A there is an intron, and in that intron is Gene B
● Euglena has twintrons in its chloroplast genome
● So it has introns within introns
● The intron that is on 
the inside can fold into a secondary structure and get spliced out
● That allows the outer intron to come together, fold and then splice out
● Two step reaction
● This can also happen with three introns - tritrons
● All of the genes we have talked about so far are protein coding

E. coli - SSU rRNA
● rRNA is one piece, single stranded, with a 5' and a 3' end
● It is intact, as it usually should be
● In a few branches in the tree of life, not only has the gene become fragmented, but it has become scrambled
● The pieces are not in the proper order anymore

Plasmodium
● In the malaria parasite Plasmodium, the mtDNA genes for the small and large subunit have been broken and scrambled into 27 pieces (LSU and SSU rRNAs are 27 pieces)
● You transcribe each piece and you can rebuild your rRNA from those little pieces
● The pieces are not spliced together to get an intact piece
● The only thing holding those pieces together is the secondary structure - the base pairing

Lecture 15
● Two woolly mammoth genomes were compared
● One came from the mainland of Siberia while the more recent one came from Wrangel Island
● The mainland one is from a larger population, while the one on the island is from a smaller population due to the geographical constraints
● Most of the bad mutations get purged out in a large population since there are many predators
  ○ The cost of the mutation is amplified, since there are all these individuals that are better off than you
● In a smaller population, as long as these mutations don't kill you, it is not that bad
● When they sequenced the two genomes, the one from the larger, mainland population had normal looking genes
● The other one had a bunch of mutations, so the small population size actually led to all these deleterious mutations
● This is potentially what caused the extinction of the woolly mammoth (a species that has gone extinct due to a genomic breakdown)
  ○ They had silky hair and weren't having sex, so it led to a genomic breakdown
● Gene expression can be very diverse and 
unconventional
  ○ Ribosomal slippage and translational bypassing
  ○ In the malaria parasite, the only thing holding the rRNA together is the pairing, and if you denatured it, you would get all these separate rRNA pieces with their own 5' and 3' ends
● On a human nuclear chromosome, you can zoom into a region that encodes the large and small subunit rRNA genes
● Genes are repeated tandemly
● Looked at how these genes can be very different
● Intact and tandemly repeated (head to tail concatemer)
● This occurs on 5 different chromosomes
● Gives us thousands of copies of these genes, and this is because rRNAs are very crucial
● A really efficient way of making those genes

Euglena:
● Long, linear chromosomes with telomeres, etc.
● We don't find any rRNA genes on these chromosomes though
● The rRNA genes are stored on plasmid-like chromosomes - small, circular and only containing the rRNA genes
● The small subunit is intact but the large subunit has been fragmented into 14 pieces
● So Euglena just makes around 600 copies of the plasmid
● The nuclear genome is haploid but the plasmid is hugely polyploid
● By doing this, you can make lots of rRNA

Boring Expression
● We always seem to be saying that things are monocistronic, but that is not the case
● There are lots of genes in nuclear genomes, and many of them are individual units that are expressed in the gene --> transcript --> protein way, but this is not the only way

Trypanosomes (sleeping sickness, Chagas, leishmaniasis)
● Cause diseases, and have chainmail mitochondrial genomes
● You have a normal nuclear genome architecture - large, linear chromosomes, haploid
● If you zoom into the genes, they're arranged in really long polycistronic units (dozens to hundreds of genes)
● If you were to go along the chromosome, you would eventually hit another type of gene (the yellow ones), and these genes are all the same - 5' caps
  ○ Monocistronic
● The 5' cap genes make these little individual transcripts
● The yellow bits invade the polycistronic transcript - they come in and there's a little 
reaction that takes place at the RNA level, and you end up getting your individual genes off of this transcript due to the invasion
● Every mature transcript in that system has a 5' cap and then other levels of processing
● So the monocistrons invade and then give you the final gene
  ○ Spliced-leader trans-splicing
● The mature RNA has two very different exons that came from different places and were trans-spliced together
● They call it the leader due to all the identical bits
● Normally you can regulate genes independently, but in these polycistrons you have to make everything at once, so the regulation is happening at the translational level rather than the transcriptional level

Oxytricha
● Has two different nuclei and two different nuclear genomes
● Very different from chlorarachniophytes (the nucleomorph example)
  ○ Not two nuclei due to endosymbiosis
● Has a macronucleus and a micronucleus
  ○ Micro is the smaller one
● The macronucleus (MAC) is the one that is active; it is the nucleus where the information is expressed
● The micronucleus (MIC) is the silent one, and the information in there generally doesn't get expressed
● The active nucleus has thousands of gene-sized linear chromosomes, and this is strange since chromosomes are normally not gene sized, they're big
● Each chromosome is present in hundreds of copies, so they're hugely polyploid, which isn't typical for a nucleus, and there is hardly any noncoding DNA
● The MIC looks totally different at the genome architectural level - haploid, normal, long chromosomes with lots of noncoding DNA
● The exons are separated by noncoding DNA but they are also not in order
● What actually happens is that all of the noncoding DNA is excised, every one of those exons has to be put together with its mate in the right order, and then everything is put into the gene-sized units and hundreds of copies are made

How many genes does an organism need? 
● About 19,000 in humans (nuclear and mitochondrial)
● Paramecium has 40,000 genes
  ○ These numbers refer to unique genes
● Mycoplasma have the lowest
  ○ They knocked out genes to see what it would be able to survive with
  ○ Came down to around 400 genes that it really needs
● Complex organisms need more genes, and just because something looks simple, it doesn't mean that it is
● A eukaryote needs more machinery to make it work than a bacterium, and a bacterium needs more machinery to make it work than a virus
● Viruses < Bacteria < Eukaryotes
● Does gene number scale with genome size?
  ○ Do the biggest genomes have the most genes? Not true for the most part
● Most of the variation in genome size is due to noncoding DNA
  ○ Most of the genome is non-genic

Defining Function
● What does it mean for DNA to be functional?
● If it is functional, it must be crucial (essential) to the survival of the organism at some level
  ○ If you remove it you would either hurt or kill the organism
● Genes are functional for the most part
● If you remove them, you kill the organism
● Some noncoding DNA is also essential
  ○ Eg. regulatory genome elements, origins of replication, promoters, etc.
  ○ Intergenic DNA: DNA between genes
  ○ Intronic DNA: DNA between exons
● Sometimes one regulatory element may influence a single gene, or it may affect multiple genes
● Regulatory elements are generally really short: roughly 10-100 bp long
  ○ Regulatory elements: noncoding DNA sequences that play an essential role in genome replication and/or gene expression
● You may have hundreds or even thousands of regulatory elements, but if you add them all up, you still don't get that much DNA
● Knowing that most of our genome is noncoding and that most of that noncoding DNA does not appear to have a function, it leads to the question: are there regulatory elements in this noncoding DNA that we don't know about?
● Or is it just junk? 
By junk we mean that if you take the noncoding DNA, zoom in and remove it, you wouldn't hurt the organism
● What is the hidden function of noncoding DNA?
● ENCODE - the Encyclopedia of DNA Elements (the goal of ENCODE is to uncover all functional elements in the human genome, particularly those outside of genes)
● Making a note of every functional bit of the noncoding DNA in our genome
  ○ The human nuclear genome has around 3 billion bp and 19,000 genes, and about 98% of it is noncoding → but how much of that 98% is functional?
● It is building on top of a huge milestone, which was the human genome sequence
● But really, it was just hype for noncoding DNA - where are the functional bits?
● ENCODE - trying to understand the human genome at the noncoding level
  ○ Looking outside of the genes

Human Genome Project
● The DNA sequence of all chromosomes
● Annotation of the genes on those chromosomes

ENCODE Project
● Discover which regions are transcribed
● Locate and characterize all regulatory elements
● Chromatin modifications
● Uncover DNA methylation patterns
● Locate RNA editing sites

Lecture 16
● Human Genome Project
● Sequenced and annotated all the genes in the human genome
● ENCODE's goal was to uncover all the functional elements in the human genome, particularly those in noncoding parts
● They found thousands of new regulatory sites, around 400,000 transcriptional enhancers and 70,000 promoters
● 8.5% of the genome is involved in transcription factor binding
● 75% of the human genome is transcribed into RNA (transcriptionally active)
● 2% of the DNA is coding
● 80% of the human genome is associated with at least one biochemical function
  ○ So about 20% has no observable function
● What do you define as function?
  ○ Does biochemical function mean the same thing as being functional?
  ○ Their definition of functional is imprecise
    ■ Anything that is transcribed is functional - since transcription is a biochemical function
● Things are getting transcribed all the time and it 
doesn't appear to do anything
● Most of that 75% is transcribed, noncoding RNA
● Do the regions that are transcribed but noncoding have a function?
  ○ Probably noise, just noise
    ■ Genomes often have a lot of transcriptional noise - regions that don't encode proteins, rRNAs, or tRNAs, and have no regulatory purpose, are still transcribed
● You can say that 100% of your genome is replicated, and replication is a biochemical function, so you can technically say 100% of the DNA is functional
● Genlisea margaretae - 60 million bp → 75% → 45 million bp are active
● Paris japonica - 150 billion bp → 75% → 112.5 billion bp are active
  ○ The same analysis was done with these plants, and in both of them we again get 75% of the genome being active
  ○ But does Paris japonica really need something like 112 billion more active bp?
● So how much of the genome is functional?
  ○ Probably around 5%
  ○ Some people would say around 25%
● 5% of the DNA is methylated → one of the results of ENCODE
● The most common type is cytosine methylation
● It happens at CpG sites - a C followed by a G
  ○ In the 5' to 3' direction
  ○ If you have a CG on one strand you're going to have it on the other strand as well
  ○ This tends to be the most common type of methylation
● Methylation does not change the primary sequence; all you're doing is adding a methyl group (a covalent addition of a methyl group to cytosine bases in DNA)
● Methylation is very common: 80% of CpG sites are methylated
● ENCODE says 96% of the CpG sites are methylated, which amounts to 5% of the whole genome
● Every CpG site in your genome can be methylated, so about 5% of your genome
● Methylation impacts gene regulation and expression
● Genomic methylation patterns can be heritable
● The exploding field of epigenetics is showing that when your methylation pattern goes rogue, it can lead to disease and aging
  ○ Epigenetics - methylation impacts gene regulation and expression
  ○ Good: development and genetic imprinting; silencing transposable elements and chromatin structure
  ○ Bad: 
disease (cancer); ageing; mental health
  ○ Genomic methylation patterns can be heritable (passed down generations)
● Tends to be a silencing mechanism - an off switch for expression
● The target doesn't have to be a gene; it could be a transposable element
● Methylation also impacts DNA packaging (methylation is not a mutation, but an alteration)
● Changes in the methylation pattern that deviate from what should be there can have major impacts
● Many diseases can be caused by a lack of methylation or by methylation in the wrong places
● Methylation patterns can be passed down from generation to generation, but it is not clear whether this continues indefinitely - it isn't fixed, so a pattern could be heritable for a few generations, or many, or none
● Identical twins have different methylation patterns
  ○ Twins have identical chromosomes and identical sequences, but they can vary in phenotype due to their different methylation patterns
● Environment can influence methylation patterns (what we do in our daily lives impacts our methylation patterns) → stress, eating habits, exercise, etc. 
● Before it was just mutations, but now we realize that alterations can be influential too Bacterial DNA Methylation ● Some bacteria methylate the sites that restriction enzymes (REs) cut, which prevents their own DNA from being cut while foreign DNA is chopped up ○ Some REs cut at specific sequence motifs on foreign DNA, but on the bacterium's own DNA those sequences are methylated, so they won’t be cut ○ So foreign viral DNA is cut while domestic DNA is protected ● Given the DNA sequence alone, you may not be able to tell what is actually going on at the cellular level ● Genetic Modifications: significant genetic alterations that are often not apparent or obvious given the primary DNA sequence alone ● Universal genetic code - everyone uses the same code ○ Must have evolved very early on and given rise to all of this diversity ○ Wouldn’t detect methylation patterns using this genetic code though ● But the code is actually not universal, and in some systems the standard key won't work ● The product of the codons using the standard table doesn't make sense anymore - can't use the same table ● Nonstandard codes usually pop up in organelle systems - not specific to them but mainly there Human ● The nucleus uses the standard code but our mitochondria do not ● A case where you have two different codes in two different compartments ● If the codes are changing, what happens when you move genes from one compartment to the other? ○ The changes to the genetic code evolved relatively recently compared with endosymbiotic gene transfer ○ Most of that gene transfer happened before the codes changed ● In each case, the nonstandard code can be different ● There are 25 different codes that have evolved ● The cox1 gene - humans, mushrooms, and Diplonema each have their own unique nonstandard code, different from the others ● These nonstandard codes have evolved dozens of times independently throughout the tree of life but are found more rarely in prokaryotes Non Standard Codes: 1. Mitochondrial genome of vertebrates (includes humans) 2. 
Mitochondrial genome of invertebrates (C. elegans) 3. Mitochondrial genome of invertebrates (starfish) 4. Chloroplast genome of dinoflagellate (Lepidodinium) 5. Mitochondrial genome of yeasts (the missing codons cannot be read) 6. Nuclear and Mitochondrial genome of Diplonema (doesn’t recognize regular stop codons) Lecture 17 ● Differences in genome architecture and the forces responsible for those differences when we talk about genome evolution ○ Size ○ Structure ■ The structure influences the ploidy ■ Differences in structure like circular, linear, fragmented ○ Content ■ Noncoding DNA ■ Repeat sequences ■ Regulatory elements ■ Mobile elements ■ Pseudogenes ■ Foreign DNA ○ Modification ■ Methylation ■ Splicing ■ Nonstandard codes ● Genome architecture encompasses the content ○ Stretches of AT rich or GC rich DNA ■ What do these nucleotides actually encode? Are they genes? Are there many genes that are tightly packed or are there long stretches of noncoding DNA? ● What is in that noncoding DNA? Is it made up of regulatory elements? Etc. ● What is in the gene? Does it have a lot of introns? ● Are the introns scrambled? ● Is that region of DNA methylated and what parts of it are transcribed? ○ What is happening to the RNA afterwards? Is it being processed? ○ Maybe it is being translated in the nonstandard code ● All of these things are a part of genome architecture ○ Huge diversity in genome architecture, within cells, within lineages and between different lineages ● There are a lot of differences across the tree of life, and you even find diversity within a single cell if you compare the organelle and nuclear genomes ● How do we explain this diversity? How did it arise? 
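As a small illustration of why the nonstandard codes listed above matter, the sketch below translates the same made-up DNA sequence with the standard table and with the vertebrate mitochondrial table (item 1 in the list). The four reassigned codons used here (TGA, ATA, AGA, AGG) are the commonly cited differences for that code; the sequence and function names are hypothetical and not from the lecture.

```python
# Sketch: the same DNA gives different proteins under different genetic codes.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code: start from the standard table, change 4 codons.
VERT_MITO = dict(STANDARD)
VERT_MITO.update({"TGA": "W",   # stop -> tryptophan
                  "ATA": "M",   # isoleucine -> methionine
                  "AGA": "*",   # arginine -> stop
                  "AGG": "*"})  # arginine -> stop

def translate(dna, table):
    """Translate frame 1 of `dna`, writing '*' for stop codons."""
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

dna = "ATGATATGACCCAGA"  # made-up sequence containing reassigned codons
print(translate(dna, STANDARD))   # reads TGA as a premature stop
print(translate(dna, VERT_MITO))  # reads TGA as tryptophan, AGA as stop
```

Translating a mitochondrial gene with the wrong table scatters apparent stop codons through the protein, which is why the choice of table matters.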
● You need to think about it on a molecular level and also the population level ● Molecular Level ○ The only way to get different genome architecture (diversity) is through mutations ■ Fragmentations, rearrangements ○ Gene conversion is when one sequence copies itself onto another - powerful evolutionary force ○ Depending on the type of mutation, you can get different architectural changes ■ Deletions (small or large) ■ Insertions (small or large) ○ There are duplication events where a gene or two gets duplicated ■ Or you can get whole genome duplication ○ Can have translocations ○ Or you can have a fragmented genome ■ And the genome fragments can come together in a new way - fusion ● There is a huge diversity of mutations that can impact the architecture ● Sure, the type matters but it also matters as to where these mutations occurred ○ Did they occur in a gene? Intron? Exon? Regulatory DNA? ■ Maybe it is affecting your whole genome! ● Is this happening to the whole genome? Or is it affecting a whole cell? 
● Some mutations occur more often than others ● One type of point mutation can be more frequent than another type ● If there is a bias towards mutations to A or T, your genome shifts to a more AT-rich state over time ● If you get more insertions than deletions, you may end up with a more bloated genome ● Environment and cellular machinery impact it a lot ● These mutations are occurring but they're a reflection of the environment ● Some species have great maintenance machinery, which means you hardly get any mutations ● Other species have poor machinery and, for whatever reason, they're not good at repair ● Evolution is a population level process - changes within a population over time ● First you get the mutations (which could be beneficial, deleterious or neutral) and then, depending on the size of the population (effectively large or small), they can be fixed or lost ● Say there is a population of algae and we add a mutation into the population that changes the genomic architecture of one of these algae ● Fixed - stays in the population ● Lost - as if it vanishes and never occurred ● What determines if that mutation gets fixed or lost? 
○ Beneficial/deleterious/neutral ■ If it is beneficial then it probably has a really high chance at being fixed ■ If it is deleterious, or if it kills it, then it is lost immediately ■ If it doesn’t change anything, then that is a little more difficult to figure out if it is going to get lost or not - random ○ Effectively large or small ■ Can have really big populations that behave like smaller ones due to various behavioral things like maybe only one guy can mate and all the other males are useless ● So you alter the genome and then it gets fixed or lost based on the power of genetic drift vs natural selection ● If the population is effectively large, natural selection is more efficient ● If you have many predators, small advantages can be very important since it makes you stand out from your competitors ○ In large populations, beneficial mutations have a large chance of being fixed while deleterious ones have a larger chance at being lost ● What determines if that mutation gets fixed or lost is the size of the population ● Adaptive hypothesis: most of the mutations that we are observing are fixed through natural selection ● Non-Adaptive hypothesis: caused by mutations that are fixed through genetic drift ● Is that mutation fixed because of natural selection or genetic drift ? 
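The drift versus selection question can be explored with a toy simulation. The sketch below uses a simple haploid Wright-Fisher model, which is an assumption chosen for illustration and not something the lecture specifies; the function name, population size, and selection coefficient are all hypothetical.

```python
# Toy Wright-Fisher model: how often does a single new mutation get fixed?
import random

def fixation_probability(pop_size, s, trials=1000, seed=0):
    """Estimate the chance that one new mutant copy takes over the population.

    pop_size: number of (haploid) individuals each generation
    s: selection coefficient (0 = neutral, > 0 beneficial, < 0 deleterious)
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # the mutation starts as a single copy
        while 0 < count < pop_size:
            # Selection: mutants are weighted by (1 + s) when parents are chosen.
            weight = count * (1 + s)
            p = weight / (weight + (pop_size - count))
            # Drift: the next generation is a random sample of pop_size offspring.
            count = sum(rng.random() < p for _ in range(pop_size))
        fixed += count == pop_size
    return fixed / trials

neutral = fixation_probability(50, 0.0)     # fate decided by drift alone
beneficial = fixation_probability(50, 0.2)  # selection helps fixation
print(neutral, beneficial)
```

In this toy model a neutral mutation is fixed only occasionally, purely by chance, while a beneficial one is fixed far more often; rerunning with larger `pop_size` shows selection mattering more relative to drift.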
● The jellyfish example ○ There was a mutation in the past that caused the jellyfish DNA to break open and from then on, all of those species had linear chromosomes ○ Was it that mutation that caused this? ○ Was this fixed through natural selection or was it just a really small population so it got fixed through genetic drift? ● The only way to get more DNA is through massive insertion mutations ● Maybe there was a bit of both drift and natural selection ● Adaptive hypothesis → skeletal DNA hypothesis ○ Mutations that are causing the increase or decrease in genome size are being fixed through natural selection ■ Natural selection is saying make my genome bigger so I can have a bigger cell, or it is saying make it smaller so I can have a smaller cell and that would make me better at being a parasite ● Non adaptive hypothesis → selfish DNA hypothesis ○ Genomes can get bigger because the mobile elements can jump around and insert themselves ○ The mutation is the TE and it is not providing a benefit, so it is a non-adaptive process ○ Getting fixed through genetic drift ■ But maybe the mobile element is actually providing a benefit by making a bigger cell - so these things are not always mutually exclusive ● Mutational hazard hypothesis ○ The more possessions you accumulate, the more harm you get from losing them (Polychaos dubium - 600 billion bp nuclear genome) ○ The more of these mutations you accumulate, the greater the chance of bad things happening when you lose them ○ So maybe you get the intron but then something happens that prevents it from splicing and you die ○ The more stuff you accumulate, the greater the target it is for mutation ● The higher the mutation rate, the greater the burden of carrying all of this excess baggage ○ High liability ● The lower the mutation rate, the more you can keep this excess baggage ● Liability of carrying excess DNA is linked to mutation rate ● The higher the population size, the more efficient natural selection is 
and the better they are at purging the DNA ○ Since the burden of carrying the excess DNA is high ○ If your population size is low and there is drift and the burden of mutation is low, there is a lot of accumulation SUMMARY: ● Large population: natural selection efficient → beneficial fixed, deleterious lost ● Small population: random genetic drift → neutral, slightly deleterious fixed ● Adaptive hypothesis: natural selection ● Nonadaptive hypothesis: genetic drift Lecture 18 ● Genetic Modification: significant genetic alterations that are often not apparent or obvious given the primary DNA sequence alone ○ There are things that have a huge impact on the organism but you would never know just by looking at the DNA sequence ○ If you take the gene and translate it using the nonstandard vertebrate mitochondrial code, you get the full amino acid sequence (shown on slide) ○ But if you didn't know about nonstandard codes and you translated the gene, you would get a different amino acid sequence with all kinds of stop codons inserted throughout ○ These genetic modifications are critical: if you don't know about them, you will be pushed down a totally wrong road when you try to interpret the sequence ● RNA Editing ○ A type of genetic modification ○ A modification that changes the underlying sequence information ○ Usually, the mRNA should reflect perfectly the DNA sequence from which it was derived ○ The only difference is the fact that Ts have changed to Us ○ Usually, this holds true, but it is not always true Selaginella ● Had a gene within an intron of a gene ● Has weird editing going on ● In its mitochondrial genome, there is a gene with its corresponding transcript ● The DNA sequence of that gene has a corresponding mRNA for that coding sequence ● Everything looks right and the only difference is the fact that the Ts have become Us ● After the transcript is generated, a lot of the Cs are turned post-transcriptionally into Us ● Post-transcriptional editing makes it harder to 
predict the amino acids ● The Cox1 gene - on the slide, he highlighted every single one of the Cs that get turned into Us ○ The ones in red are post-transcriptionally edited to U ● Over 200 Cs are changed in that single gene ● So in the mitochondrial genome of the Selaginella plant, the RNA doesn't reflect the DNA to a large degree ● They have very different sequences ● For every site that is edited, there are proteins that bind to that RNA; usually there are a few proteins, and this complex is called the editosome (you don’t have to memorize the name) ● Just know that there is a complex whose sole purpose is to edit that site to U ● Every single one of these Cs that is edited to U needs a separate protein complex ● The complexes look really similar, but each site needs its own one that is encoded and edits the site ● So for every site that is edited you have a unique protein complex ● You want to take that gene and turn it into cytochrome c oxidase; in normal systems it is an easy journey, but in this organism, making that one protein requires hundreds of other proteins that bind to the RNA and fix it ● This kind of RNA editing is found in the chloroplast and mitochondrial genomes of almost all land plants → substitutional RNA editing Trypanosomes ● Have weird chainmail DNA; mini circles and maxi circles ○ Maxi circles have the genes and the mini circles serve to edit the genes ○ Now, we have Us being inserted into or deleted from the transcript ■ Hundreds of Us can be inserted and deleted from a single gene ○ So it is similar to the last case, but different ○ The editing comes along after transcription and it starts throwing in Us and deleting them, and this can happen a lot ■ So maybe in the last example, you would be able to figure out that the gene is COX1 - the editing didn’t completely dissolve the information at the DNA level ■ In this example though, there are so many Us being inserted and deleted, you would never know what you were looking at ○ So this complicated 
editing is all done through transcripts of the mini circles ○ You get the standard mRNA that you expect from the maxi circles, and then the mini circles have little RNAs that guide the editing ○ The mini circle RNAs are complementary to the maxi circle transcripts; they bind to them and guide the protein complex that comes along, telling the complex where to insert and delete the Us Diplonema ● Has 9 different pieces ● In its large ribosomal subunit RNA, there are two chromosomes and each one of those gives you a transcript for the front end and the back end of that ribosomal RNA ● Those transcripts are given a PolyA and a PolyU ● They are trans-spliced together into a single RNA and that is what gives you your functional rRNA ● In other examples where there were fragmented rRNAs, the pieces were never spliced together, so when the rRNA was folded, it was held together just by the pairing interactions of all the pieces ○ If you denatured it you would get lots of different pieces ● In this case, the rRNA is a single piece that is folded and the only difference is that 25 Us were shoved into those pieces and now the RNA has this loop ● When you think about genome evolution, you need to think about how mutations come in and alter the architecture, and then you have to ask yourself ○ Was it beneficial and did natural selection push in that direction, or ○ Was it random genetic drift that ratcheted it in there ● If you have the protein machinery to change the RNA, then maybe you can have mutations at the DNA level and then they're corrected at the RNA level ○ Mutational buffering ● Alternatively, it doesn't make sense that you need all these proteins to make a single protein - seems wasteful and inefficient, so maybe drift fixed it ● If you wanted to move that gene to the nucleus where the editing isn't the same, or if you did lateral gene transfer from Selaginella to an alga that didn't have the editing, it could be a barrier to those types of movements ● What would 
happen if at the DNA level the C was turned into a U? Reading: ● Kinetoplastids are flagellated protozoans - unicellular eukaryotes ○ Trypanosoma → Chagas disease, sleeping sickness, etc. ● They all share a unique mitochondrial DNA structure → kinetoplast DNA (kDNA) ○ Giant network of interlocked rings ● Kinetoplast: a mass of mitochondrial DNA lying close to the nucleus in some flagellate protozoa ○ The kinetoplast is a self-replicating organelle, its division precedes that of the nucleus, and it is also related to specific chemical reactions of mitochondria ○ When condensed, the kDNA has a disc-shaped planar structure (organized by proteins) ● To study the kDNA structure, scientists used topoisomerases (enzymes that make specific cuts in the DNA) to decatenate the rings ● This treatment, along with photography in platinum and palladium, finally made it possible to visualize both the minicircles and the maxicircles ● The kDNA network contains approximately 5,000 minicircles of 2.5 kb each and about 25 maxicircles of 37 kb each ○ Sequencing revealed that the minicircles are heterogeneous in sequence ● During the trypanosome life cycle, the position of the kinetoplast changes relative to other cell organelles, but it always remains close to the basal body → linkage between them? 
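The minicircle and maxicircle counts from the reading give a rough feel for the total size of the kDNA network. A quick back-of-the-envelope calculation in Python, using only the approximate figures quoted above (the variable names are just for illustration):

```python
# Rough kDNA network size from the approximate counts in the reading.
minicircles, minicircle_kb = 5000, 2.5   # ~5,000 minicircles of 2.5 kb each
maxicircles, maxicircle_kb = 25, 37      # ~25 maxicircles of 37 kb each

mini_total_kb = minicircles * minicircle_kb   # DNA held in minicircles
maxi_total_kb = maxicircles * maxicircle_kb   # DNA held in maxicircles
total_kb = mini_total_kb + maxi_total_kb

print(f"minicircles: {mini_total_kb:.0f} kb, maxicircles: {maxi_total_kb:.0f} kb")
print(f"total: {total_kb:.0f} kb, minicircle share: {mini_total_kb / total_kb:.0%}")
```

On these numbers, the great majority of the network's DNA sits in minicircles.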
● Maxicircles encode typical mitochondrial proteins ● As they began to study the transcription of the kDNA genes, investigators found to their dismay that many of the protein-coding genes appeared to be nonfunctional ○ The CoxII gene contained a frameshift (a mutation caused by deletion of a number of nucleotides) within the coding region ○ Since this gene is highly conserved in different trypanosome species, they assumed that it must be functional despite the mutation, and they decided to analyze the sequences of the coxII mRNA transcripts ○ The mRNA transcripts differed from the corresponding gene ○ Four uridine residues were inserted at neighboring sites, which corrected the frameshift in the gene ○ These four extra nucleotides were not encoded in the genomic DNA sequence ● RNA editing describes the process by which uridine residues are inserted into or deleted from the mitochondrial RNA ● Guide RNAs (a.k.a. gRNAs) are the RNAs that guide the insertion or deletion of uridine residues in mitochondrial mRNAs in kinetoplastid protists in a process known as RNA editing ● The gRNA molecules are mainly encoded by minicircles, although some gRNAs can also be encoded by maxicircles ______________________________________________________________________________ ○ Bioinformatics ■ Using computational tools to study genetics ○ Green Alga ■ Grow it and isolate its DNA ○ GenBank - can get the data from there and use it to get the chloroplast genome ■ Got rid of the experiments he was doing and just took the data from GenBank ○ GenBank is hosted by NCBI ■ Not just genes and DNA; it has transcripts and protein sequences, whole genomes, whole transcriptomes, raw sequencing data, polymorphisms, methylation patterns, RNA editing sites ○ New bioinformatics software's interface is connected to the data bank ○ Command-line driven vs. user-friendly GUI (graphical user interface) software: ■ Hard to use vs. easy to use ■ Open source vs. commercial ■ Slow to learn vs. fast to learn ■ Runs fast vs. runs slow ■ Easy to tweak vs. “black box” Lecture 19 ● Personalized genomics in healthcare is going to be hugely influenced by bioinformatics ● Really easy to get the sequence reads, but we really only get small pieces ● We have all this data but how do we put it together? ● You start looking for overlaps to put things together ● 25 nucleotides that match - a pretty good bet that they belong together ● 4^n ● But now we have repeats in there ● Reads that don't actually belong together can still be joined, since they share identical repeats Sequencing Reads ● So what do you need? ○ You need another read that spans that whole repeat and then anchors one of the originals ○ But in many genomes the repeats are so long that you wouldn't get a read that spans them ○ Which is why you have sections that you just haven't been able to assemble ● To put them together, the computer looks for overlaps ● As the sequencing reads come off the machine, some are good but some are bad ● The key is to find sequences that span the repeat ● Algorithms are used to evaluate how good they are ● BLAST - the search tool that you use to figure out what you're looking at ○ Searches against a database of known DNA ● BLASTN - comparing nucleotides ● TBLASTX - take the unknown sequence ○ Translate all 6 frames and then search those against the database, which is translated the same way ■ Aka you take every nucleotide sequence in the database and convert it to the 6 frames ● BLASTP - protein vs protein (amino acid sequences) ● You have an unknown sequence and we are going to tBLASTx it since there’s a greater chance that it is protein coding ● So you're looking at hits from the database and it hits the COX1 genes ● The coverage is good, meaning our search query is covered almost completely by the hits ● The types of hits look consistent - they're all the same thing and the score is high ○ When you BLAST you're given scores like E values, etc. ○ How similar is your unknown to the hits?
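The overlap idea from the assembly notes (and the 4^n reasoning behind "25 matching nucleotides is a good bet") can be sketched in a few lines of Python. This is a toy illustration with made-up reads and hypothetical function names, not how any real assembler is implemented.

```python
# Toy overlap-based merging of two sequencing reads.

def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0  # no overlap of at least min_len bases

def merge(left, right, min_len=3):
    """Join two reads at their overlap, or return None if they don't overlap."""
    k = longest_overlap(left, right, min_len)
    return left + right[k:] if k else None

read_a = "ATGGCGTACGTT"   # made-up reads
read_b = "TACGTTGGCATA"
contig = merge(read_a, read_b)
print(longest_overlap(read_a, read_b), contig)

# There are 4**n possible sequences of length n, so the chance that one
# specific 25-mer matches a random 25-mer is 1 / 4**25.
chance_25mer = 0.25 ** 25
print(chance_25mer)
```

A long exact overlap is very unlikely by chance, which is why it is good evidence that two reads belong together, unless the overlap falls inside a repeat that occurs in several places.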
● If the types of hits are all over the place and not consistent, then it doesn’t make sense ○ If the score is low, then the percent identity between the unknown and the hits isn't very good ● But maybe it was never a protein coding sequence, so we shouldn't use the translation of it to search - use BLASTN ○ You can find a consistent type of hit ● Then you can figure out that it is a ribosomal RNA gene from a plant ● What about the human mitochondrial genome, if you wanted to tBLASTx it against GenBank? ● What would mess up that BLAST? What is unique about the mtDNA that would throw off the search? ○ The human mtDNA uses a nonstandard genetic code ○ If you're going to translate the query into the 6 frames then you'd better make sure you pick the specific nonstandard code ● Thankfully, you can select which code you want to use ● What about the trypanosomes? You shove in a whole bunch of Us and delete a whole other bunch ○ If you were going to BLAST a sequence from that and wanted to get a hit, you should use the RNA ● The DNA would never give you good hits, so you want to use the mature RNA ● You can do this at home: you have all these reads and you can do a lot of things, like align them to a reference sequence to see how they match ○ Or you can do de novo assembly, putting them all together into bigger pieces ○ Maybe you can search GenBank for the human mitochondrial genome ○ You literally search it and download it, then you can see all the different genes on it - tRNA, rRNA, protein coding genes, etc. ● You can click on COX1 and then extract it ● You can align genes, build phylogenies and figure out how these genes are related to each other in a tree ● You can download other things - apps that you can install into the interface that may help find repeat elements ● May help look for promoter sites in the genome ● Can access all the genes and genomes and store your information, etc. ● Tell the software to do things automatically - preprogram it to do things ● This can happen 
on a minor scale or a massive scale ● The raw sequencing data gets pumped into the bioinformatics mainframe and assembled, all these tasks are done automatically, and then the person studying HEPA has the information in front of them ● The biggest computers in the world are used for bioinformatics ● These computers take a huge amount of money to fund and to provide the energy they require ● If you're a private company, it could be a huge investment ● We have new sequencing chemistries, and that leads to algorithms being reinvented to drive this and to stay caught up ● Genetic barcoding: sequencing a gene from a species and using that sequence to figure out what species that is Lecture 20 ● Craig Venter created the first synthetic prokaryote, a Mycoplasma bacterium = world’s first synthetic organism ● Synthesized a complete genome and transferred it to a host cell ● Rebooted the host cell so that it was under the control of the introduced synthetic genome ● Can introduce synthetic tools into diabetic people to provide a “cure” ● How can we take all these processes that occur in the cell and put them in a computer so that we can manipulate them and make them function the way we want? ● Example: insulin ○ Introduce these synthetic devices that will sort of prevent diabetes (to a certain point) ● Take the genetic code and translate it to a binary code in the computer → one of the starting points for synthetic biology ○ Not as simple as taking genes, converting to genetic codes and putting them in the computer → recombinant DNA technology = biotechnology ● But a biological organism has so many processes organized at a hierarchical level, so it is not as simple as taking these genes and converting them to binary code and putting it into computers ● There are certain processes, and the genes within the genome code for different regulations ● The gene needs to react to the environment and there is also communication between small molecules and proteins ○ 
External organisms that cause responses ● You have to somehow figure out a way to put all these levels of these processes together and input them into a computer ● Synthetic biology is a way to connect all these regulatory, sensory, physical, etc elements in a reliable and predictable way ○ The term predictable is really important since if you want to put a few genes together, you want them to give a predictable outcome - you want genes that respond to the environment in a predictable fashion ○ Apply engineering principles to the design and alteration of natural systems or de novo construction of artificial biological devices and systems that exhibit predictable behaviours ● Need to program the cell to function as a whole system ● You need to take advantage of a hierarchical system that already exists in biological systems ● We already know what the cells are composed of and what proteins, etc they have and how they are designed to react ● There is also an organism to ecosystem hierarchy (reacting to the environmental stimulus) ● Since the hierarchy already exists in synthetic biology, it makes our job easier ● You can alter the existing natural systems and create an entirely new system ● Think about it in the context of a pathway, cell, or system as a whole, not just the level of the gene ● Connect various elements/layers (regulatory sensory, etc.) in a reliable, predictable way and program a cell to function as an autonomous system, in the highest efficiency with the smallest genome possible (original definition) ● Earlier known as recombinant DNA technology or biotech ● Construct a biological system (ex. 
A network or a pathway) using characterized genes and regulatory DNA sequences ● GOAL: achieve the highest efficiency in the synthetic cell with the minimal genome that still lets the cell function efficiently and adapt to any type of circumstance ● Method: replicate the following areas (“layers”) of focus in a synthetic cell, but need to understand how they function first ● Regulation: necessary to produce enough protein/RNA in the cell to perform their function properly ○ Transcriptional, translational, post-translational modifications, epigenetic ○ Need to transform the cellular information into technological codes that can be input into a computer ● Sensory Stimulus: has to sense the signal within and outside the cell in order to perform functions such as glycolysis and photosynthesis ○ Essentially: how the cell reacts to its environment (both the outside and the inside) ○ Stimuli are how the cell detects these signals and transmits those to the interior of the cell ● Stimuli: chemical, light, force (ex: chemical, kinetic, potential, pressure) ● Communication: cell-to-cell communication, protein interactions OR how the cell communicates with other cells and its environment ○ Layer we are at now in biotechnology – we can create parts and components that do regulation and sensing, but need to create communication between cells to create multicellular organisms ○ Could be done through small molecules, proteins, viruses ● Physical: response to motility, growth (photosynthesis, glycolysis), transport ○ Most important aspect of synthetic cells ● Epigenetic: level of regulation based on the environmental stimulus that exists ● Synthetic Biology: Apply engineering principles to the design and alteration of natural systems or de novo construction of artificial biological devices and systems that exhibit predictable behaviors (new definition) ● A computer can actually perform all the functions that a cell can do ● Trying to synthesize parts that can 
function as a system and an organ by itself through technology, with the minimal genetic sequences/materials (genome) required in an efficient & predictable way ● Take all the naturally existing systems/programming inside a cell & transfer them into the computer ● By the end, the synthetic organism must be able to survive in a unique environment that the scientists impose ● Synthetic biology can be used at all Hierarchy and Modular Organization ● Connects fundamental parts together (proteins) to get a desired outcome (products at the end of a biochemical reaction) ● Multiple biochemical reactions together ● Network of modules/biochemical pathways together ● Multiple cells together to form a network ● Parts to modules to complex systems, adapted from other disciplines and implemented in biology (analogy) ● Proteins and genes are the 1st layer = physical layer ● A bunch of parts form gates: biochemical reactions connected together in a reliable fashion = gates ● Can input repressors or inducers that act as gates and produce proteins which act as the output ● Gates put together form modules (pathways made up of multiple biochemical reactions) ● Pathways make a cell functional ● If you want to connect all the computers together, you make a network ● If you want to connect all the cells together, you form tissues/cultures ● Start by synthesizing/selecting DNA sequences containing the genes that encode the desirable protein product ● The amount of DNA selected & the order of genes = based on the identity of the desirable protein product ● The product should participate in a biochemical reaction that you know would give you the desirable product (see circuit) ● Scale for DNA synthesis & assembly: ● On the order of genes: 10^2 to 10^4 bp ● A gene circuit’s size: 10^4 to 10^6 bp ● A minimal genome: 10^6 to 10^7 bp (a combination of more than one circuit that is required to function, no embellishments) ● This process is easy for bacteria = create a traditional recombinant plasmid since the 
sequences can be fused and inserted ● Difficult for eukaryotes, especially mammals; need to understand the specific functions of each part of the genome ● Challenge: unsure if the circuits will function as predicted once fused together in the minimal genome ● NOTE: a Mycoplasma genome = 1.8 Mb (megabases) Synthetic Biology Circuits ● Sensing, processing, actuation ● Sense what is happening in the cell (microRNA, mRNA, and proteins) in the form of a regulatory circuit ● If this and that are present, proceed to the next step ● If you get to the end, you will get a genome ● If you get too much production, you can kill the protein ● microRNA, mRNA and proteins can all act as regulatory circuits ● The microRNA / mRNA / proteins (physical layers) detect a signal in the environment & react together (circuit) ● The final products or byproducts serve as stimuli to the targeted molecules; the levels of these products affect regulation (regulatory) ● The process of proteins reacting together to serve regulatory functions is similar to programming: A + B → C ● If your logic gate functions according to the logic command given (A + B), the product yield (C) would be called “activation” ● In synthetic biology, the product C must behave in a predictable manner → must control the gate & logic command ● Mutations create unpredictability in systems and force them to react to the mutation in different ways 1. Sensing 2. Processing 3. 
Actuation ● Scale for DNA synthesis and assembly ● Driving force for more synthetic creations = dropping costs ● Trying to create a gene circuit so you put them all together using recombinant DNA technology ● Increasing complexity as you go up the hierarchy ● Can put in all kinds of stuff (e.g., repressors and genes) but it doesn’t mean it’ll function The Design Cycle ● Must conceptualize what your goals, inputs, outputs are – and then you can design it ● Select parts and the computer will put it together for you ● Then you can start modeling the system – and then you can construct the system ● Can use restriction digest to get the system ● Once you get the system, you can probe, test, and validate it 1. Conceptualization → identify the system goals, necessary inputs needed to create the designed outputs 2. Design → understand network topologies, kinetic parameters, parts selection (all can be done on computers) 3. Modelling → how do circuits interconnect? Understand network behaviours, robustness, sensitivity 4. Construction → assemble & integrate into a plant 5. Probing, testing and validation → alteration, library screening, directed evolution *1-3 = design ; 4-5 = fabrication* Tools for Design Cycle: 1. Engineering principles for design (simply the process of construction) ● Reduce efforts of design cycle ● Decoupling: separate / taking apart each level of the cellular component to understand their functions / structures (simplification – rip system apart and see what’s in it) ● Abstraction: extract the components from the cell/host, identify how they fit together to produce a viable system (separation into hierarchical levels, see how the pieces fit together) ● Standardization: manipulating the separated components in such a way that the components should be able to function properly aka when we put input, an output should come out = putting them back together 2. 
Components for parts selection ● Parts are designed and cataloged onto online databases, then parts are used to build a circuit ● Example: anything that is important to gene expression (cis-elements, promoters, exons, protein domains, ORFs, terminators, initiation sites) ● Biobricks – things available for purchase ● Phytobricks – plants ● Challenges: difficult to make sure that transcriptional regulation functions properly and hard to guarantee the precise control of expression in synthetic circuits & time-consuming ● Otherwise: could yield unpredictable consequences in a circuit, and doesn’t work the way you want to 3. Computational tools for design and modelling ● Help to design network and put parts together ● Component design & synthesis (design network)) ● Can be obtained from the Library of Parts ● Composed of the 4 areas of focus (regulation, sensor, communication & physical aspect) ● When designing components, need to make sure that these 4 components are satisfied ● Topology and network design (put parts together) ● Behaviour predictions and simulation (see whether the cell survives) ● Registry of Standard Biological Parts – a database where you can get biological parts from ● A database from iGEM (international Genetically Engineered Machine) ● Has more than 200 000 biological parts that could be used in the designing cycle Synthetic Microplasma Genome ● Prokaryotic cell that contains a completely synthetic genome ● No matter what changes you introduce to the system, they have to be predictable ● Take parts like mRNA, miRNA, etc that are present within the cell and join them through regulatory systems, meaning parts A and B are going to turn on in response to a stimulus and produce an output ● In order to build these parts you need DNA ● So if you were to create gene circuits then you would have to fuse a few genes and their regulatory elements for the genes - cis elements, promoters, operators, etc that will control the expression and repression of these 
genes ● You would make a plasmid that you would insert ● If you're creating an entire genome, you have to consider the amount of DNA that is required (from mb to gb) ● You have to take these little parts and stitch them together in a way that they can have a predictable function ● To be able to do that, you have to follow a design cycle ● First, you want to conceptualize what your end product is supposed to do ● Think about your goal, input and outputs ● Then you can design your model to reflect the given input and produce a predictable output ● Then you have to actually construct and stitch these pieces together to get the final product ● And then you integrate it and after that, you screen and see what happens ● There are tools you can use to follow this design cycle, which starts with the engineering principles because the hierarchy and the levels of organization are already there so we just have to take advantage of it ○ So first you decouple, followed by abstraction (see how they fit together to form this robot) and then standardization ■ So if you take them apart and put them together in a certain way, then they will react to an input and produce a given output ● Can go to this registry and pick the parts that you want and constitute your device and make your system a functional system ● There are certain challenges associated with picking your parts that can form your device ● Use tools and computer programs to model it and predict the outcome that is going to occur by stimulating what kind of output that you're going to get with the parts that you are using and the way that you are stitching the parts together ● Your end goal would be to hit the parts that will control all of these levels of hierarchy ● Theses parts are available in databases Lac Operon Concept ● Exists in bacteria ● Genes are organized in operons ● All the regulatory elements for a few genes are ahead ● Regulatory gene that when expressed, produces a protein that will bind to and repress 
the operator site ● So when a regulatory gene is expressed, it decreases the expression of the downstream genes ● When it is produced, lactose will bind to it and repress the repressor protein ○ So the repressor protein can't bind to the operator site anymore ○ So the downstream genes are expressed Lactose ABSENT ● Operators within the promoter region, comes before the genes ● There is also a regulatory gene present before the promoter ● Regulatory gene induces the lac repressor protein (increases expression) to bind to the operator site and therefore RNA polymerase can’t bind to the operator site and transcription does not occur (gene Z Y A are not expressed or reduced on the lac operon) Lactose PRESENT ● Lactose binds to the repressor protein and causes a conformational change such that it can no longer bind to the operator site and therefore RNA polymerase can bind to the operator site and transcribe the genes (genes Z Y A are expressed on the lac operon) ● Each part of the operon are considered as parts of the genes ● If we take those parts and put them together, it will become a device and this whole operon functions in a predefined way as a reaction to the external stimulus that it receives and in this case that stimulus is the presence or absence of lactose ● We have parts and devices are diffusion of a few parts together and there are systems that are able to function in the context of a cell ● You have an input that is your stimulus which will induce or repress a set of genes that are organized in the form of a logic gate Logic Gate ● A logic gate is a controller that will tell how the genes should react based on the stimulus given and produce the end product ● Gives you an idea of how the logic gate can function ● Looking at basically a light switch that you turn on and whatever happens from the switch to the actual light is the control so it is organized in such a way that it can turn on the light which is you output ● Parts that are used are promoters 
and stuff that control the expression of these genes ● It is called a repressilator since it is just few repressor proteins that will function in an oscillating pattern ● In this plasmid, you have a few genes R1 R2 and R3 and these genes are preceded by a
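
The lac operon and the A + B → C idea from these notes can be written as Boolean logic. The sketch below is only an illustration, not something from the lecture: the function names `lac_operon_expressed` and `and_gate` and the `repressor_made` parameter are made up for this example, and the model assumes the simplified picture in the notes (lactose and repressor only, ignoring glucose/CAP regulation and leaky expression).

```python
def lac_operon_expressed(lactose_present: bool, repressor_made: bool = True) -> bool:
    """Simplified lac operon: return True if genes Z, Y, A are transcribed.

    Assumption (hypothetical model): the only inputs are lactose and the
    repressor produced by the regulatory gene.
    """
    # The repressor is active only if it is made and lactose is not bound to it.
    repressor_active = repressor_made and not lactose_present
    # An active repressor sits on the operator and blocks transcription.
    return not repressor_active


def and_gate(a: bool, b: bool) -> bool:
    """A + B -> C: the output C is produced only when both inputs are present."""
    return a and b


if __name__ == "__main__":
    for lactose in (False, True):
        state = "expressed" if lac_operon_expressed(lactose) else "repressed"
        print(f"lactose present = {lactose}: genes Z, Y, A are {state}")
```

Seen this way, the operon behaves like a switch whose input is lactose and whose output is expression of Z, Y, A, which is the "device" idea described above.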