BIO240 Lecture 19

University of Toronto St. George
Jennifer Harris

Monday, September 29, 2008
Lecture 7: What is a gene & where do I find it? (Part II)

Outline:
1) Bioinformatics
2) Assays of gene function

Readings: Alberts textbook, Ch 8, pp. 563-572

Now to determine the function

For genome projects, gene prediction is often difficult. Which of the following factors is/are NOT a concern?
a) Alternative splicing
b) The degeneracy of the genetic code
c) Possible presence of introns
d) Variability in consensus sequences for transcription initiation factors
e) None of the above

- Alternative splicing complicates gene prediction b/c the final spliced product can vary in different tissues.
- Degeneracy of the genetic code is the answer b/c we know exactly how the genetic code is degenerate, so we can predict from the genetic code exactly what a coding sequence will code for, which is what's important in gene sequences. Degenerate refers to the fact that codons aren't used to their greatest capacity: there are 64 codons but only 20 AAs, and some codons function as stop codons, so most AAs are specified by more than one codon.
- Possible presence of introns can disrupt the reading frame, resulting in frame shifts that may make stop codons appear prematurely. Introns are a huge problem for predicting genes.
- Variability in consensus sequences for transcription initiation: if there were no variation in these consensus sequences, we could use them as a 100% marker for predicting genes, but in fact there is always a certain amount of variability.
- We're halfway through the bioinformatics investigation.

Phylogenetic analysis (looking for relatedness in a family tree)
- Bioinformatics can tell us a certain amount, but complementary to it are the experimental investigations in the laboratory.
- DNA detective: bioinformatics provides clues to what the mystery sequence might be. None are definitive; they can be tested in the lab to confirm findings.
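The degeneracy point in the quiz answer above can be checked by counting. This is a minimal sketch, not something from the lecture: it builds the standard genetic code table (the names `CODON_TABLE` and `translate` are made up for illustration, and the two DNA strings are invented examples) and shows that 64 codons map onto 20 AAs plus 3 stop codons, so two different DNA sequences can code for the same protein.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons specify each amino acid (or stop).
codons_per_aa = Counter(CODON_TABLE.values())


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (reading frame 1) into one-letter AAs."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print("codons:", len(CODON_TABLE))
    print("stop codons:", codons_per_aa["*"])
    print("amino acids:", len(codons_per_aa) - 1)
    # Two invented sequences that differ at the DNA level but not in protein.
    print(translate("ATGTTATTGTAA"), translate("ATGCTGCTCTGA"))
```

Because the mapping from codon to AA is fixed and known, translation is unambiguous in this direction, which is why degeneracy is not a concern for gene prediction.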
- If the sequence matches one in the database perfectly, that implies it isn't just a predicted gene that came out of the analysis of genome projects, but one that came out of experimental investigations, with conclusions backed by all that experimental evidence.
- All of the subsequent analyses are really just clues to finding what the sequence really is.
- Mystery sequence: we BLASTed it against the NCBI database but didn't find an exact match; we did find things that are very similar (high score/low E value). Take some of those other sequences that came up in the BLAST search & see if our gene is similar to known genes of other organisms. It could be a gene that has been extensively studied experimentally in model organisms (e.g. mouse) but not in the organism we're interested in.
- Clues come from sequence alignment & calculating sequence similarities. We already did the multiple sequence alignment and found the mystery sequence was similar to some of the sequences we pulled out, but is it similar enough to be described as a previously identified gene? To make that clear, we have to investigate further with phylogenetic analysis: is it a gene that has already been isolated, but from other organisms, or is it a member of an entirely new gene family that is related to gene families that already exist?
- We used BLAST searches to find sequences in the databases that are similar to the mystery sequence, took the most similar ones, and used ClustalW to align them with the mystery sequence. To assess how similar the mystery sequence is to the others, we calculated percent difference/identity so that we have some measure of the relationship b/w the mystery sequence and the other sequences in the alignment. That is not a very complete description of the relationship; to define it, we need to do phylogenetic analysis.
- Phylogenetic analysis uses a computer algorithm to build an evolutionary tree from the input data.
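The percent identity calculation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `percent_identity` is made up, the two short sequences are invented (not real opsin data), and columns containing a gap are skipped, which is one common convention among several.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Assumption: columns where either sequence has a gap are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Invented aligned fragments, for illustration only.
    mystery = "MSGDE-FYLFK"
    other = "MSGEEEFYLFK"
    print(percent_identity(mystery, other))  # 9 of 10 ungapped columns match
```

A single number like this says how alike two sequences are, but not how the whole set of sequences is related, which is why the notes move on to phylogenetic analysis.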
1) “PAUP*”
2) “Multiple sequence alignment”: the basis on which the program starts to estimate evolutionary relationships (from ClustalW); it is needed by any phylogenetic analysis program.
3) “Phylogenetic Tree”: describes those evolutionary relationships, i.e. the relation between the different sequences.
4) “Minimizes”: assumes evol'n progressed by the fewest # of steps.

- What are some of the assumptions in the analysis?
1. Evol'n proceeds by the shortest, most parsimonious path.
2. Evol'n proceeds via a bifurcating path; that's what gives rise to the tree-like topology.
3. Rate of change of sequences on the phylogeny is very slow.
4. Sites in the protein (nucleotide sequence) evolved independently of each other, so each one can be viewed as an independent estimate of the evolutionary relationships among the species.
(Parsimony means being very careful, not wasteful. In this context it means minimizing the # of evolutionary steps required.)

NEXUS (multiple sequence alignment in a particular type of format)
- NEXUS: somewhat like a computer script; there are commands within the body of the NEXUS file.
- The first two bioinformatics tools (BLAST, ClustalW) were widely available web-based tools. This one is not as user-friendly: there is no website you can go to, you actually have to download the program & run it locally.
- The reason is not just how widely used they are: some of these methods are much more computationally intensive.
- From this multiple sequence alignment, the computer algorithm is going to estimate a phylogenetic tree.
- Gene is over 300 AAs long; the picture is only a segment.

[Figure: phylogenetic tree; visible labels: Human Violet, Chick Violet]

- Interpretation of tree: the closest sequence to the mystery sequence is the human violet sequence, and the next closest sequence to both of these is chicken violet opsin.
- Can conclude that this mystery sequence is most likely a violet opsin sequence.
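The parsimony assumption above ("fewest # of steps") can be made concrete with a small sketch. This is not taken from the lecture and is not claimed to be what PAUP* does internally: it uses Fitch's small-parsimony counting for one alignment column on a fixed bifurcating tree. The function name `fitch`, the taxon names `seq1`..`seq4`, and the states are all hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony for one site on a rooted bifurcating tree.

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree)
    states: dict mapping leaf name -> observed character at this site
    Returns (set of candidate ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical states at a single alignment column.
    site = {"seq1": "A", "seq2": "S", "seq3": "A", "seq4": "S"}
    tree_a = ((("seq1", "seq2"), "seq3"), "seq4")
    tree_b = ((("seq1", "seq3"), "seq2"), "seq4")
    print("tree_a changes:", fitch(tree_a, site)[1])
    print("tree_b changes:", fitch(tree_b, site)[1])
```

Under parsimony, the tree needing fewer changes summed over all sites is preferred. Since sites are assumed to evolve independently (assumption 4), per-site counts like this can simply be added up.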
- This analysis highlights one of the serious potential pitfalls of doing this type of analysis incautiously: with so few sequences in the tree, it's actually difficult to say anything conclusive about what the mystery sequence is. We may have incomplete sampling of the relevant sequences.
- Say we were very cautious: these are our preliminary results, but we decide to gather more sequences & redo the analysis. Would we get something different? We gather many more sequences by going further down the BLAST search results, pulling those sequences out, aligning them again with Clustal & then performing phylogenetic analysis on the extended multiple sequence alignment.
- The result looks very different from the previous tree, even though the sequences from the previous tree are still in this phylogeny. With more data & better sampling, results become more accurate.
- In this phylogeny, the mystery sequence is actually contained within the UV/violet opsin clade (either ultraviolet or violet opsins). This makes it quite clear that if we had just been happy with the previous, much smaller analysis, we might have concluded that the mystery sequence was just a violet opsin, which would not necessarily have been correct. Now we know it could be a violet opsin, an ultraviolet opsin, or neither one.
- This phylogeny has branch length info on it; branch lengths are proportional to the evolutionary distance b/w the sequences.
- We have some clues to what the mystery sequence might be.
- The easiest way of assessing protein function is experimentally, but we still have bioinformatics to work with.
- Even if you had a lot of that protein, what do you do with it?
- What do you look for first, before making lots of the protein? You look at the protein sequence and see if it gives you any clues to what's going on.

Recap: Bioinformatics tools
1) Finding similar sequences in the database
2) Aligning sequences to the mystery sequence
3) Assessing how similar the mystery sequence is to others
4) Determining the relationship of the mystery sequence to others
5) Assessing prote