Lecture 6

BIO240H Lecture 6.doc


Department
Biology
Course
BIO120H1
Professor
Jennifer Harris
Semester
Fall

Description
Lecture 6: What is a gene and where can I find it?

Lecture Outline
1) Definition of a gene
2) Bioinformatics

Readings: Alberts textbook, Ch 8, pp. 550-552

What is the fold degeneracy of a primer constructed for the following AA sequence? NQWWMSY

Fold degeneracy of a primer reflects the degeneracy of the codon table for those particular AA residues:
- Asparagine (N): coded for by 2 different codons, which differ by a transition at the 3rd codon position.
- Glutamine (Q): 2 different codons.
- Tryptophan (W): not degenerate at all; only 1 codon.
- Methionine (M): 1 codon.
- Serine (S): highly degenerate; 6 different codons.
- Tyrosine (Y): 2 codons.
Fold degeneracy is combinatorial: how many possible combinations of nucleotides can give rise to this AA sequence? Multiply the codon counts for each residue in order (N, Q, W, W, M, S, Y): 2*2*1*1*1*6*2 = 48.

Questions of the Day
1) What is a gene?
2) What sequence is this anyway?
3) What is its function?

Molecular Definition of Gene
The entire nucleic acid sequence (usually DNA) that is necessary for synthesis of a protein or RNA. In other words, genes are segments of DNA that are transcribed by RNA polymerase into RNA. Remember there are 2 types of genes:
1. Protein-coding: the traditional view of what a gene is; the gene codes for a protein.
2. RNA-coding: sometimes the RNA itself is the final product that has a function inside the cell and doesn’t need to be translated into protein to accomplish that function (e.g. tRNAs, ribosomal RNAs).

- Blue are stop codons.
- The only one with an open reading frame is #2 (no stops). If you get stops, you are not going to get a full protein (protein synthesis will terminate).
- Red regions: no stop codons.
- The one with the largest reading frame is #1.
- Take any fragment of DNA (double-stranded, with 5’ to 3’ orientation). If we tried to translate it without knowing which direction to go or which reading frame to use, there are 6 possible ways to translate the sequence.
- Reading frame #1: take the 1st 3 nucleotides, TTA, which translate into leucine; the 2nd 3, TTT, are phenylalanine; TAT is tyrosine; this frame also contains a stop codon. If you shift by 1 and start the reading frame with T, TAT is tyrosine, TTT is phenylalanine and so on. That’s how you get the different reading frames.
- One clue as to whether you might be looking at a protein-coding gene sequence: proteins are generally composed of long stretches of AAs, so we would not expect a stop codon in the middle of a protein. Look at all these open reading frames and rule out the ones with stop codons in them.
- Blue lines are stop codons, which are fairly common if you’re not in the protein-coding reading frame. By this reasoning, you can look at the 6 different reading frames and choose the one with the longest open reading frame (the sequence between 2 stop codons). A potential protein-coding gene starts with methionine and ends with a stop codon.
- If genes have introns, you can’t expect a long open reading frame, because it could be interrupted at any point by an intron that changes the reading frame; you might hit a stop codon because you are reading an intron.

DNA Detective
It’s dark and raining outside. You get a phone call from a desperate grad student in the middle of the night, who faxes you an unknown sequence before disappearing forever. How do you figure out what this sequence is?

Finding Info on Mystery Sequence
1. Are there sequences in the database which are similar? BLAST search
2. Can it be aligned to a family of sequences? Clustal sequence alignment
3. How similar is it to this family? Sequence similarity
4. Is it related to this family on a tree? Phylogenetic analysis
5. Which protein domains does it contain? SMART analysis

BLAST search (does it perfectly match a sequence in the databases?)

BLAST Searches
1) National Center for Biotechnology Information (NCBI)
2) www.ncbi.nlm.nih.gov
- Access genome databases with a query sequence.

What is a BLAST Search?
1) Algorithm that uses short stretches of sequence similarity to find related genes in a database. If you queried with a very long sequence, aligning that whole sequence to every single gene in the genome database would be an incredibly slow process.
2) Algorithm is fast and efficient.
- Red is what you inputted. Black are all the different database sequences that line up with your sequence.
- Want a low E-value and a high score.
- Shows you the distribution of hits: where they align in the query sequence. Shows you a description of each hit: accession #s, the species the sequence came from, and the scores and E-values that indicate how well these database sequences might match your sequence.
- These statistics are probabilistic: they estimate how likely such a match would be against a random sequence.
- In general, the higher the score, the better the match, and the lower the E-value, the better the match.
- Also gives you the region of the query sequence that matched the sequence in the genome databases.

Questions from last time:
1) What is the sequence of the template strand read in the 5’ → 3’ direction? Shorter fragments are further down the gel because they traveled farther, and you read from the bottom upwards, so:
5’ ACCGATT 3’
3’ TGGCTAA 5’
2) The original template is:
5’ ACTTACGTAC 3’
3’ TGAATGCATG 5’
Primer = 5’ ACTT
What does the sequencing gel look like?
[Gel diagram: lanes A, C, G, T with six bands; band positions not recoverable]

Slide 1
1) Only a short stretch of similarity is needed for a BLAST hit
2) Aligning the full query sequence is too computationally expensive (will crash the computer)
3) Higher score or lower E-value is better
- The higher the score, the closer the match. The E-value tells you whether the match could be by chance: the lower it is, the more likely that the match is not just by chance.
- Probabilistic statistics show how likely it is that your sequence matches those in the database
- Will give you the region of the query sequence that matched in the sequence database

Slide 2
- There is no exact (perfect) match in the database from the BLAST search. Now we search for how closely it matches some known family of genes, so we pull the top (closest) BLAST results.
- Sequence alignment and similarity come next: you find similar sequences and check how similar they are to the one in the database (matching fragments, not an exact match).

Slide 3