BMS2062 Lecture Notes - Lecture 5: Transfer Rna, Gene Duplication, Integrase

Week 3. Genome annotation and the Human Genome Project
GENOME ANNOTATION: IDENTIFICATION OF A NOVEL BACTERIAL VIRULENCE FACTOR
Bacterial genes rarely have introns
Getting information out of sequence: what next?
o Locate protein coding regions on the genome sequence
o Predict the function of proteins
-> genome annotation
What is genome annotation?
o Overlay of biological information onto the genome sequence
o Predicting and marking the position of genes and other elements on a genome sequence
i.e. protein coding sequences
RNA features usually predicted directly
Protein coding genes
1. Predict location of genes on genome
2. Translate encoded protein and predict function
o Predicting protein function
Similarity to characterised proteins
hpothetial poteis = ot similar to any characterised proteins
Annotation: gene finding
o Prokaryotes: simple genes; introns are rare, but transcripts can be multicistronic (one mRNA can
encode multiple proteins)
o Features of protein coding genes:
Contained in ORF
Initiation codon, ribosome binding site
Initiation codon
ATG, GTG, TTG (GTG and TTG are used mainly in bacteria)
Stop codon
TAG, TAA, TGA
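As a rough illustration of the features listed above (a sketch, not how any particular gene finder works), the following Python checks whether a DNA string has the basic shape of a coding ORF: it begins with one of the initiation codons, ends with one of the stop codons, and has no stop codon in between. The function name `looks_like_coding_orf` is made up for this example, and ribosome binding sites are not modelled.

```python
# Codons taken from the notes above.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAG", "TAA", "TGA"}


def looks_like_coding_orf(seq):
    """Return True if seq reads as start codon ... stop codon with no internal stop.

    Hypothetical helper for illustration only; real gene finders use much
    richer models (ribosome binding sites, codon usage, and so on).
    """
    seq = seq.upper()
    # A complete ORF must be a whole number of codons, with at least a start and a stop.
    if len(seq) % 3 != 0 or len(seq) < 6:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_start = codons[0] in START_CODONS
    has_stop = codons[-1] in STOP_CODONS
    internal_stop = any(c in STOP_CODONS for c in codons[1:-1])
    return has_start and has_stop and not internal_stop
```

For example, `looks_like_coding_orf("ATGAAATAG")` is accepted, while a sequence with a stop codon in the middle is rejected.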
Reading frames:
o Gene finding starts by looking for regions with no stop codons (make an initial list, then refine)
-> gene finders produce a model of where the genes are encoded -> a prediction of which
proteins are encoded
Annotation: a oela - Not all ORF’s ae aked as gees
Tools for annotation:
o Gene finders:
eg. GeneMarkS, GLIMMER, Prodigal
What do they do?
o Search all six reading frames (RF)
o Locate all ORFs (minimum size = 50 codons)
o Identify a potential start codon (ATG most common)
o No overlap: it is rare for protein coding regions to overlap in prokaryotes
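The steps above can be sketched in a few lines of Python. This is a simplified illustration only, not the algorithm used by GeneMarkS, GLIMMER or Prodigal: it scans all six reading frames, takes the first start codon after each stop as the potential start, and keeps ORFs of at least `min_codons` codons. The function names, the tuple layout of the results and the choice of the first start codon are assumptions made for this example; overlap resolution is not attempted.

```python
START_CODONS = ("ATG", "GTG", "TTG")
STOP_CODONS = ("TAG", "TAA", "TGA")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=50):
    """Scan six reading frames and return candidate ORFs.

    Each hit is (strand, frame, start, end): 0-based positions on the strand
    that was scanned, with end exclusive and including the stop codon.
    Hypothetical sketch: the first start codon after a stop is assumed to be
    the start, which real gene finders do not simply assume.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first start codon seen since the last stop codon
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in STOP_CODONS:
                    # Count codons from the start up to, not including, the stop.
                    if start is not None and (i - start) // 3 >= min_codons:
                        hits.append((strand, frame, start, i + 3))
                    start = None
                elif start is None and codon in START_CODONS:
                    start = i
    return hits
```

Lowering `min_codons` gives a longer, noisier initial list; the 50-codon default follows the minimum size quoted in the notes.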
ORF finder software
o NCBI
o Very crude
o Need to be careful because the reader is unbiased (it reports every ORF), so it may select regions that aren’t gene
encoding
o Issue: overlapping ORFs (more than 2 RF – likely not encoding proteins)
o Not a very good model to find genes
Prediction of protein function: databases
o Key content: protein sequence + function
eg. GenBank is used as a dumping site for every genome produced, so you may get predictions
based on predictions based on predictions, which can propagate mistakes into our prediction
o Using sequence similarity to predict function
- proteins with similar sequences are likely to have the same/similar function
Most common tool for similarity searching = BLAST
Query sequence = unknown protein
Subject database = database of proteins with known function
Blastp = protein query vs protein database
Blastx = nucleotide query vs protein database (query sequence is translated into
6 peptides, one for each RF)
Description output: lists all the hits; you can get reports and go to GenBank to
investigate (and also find information about biological experiments done on the
sequence)
BLAST alignment: 1st line = query sequence, 3rd line = sequence from database,
the line in between shows aa that are identical (letter) or similar (plus sign = related aa)
Expect (E-value) = number of matches at least this good expected by chance (towards 0 = good, >0.1 =
bad)
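To make the blastx step concrete, here is a minimal sketch of translating a nucleotide query into six peptides, one per reading frame. It assumes the standard genetic code and is not BLAST's own implementation; the function names and the "+1" to "-3" frame labels are choices made for this example, and any codon containing a letter other than A, C, G or T is translated as "X".

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translations(seq):
    """Return a dict mapping frame label (+1..+3, -1..-3) to its peptide."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Each of the six peptides would then be compared against the protein database in the same way as a blastp query.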
Predicting protein function:
o < 10% identical = similarity occurs by chance (not related)
o 10-35% identical = might have a related function
o > 35% = probably have a related function
o Groups of proteins that play a particular role are usually encoded in the same part of the
genome (locus)
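The identity thresholds above can be applied to a pairwise alignment as follows. This is a sketch under stated assumptions: the two sequences are already aligned to the same length with "-" for gaps, and percent identity is computed over columns where neither sequence has a gap (other conventions for the denominator exist). The function names are hypothetical, and the cut-offs are the rules of thumb from these notes, not hard limits.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Assumption: columns containing a gap in either sequence are ignored.
    columns = [
        (a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def interpret_identity(pct):
    """Map percent identity to the rule-of-thumb categories from the notes."""
    if pct < 10:
        return "similarity occurs by chance (not related)"
    if pct <= 35:
        return "might have a related function"
    return "probably have a related function"
```

For instance, `percent_identity("MKT-AY", "MKSLAY")` compares five gap-free columns, four of which match, which falls in the "probably have a related function" category.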
