Lecture 19

Biology 2581B (Genetics), Lecture 19

Western University
Biology 2581B
David R Smith

• Personalized genomics in healthcare is going to be hugely influenced by bioinformatics
• It is really easy to get sequence reads, but each read is only a small piece of the genome
• We have all this data, but how do we put it together?
• You start by looking for overlaps between reads
• 25 nucleotides that match: a pretty good bet that the two reads belong together
  o There are 4^n possible sequences of length n, so a 25-nucleotide match by chance is very unlikely (1 in 4^25, roughly 1 in 10^15)
• But now we have repeats in there
• Reads from different places can get put together, since they share identical repeats

Sequencing Reads

• So what do you need?
  o You need another read that spans the whole repeat and anchors it to one of the original reads
  o But in many genomes the repeats are so long that you wouldn't get a read that spans them
  o Which is why there are sections that just haven't been assembled
• To put the reads together, the computer looks for overlaps
• As the sequencing reads come off the machine, some are good but some are bad
• The key is to find sequences that span the repeat
• Algorithms are used to evaluate how good the reads are
• BLAST: the search tool that you use to figure out what you're looking at
  o It compares your sequence against a database of known DNA
• BLASTN: compares nucleotides (nucleotide query vs nucleotide database)
• TBLASTX: takes an unknown nucleotide sequence
  o Translates it in all 6 reading frames, then searches those against the database translated the same way
    ▪ i.e. every nucleotide sequence in the database is also converted to its 6 frames
• BLASTP: protein vs protein (amino acid sequences)
• Example: we have an unknown sequence and we tBLASTx it, since there's a greater chance that it is protein coding
• Looking at the hits from the database, it hits the COX1 genes
• The coverage is good, meaning our search query is covered almost completely by the hits
• The types of hits look consistent (they're all the same thing) and the score is high
  o When you BLAST you're given scores such as E-values (for E-values, lower is better)
  o How similar is your unknown to the hits?
    ▪ If the types of hits are all over the place and not consistent, then the result doesn't make sense
  o If the score is low, then the percent identity between the unknown and the hits isn't very good
• But maybe it was never a protein-coding sequence, so we shouldn't search with its translation: use BLASTN instead
  o Then you can find consistent types of hits
• Then you
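A minimal Python sketch of the overlap step from the assembly part of these notes (not something shown in the lecture). The function names `overlap_length` and `chance_match_probability`, the toy reads, and the 12-nucleotide repeat are all made up for illustration. It only finds exact matches, whereas real assemblers also have to tolerate sequencing errors, and the chance estimate assumes the four bases are equally likely and independent.

```python
def overlap_length(left: str, right: str, min_len: int = 25) -> int:
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 when the longest exact overlap is shorter than `min_len`.
    """
    min_len = max(min_len, 1)  # an empty overlap is not informative
    max_len = min(len(left), len(right))
    # Try the longest candidate first so the first match is the longest one.
    for n in range(max_len, min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def chance_match_probability(n: int) -> float:
    """Chance that two random n-nucleotide stretches are identical.

    Assumes equally likely, independent bases: (1/4)^n, i.e. 1 in 4^n.
    """
    return 0.25 ** n


# Hypothetical toy data: one repeat that occurs at two places in a genome.
REPEAT = "CAGGTCAATGCC"
read_a = "TTGACCGA" + REPEAT   # ends inside the repeat
read_b = REPEAT + "GGATCCAT"   # leaves the repeat at locus 1
read_c = REPEAT + "ACCTTGAG"   # leaves the repeat at locus 2

if __name__ == "__main__":
    # A short threshold is used only because the toy reads are short.
    print(overlap_length(read_a, read_b, min_len=10))
    print(overlap_length(read_a, read_c, min_len=10))
    print(chance_match_probability(25))
```

With these toy reads, `read_b` and `read_c` both overlap `read_a` by the same 12 nucleotides, so the overlap alone cannot say which one really follows it. That is the repeat problem in miniature: only a read that spans the whole repeat and reaches unique sequence on both sides would settle it.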
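The "translate all 6 frames" step behind TBLASTX can be sketched in Python as follows. This illustrates only the translation idea, not BLAST itself or how BLAST implements it. The helper names (`reverse_complement`, `translate`, `six_frame_translations`) and the example sequence are hypothetical. It assumes the standard genetic code, which is worth keeping in mind because mitochondrial genes such as COX1 use a different code in many organisms.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict[str, str]:
    """Translate both strands at offsets 0, 1 and 2 (frames +1..+3 and -1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand_name, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand_name}{offset + 1}"] = translate(strand[offset:])
    return frames


if __name__ == "__main__":
    # Made-up example: ATG GCC TAA reads as Met, Ala, stop in frame +1.
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Searching at the amino acid level is the reason for translating first: if the unknown really is protein coding, one of the six frames should give a sensible protein that matches related genes even when the underlying nucleotides have diverged. If none of the frames gives consistent hits, that fits the note above about falling back to BLASTN.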
