Biology 2581B Lecture 19: Lec 19 – Bioinformatics
Department: Biology
Course: Biology 2581B
Professor: David R Smith
Semester: Winter

Description
Lecture 19 – Bioinformatics & Future of Genetics

Bioinformatics
- Most of you will never perform these procedures
- Tests are becoming obsolete
- Many of you will use a digital device to interpret molecular sequence data
  o E.g. in a hospital, when having a child, etc.
- Health care
  o Personalized genomics
    ▪ From designer babies to handheld sequencers
- Genetic technologies and bioinformatics are piercing every aspect of daily life
- Healthcare will be hugely influenced by bioinformatics

Science and healthcare
- Many jobs in science and healthcare
  o Hospital, clinic, governmental environmental agency with water treatment, bioinformatics, nutrition
- Don't have to be a computer programmer in the back end; can work in the front end too
- Back end and front end
  o Design, development
  o Instruction, implementation
- Front end: go to high schools and give lectures about bioinformatics, pharmaceutical agents

2 topics
- Assembling genomes (task of bioinformatics)
- Searching genetic sequences

Assembling
- Assembling genomes is one of the biggest and most important aspects of bioinformatics
- It is easy to generate the raw sequence data
  o Have technology to fire out genome sequencing data really fast
  o 1000Gb – 100 billion bases of DNA in less than 24 hours
- BUT most of the data is in tiny pieces
  o Some new technology can do longer pieces
- Have all this data that is easy to get, but how do we put it all together?
  o Not trivial: very hard and challenging
- Sequencing reads
- Isolated DNA and sent it to a sequencing center
- Got DNA data back in pieces of 150 nt
- Want to put the pieces together into a full genome
- MUST LOOK FOR OVERLAPS
- Two reads that overlap by 25 nucleotides come together and assemble into a contig (because of the overlap)
  o Represents a 275 nt portion of the genome (150 + 150 - 25)
- Challenge: finding overlaps, matching data on the reads, and putting the puzzle together
- Can say that a 25 nt overlap is small
  o If you have 25 nt that match between reads, it is a good bet that they belong together
- Looking at a site in one read and comparing it to a site in another read
- Chance that the two reads are identical at one site = 25% (4 possible bases, assuming each is equally likely)
- To get 25 nt that match by chance on two pieces of DNA that don't belong together is VERY rare = 0.25^25
- These reads should be together and they represent a real portion of the genome
- Chance that 1 site is identical = 0.25
- Chance that 2 sites are identical = 0.25 x 0.25 = 0.0625
- Chance that 25 sites are identical = 0.25^25 ≈ 0.000000000000001
- The issue is not aligning reads into contigs; the issue is the REPEATS IN GENOMES
- Repeats in genomes screw up the assembly process
- Repeats are identical
  o 4 copies of the identical repeat in one read
  o 6 copies of the identical repeat in another read
- Can put the reads together because they share identical sequence (repeats), but they can fit together in many different ways
- Sometimes repeats go on for hundreds or thousands of nucleotides
- Need another read in your data set that spans the entire repeat and anchors into each one of the reads
  o Shares identical sequence outside the repeat
- This resolves the challenge and forms the contig
- In many genomes, the repeats are so long that you can never get a read that goes through the whole thing
  o 100bp of repetitive DNA: left not knowing what to do, can't assemble it
- There are sections in genomes that have not been able to be assembled
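
The overlap and repeat points above can be sketched in a few lines of Python. This is only an illustration, not how a real assembler works: it requires exact matches, ignores sequencing errors, and compares just two reads. The function names, the seeded random "genome", and the small repeat reads are all made up for the example; the 150 nt read length and 25 nt overlap come from the notes.

```python
import random
from typing import List, Optional


def chance_of_random_match(n_sites: int, p_site: float = 0.25) -> float:
    """Chance that n_sites all match by chance (assumes 4 equally likely bases)."""
    return p_site ** n_sites


def overlap_lengths(left: str, right: str, min_overlap: int) -> List[int]:
    """All lengths L >= min_overlap where the last L bases of `left`
    equal the first L bases of `right`, longest first."""
    longest = min(len(left), len(right))
    return [
        length
        for length in range(longest, max(min_overlap, 1) - 1, -1)
        if left[-length:] == right[:length]
    ]


def merge_reads(left: str, right: str, min_overlap: int = 25) -> Optional[str]:
    """Join two reads on their longest exact overlap, or return None."""
    lengths = overlap_lengths(left, right, min_overlap)
    if not lengths:
        return None
    return left + right[lengths[0]:]


# Hypothetical 275 nt stretch of genome, cut into two 150 nt reads
# that share 25 nt (positions 125-149).
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(275))
read_a = genome[:150]
read_b = genome[125:]
contig = merge_reads(read_a, read_b)

print(chance_of_random_match(25))  # roughly 8.9e-16
print(len(contig))                 # 150 + 150 - 25

# Repeats: one read ends in 4 copies of "CA", the other starts with 6 copies.
# The reads fit together in several ways, so the overlap alone is ambiguous.
left_repeat = "GGT" + "CA" * 4
right_repeat = "CA" * 6 + "TTG"
print(overlap_lengths(left_repeat, right_repeat, min_overlap=2))
```

The last call returns more than one overlap length, which is the repeat problem in miniature: without a read that spans the whole repeat and anchors in unique sequence on both sides, there is no way to tell which placement is correct.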
- Today's bioinformatics programs can assemble millions of sequencing reads
  o In hours to days to weeks, depending on the algorithm and computer power
- Pile millions of reads into the computer and it assembles them
- To assemble a tomato genome, you would wait a week for the first round of assembly to go through
  o If using a powerful computer, it would take a couple of hours

Assembling algorithms
- Hugely dependent on the computing infrastructure
- More powerful computers = better assemblies
- Algorithm tries to find overlaps
  o Have one read and its sequence and search it against 100 billion other reads
  o Different ways to do it
- Take things into account:
  o How long an overlap are you willing to accept as a genuine overlap: 4 or 25 nt?
  o How many mistakes are you willing to accept in the overlap?
    ▪ Sequencing machines make mistakes; they are not perfect
    ▪ There will be some errors in the reads
    ▪ Insertions and deletions as mistakes
  o Some sophisticated algorithms can navigate through repeat elements
  o Take into account the quality of the read
    ▪ As the sequencing reads come off the machine, some are going to look good, others will have a lot of errors
    ▪ Algorithms are given a quality score for each read
- Algorithms use quality scores to see how good their assembly is and how to put it together
- Sophisticated algorithms that can assemble genomes need a powerful computer

2 topics - searching
- SEARCH: BLAST algorithm
- Blastn (nucleotide vs. nucleotide)
  o Search a nucleotide sequence (RNA or DNA) against a database of known nucleotide sequences
- Tblastx (translated nt vs. translated nt)
  o Search the translation of the nucleotide sequence against a database of translated sequences
- Blastp (protein vs. protein)
  o Search protein against protein
- Take an unknown sequence and search it against a database of knowns (BLAST IT)
- Type of bioinformatics that everyone does on a daily basis

BLASTN
- Blastn: search a nucleotide sequence against a database of nucleotide sequences
  o N for nucleotide

TBLASTX (6 frames)
- Take an unknown nucleotide sequence that you think may be protein coding
- Instead of searching the nucleotide sequence, translate it into all possible frames (6 frames)
  o Search each one of the amino acid sequences against a database where you did the same thing for everything in the database
    ▪ Every nucleotide sequence in the database has been turned into its 6 possible amino acid sequences
- If it is protein coding, do all 6 frames because you do not know where it starts or which strand it is on

BLASTP
- Blast protei
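
The six-frame translation described under TBLASTX can be sketched as follows. This shows only the translation step (three frames on the given strand, three on the reverse complement), not the database search that BLAST performs afterwards. The function names and the short example sequence are made up for illustration, and the sketch assumes the standard genetic code with "*" marking stop codons and "X" marking any codon containing an unrecognized base.

```python
from itertools import product
from typing import Dict

# Standard genetic code, listed in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; leftover bases are dropped."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> Dict[str, str]:
    """Amino acid translation of all 6 reading frames of a DNA sequence."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


# Hypothetical unknown sequence: ATG GCC TAA
for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Only one of the six frames (here "+1", which reads M, A, stop) corresponds to the real coding frame, which is why all six are searched when the start and strand are unknown.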