BCHM 4400 Lecture Notes - Lecture 10: Information Retrieval, Non-Coding RNA, Sequence Database
Document Summary
GenBank: annotated collection of publicly available DNA sequences.
dbEST: contains single-pass cDNA sequences, or expressed sequence tags (ESTs), from many organisms.
UniGene: provides a nonredundant set of gene transcripts for organisms. All ESTs and other expressed sequences from an organism are used to create clusters of sequences.
Reference Sequence (RefSeq) database: aims to provide a high-quality, comprehensive, nonredundant set of sequences. RefSeq datasets are used for functional annotation of genome sequencing projects.
UniProt: resource for protein sequence and functional information.
Swiss-Prot: protein records with manual annotation based on the literature and computational results.
PIR (Protein Information Resource): produced the Protein Sequence Database.
Proteomes: a proteome is the set of proteins thought to be expressed by an organism. UniProt provides proteomes for species with completely sequenced genomes.
UniProt supports text search, BLAST search, and FTP download.
UniRef clusters and proteomes: UniProt Reference Clusters (UniRef):
UniRef100: derived by combining identical sequences and subfragments.
UniRef90: built by clustering sequences with at least 90% identity and 80% overlap.
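The UniRef100 and UniRef90 rules can be illustrated with a small Python sketch. This is a toy model for study purposes, not UniProt's actual pipeline. The function names and the example sequences are hypothetical. The sketch also assumes that identity is counted over aligned (non-gap) columns of a pairwise alignment supplied by the caller, and that overlap is the number of aligned columns divided by the length of the longer sequence.

```python
def merge_identical_and_subfragments(seqs):
    """Toy UniRef100-style step (hypothetical helper).

    seqs: dict mapping a record name to its sequence string.
    Returns a dict mapping each representative name to the list of
    record names whose sequence is identical to, or an exact
    subfragment of, the representative sequence.
    """
    clusters = {}
    # Visit longest sequences first so a representative is always
    # seen before any of its subfragments; ties broken by name.
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        for rep_name in clusters:
            if seq in seqs[rep_name]:  # identical or exact substring
                clusters[rep_name].append(name)
                break
        else:
            clusters[name] = [name]  # starts a new cluster
    return clusters


def passes_uniref90_like_threshold(aligned_a, aligned_b,
                                   min_identity=0.90, min_overlap=0.80):
    """Toy UniRef90-style check (hypothetical helper).

    aligned_a, aligned_b: equal-length strings from a pairwise
    alignment, with '-' marking gaps.
    Assumption: identity = matches / aligned columns;
    overlap = aligned columns / length of the longer ungapped sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    paired = [(x, y) for x, y in zip(aligned_a, aligned_b)
              if x != "-" and y != "-"]
    if not paired:
        return False
    matches = sum(1 for x, y in paired if x == y)
    identity = matches / len(paired)
    longest = max(len(aligned_a.replace("-", "")),
                  len(aligned_b.replace("-", "")))
    overlap = len(paired) / longest
    return identity >= min_identity and overlap >= min_overlap
```

With these definitions, a 5-residue fragment that aligns perfectly to a 10-residue sequence has 100% identity but only 50% overlap, so it would fail the UniRef90-style check even though the UniRef100-style step would merge it as a subfragment.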