MBBB 301 Exam 2 Guide
1. What is BLAST used for? It is a local alignment tool (compares sequences)
Advantage of BLAST over global alignment?
- BLAST (local alignment) uses a heuristic approach, or shortcut, so it is fast. Global alignment
takes longer because it aligns everything
- Database search
- Finds which seq in the database are similar to the query
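To picture the "shortcut": BLAST is commonly described as first looking for short exact word matches (seeds) between the query and a database seq, and only then extending around them. The sketch below shows just that seeding idea. The function names, the word size k=4, and the toy sequences are made up for illustration; this is not how the real BLAST program is written.

```python
from collections import defaultdict

def index_words(subject, k):
    """Map every length-k word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index

def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = index_words(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds

# Toy sequences (hypothetical, for illustration only)
query = "GATTACA"
subject = "CCGATTAGG"
seeds = find_seeds(query, subject)
print(seeds)
```

Only the regions around the seeds would be examined further, which is why this is faster than aligning everything end to end.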
2. Know when to use the different flavors of blast.
- blastn is to blast DNA against DNA (to find similar or homologous nucleotide seq)
- blastp is to blast protein against protein (to find homologous protein seq)
- blastx is to blast DNA against protein (translates the nucleotide query in all 6 reading frames
and compares each translation against a protein database)
o searches a nucleotide seq against a protein database if you don’t know what the seq is
o blastx will translate it and tell you what protein it makes
o mostly used for mRNA seq to tell you what protein it makes
Protein seq vs DNA seq: protein seq have 20 characters (amino acids) and DNA seq have 4 (A, G, C, T)
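The 6 reading frames that blastx translates are 3 on the forward strand and 3 on the reverse complement. A minimal sketch of that translation step (going from the 4-letter DNA alphabet to the 20-letter protein alphabet) is below. It assumes the standard genetic code; the function names and the example sequence are made up for illustration, and the database search itself is not shown.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frames(dna):
    """Translate a DNA seq in frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

# Hypothetical example seq
frames = six_frames("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

Each of the six translations would then be compared against the protein database, which is how an unknown nucleotide seq can be matched to a protein.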
1. Know what an MSA is and why it’s done (i.e. uses of MSAs)
- Aligns 3 or more seq. Done to be more confident that a domain is there
- Also used as a basis for building phylogenetic trees, finding domains, etc.
- MSA is better and more sensitive than pairwise because comparing more seq makes conserved
regions easier to find
2. Know the steps in doing an MSA – “understand” it, not memorize it!
Progressive alignment steps (can use t-coffee tool):
- 1. Globally align (1 against all, 2 against all, 3 against all, 4…etc.)
- 2. Building guide tree
- 3. progressively aligning seq
3. Know how many pairwise alignments are generated in an MSA given N sequences.
- Formula: (N-1)(N)/2
- 5 seq ex: (5-1)(5)/2=10
- The larger the number of seq, the more global pairwise alignments are needed. It’s accurate, but
a disadvantage is that it becomes very slow
4. Know what “once a gap always a gap” means
- Once a gap is placed in an earlier alignment step, you don’t remove or shift it later, because
it reflects the alignment of the most similar sequences and you don’t want to change the
relationship between sequences
- The alignment acts as the cornerstone for the tree building: first step- pairwise alignment,
second- building guide tree using raw (FASTA) seq and applying UPGMA algorithm, third-
aligning seq from guide tree. The purpose of the first two steps is to find the two seq that are
most similar (by identifying the highest scores) and then place them on the same node
Guide tree and phylogenetic tree are different. Guide tree comes from pairwise scores.
Phylogenetic tree comes from MSA (raw data is filtered using MSA, and the MSA is cleaned up
further by deleting gaps to get conserved data, which is used to build a phylogenetic tree).
- Phylogenetic trees use the MSA after the seq have been aligned and cleaned up
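To check the N(N-1)/2 formula and the idea of "find the two most similar seq first," here is a small sketch. It lists every pair, scores each one, and picks the highest score. The score used here (fraction of identical positions in equal-length seq) is a simplified stand-in for a real global alignment score, and the sequences and function names are made up for illustration.

```python
from itertools import combinations

def pair_count(n):
    """Number of pairwise alignments needed for n sequences: n(n-1)/2."""
    return n * (n - 1) // 2

def percent_identity(a, b):
    """Toy score: fraction of matching positions (assumes equal-length seq)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

def most_similar_pair(seqs):
    """Score every pair and return (best_pair, all_scores)."""
    scores = {
        (i, j): percent_identity(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy sequences
seqs = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGGTCGA",
    "s4": "TTGGTCAA",
    "s5": "TTGCACAA",
}
best, scores = most_similar_pair(seqs)
print(pair_count(len(seqs)), "pairs scored; most similar:", best)
```

With 5 seq there are 10 pairs to score, matching the worked example. The pair with the highest score is the one that would be joined first in the guide tree.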
5. Know the difference between tcoffee and ClustalX
- Both are tools used to do the progressive alignment
- ClustalX is a tool you download
- tcoffee is a standalone web-based tool (needs internet)
1. Know what phylogenetics is and what questions it can answer
- The study of how genes are related and where they came from, to see evolutionary
connections, track a virus, etc.
- Can track an outbreak of a disease: sequence the virus from many patients and see
who had the longest branch to work out who the virus came from
2. Know the nomenclature and understand the topology of a tree.
- We look at the shape (topology) and branch lengths of a tree
- Longer branch = more changes, mutations, etc.
- Topology gives the shape (shows relationships between organisms)
- Node: intersection/starting point of 2 or more branches
- tips of the branches represent the descendants of a common ancestor. As you move from the
root to the tips, you are moving forward in time. When a lineage splits (speciation), it is
represented as branching
3. Know what a root in a tree means, and why an outgroup is used.
- The root of a phylogenetic tree represents the common ancestor of the sequences (some
trees are unrooted and don’t specify the common ancestor)
- A tree can be rooted using an outgroup (a taxon known to be distantly related)
- Rooted tree: the further toward the root, the further in the past
- Unrooted tree only focuses on relationships between organisms without time or a common
ancestor
4. Know the stages of a phylogenetic analysis - the most fundamental step is the MSA, know how
to optimize it and make it the best it can be
1. Selection of FASTA seq (prefer protein) from the same conserved protein
2. Building MSA
3. Plug into phylogenetic tree building tool
4. There is even a fourth step- bootstrapping (the tools we used do not have this)
5. Know what tree building means and how it’s done
- UPGMA algorithm is used. It is an unweighted pair group method using arithmetic mean
- You take the seq, do pairwise alignment, calculate the score for each pair, and build a tree
based on the scores (place similar seq on the same node, cluster what’s similar)
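The clustering idea can be sketched in a few lines. The version below works on distances (smaller = more similar) rather than similarity scores, which is an assumption made here to keep the averaging simple. The distance values, names, and nested-tuple output are made up for illustration and are not the output of any class tool.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster with UPGMA.

    dist maps frozenset({a, b}) -> distance between a and b.
    Returns the tree as nested tuples, e.g. ("D", ("C", ("A", "B"))).
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters with the smallest distance
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # average over all leaves (arithmetic mean weighted by cluster size)
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))

# Hypothetical distances between four seq
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 6,
    frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 6,
    frozenset(("B", "D")): 10,
    frozenset(("C", "D")): 8,
}
tree = upgma(names, dist)
print(tree)
```

A and B are the closest pair, so they share a node first; C joins next, and D joins last.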
6. Know what UPGMA is and how it works, I will NOT ask you to build a tree manually
(Figure note) 1 and 2 are closer in number of bases, so they have a higher score and are placed on