CMMB 461 Lecture Notes - Lecture 22: Position Weight Matrix, Consensus Sequence, GC-Content
Document Summary
Transcription factor binding sites (TFBSs) are usually degenerate, so they are described by a consensus sequence derived from a frequency matrix (c). Start with multiple genes that share a common motif, then look in the promoters of those genes for motifs that represent potential binding sites. The frequency matrix records how often each base (A, C, G, T) occurs at each position. In the picture, four of the 8 sequences have C in the first position, and the size of each letter reflects its count. From the matrix, derive a consensus sequence: a base that occurs at more than 1/2 of the sites at a position is a frequent base (e.g., C occurring 5 times at the same position). Sequence logos (d, e): stack the letters to get a better measure of base conservation at each position of the TFBS. Bit: the amount of information required to distinguish between two equally likely choices; distinguishing among four equally likely bases takes 2 bits (log2 4 = 2). Conserved bases score less than two bits after the small-sample correction (e).
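The steps above (frequency matrix, consensus sequence, bits per position) can be sketched in Python. This is a minimal illustration rather than the method used in the lecture. The eight aligned sites are hypothetical, chosen only so that four of them have C in the first position. The 'N' placeholder for positions with no frequent base is an assumption. So is the small-sample correction (s - 1) / (2 ln 2 * n) with s = 4 bases, a commonly used approximation for sequence logos that the notes do not spell out.

```python
import math
from collections import Counter

BASES = "ACGT"

def frequency_matrix(sites):
    """Count each base at each position across aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        matrix.append({b: counts.get(b, 0) for b in BASES})
    return matrix

def consensus(matrix, threshold=0.5):
    """Report a base only if it occurs at more than `threshold` of the sites.

    Positions with no such base are written as 'N' (an assumed placeholder).
    """
    out = []
    for col in matrix:
        total = sum(col.values())
        base, count = max(col.items(), key=lambda kv: kv[1])
        out.append(base if count > threshold * total else "N")
    return "".join(out)

def information_content(matrix):
    """Bits per position: log2(4) - entropy - small-sample correction."""
    bits = []
    for col in matrix:
        n = sum(col.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in col.values() if c > 0)
        # Assumed approximate correction: (s - 1) / (2 ln 2 * n), with s = 4.
        correction = (len(BASES) - 1) / (2 * math.log(2) * n)
        bits.append(math.log2(len(BASES)) - entropy - correction)
    return bits

# Hypothetical aligned binding sites (not data from the lecture).
sites = ["CACG", "CACG", "CACG", "CACG", "AACG", "GACG", "TATG", "AACG"]

matrix = frequency_matrix(sites)
print(matrix[0])                    # counts of A, C, G, T at the first position
print(consensus(matrix))            # first position has C at exactly 1/2, so 'N'
print(information_content(matrix))  # bits per position, always below 2
```

With only 8 sites, even a perfectly conserved position scores below 2 bits because of the correction term, and a weakly conserved position can come out slightly negative. A logo-drawing tool would typically clamp such values at zero.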