Genetics Lecture No. 1: DNA, Molecule Of Heredity & Information
Wednesday, January 9, 2013
Information & DNA:
-Consider a simple guessing game in which an unknown number x lies between 1 and 100. Data can be
generated only by asking whether x is higher or lower than some chosen number. In the first round we
find that x is higher than 10; this yields information, because x must now lie between 11 and 100.
In the second round we find that x is higher than 0, which yields no new information, since we
already know that x is greater than 10 (remember that data does not necessarily equal information).
In the third round we find that x is higher than 5, which, like round 2, yields no new information.
In round four we find that x is higher than 50, which means that x must now lie between 51 and 100.
From this simple game we can define information as that which reduces uncertainty.
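The game can be traced in a short Python sketch. It is an illustration only, not part of the lecture: the helper name `update_range` is hypothetical, and measuring the remaining uncertainty as log2 of the number of candidate values is an assumption borrowed from the "Quantitating Information" section.

```python
import math

def update_range(low, high, guess, answer_is_higher):
    """Narrow the inclusive range [low, high] for x, given the answer to
    'is x higher than guess?'. (Hypothetical helper for illustration.)"""
    if answer_is_higher:
        low = max(low, guess + 1)
    else:
        high = min(high, guess)
    return low, high

low, high = 1, 100
# (guess, answer) pairs taken from the four rounds described in the notes
rounds = [(10, True), (0, True), (5, True), (50, True)]
history = []

for guess, higher in rounds:
    before = high - low + 1                 # candidates before the answer
    low, high = update_range(low, high, guess, higher)
    after = high - low + 1                  # candidates after the answer
    # Assumption: uncertainty is log2(number of remaining candidates)
    gained = math.log2(before) - math.log2(after)
    history.append((guess, low, high, gained))
    print(f"x > {guess}: range {low}-{high}, gained {gained:.2f} bits")
```

Rounds two and three leave the range unchanged, so they report zero bits gained: data was produced, but no information.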
-Consider another scenario with four devices: Device #1, Device #2, Device #3, and Device #4.
Device #1, which always produces the same letter (A), has zero uncertainty and thus conveys no
information (a message is only conveyed when there is some uncertainty present). Device #2, which
produces one of 2 letters (A, B), has an uncertainty of 2 symbols and conveys a message. Device #3,
which produces one of 4 letters (A, B, C, D), has an uncertainty of 4 symbols and conveys a message.
Device #4, which produces one of 2 letters (A, B) and one of 4 numbers (1, 2, 3, 4), has an
uncertainty of 8 symbols (2 x 4 combinations of the outputs of Devices #2 and #3) and conveys a
message. Increasing complexity (a larger set of possible symbols) allows you to encode more
information in a message of the same length.
Quantitating Information:
-We can calculate uncertainty by using the following formula: Uncertainty = log2(M), where M is the
# of possible symbols in a given alphabet. As an example, the calculated uncertainty for Device #3
would equal log2(4) = 2 bits. Note that a bit is a unit of information for a log base of 2, likewise
for a digit and a nat (which are for log bases of 10 and e respectively). Using the same example, the
amount of information that Device #3 could generate also depends on the length of the sequence.
Therefore the formula can be modified to: Maximum information content of any sequence =
L x [log2(M)], where L is the length of the sequence. The longer the sequence, the more information
it is likely to contain.
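The two formulas above can be checked with a minimal Python sketch. The function names `uncertainty_bits` and `max_information_bits` are hypothetical, chosen here for illustration; the symbol counts are those of the four devices in the notes.

```python
import math

def uncertainty_bits(num_symbols):
    """Uncertainty per symbol in bits: log2(M)."""
    return math.log2(num_symbols)

def max_information_bits(length, num_symbols):
    """Maximum information content of a sequence: L x log2(M)."""
    return length * uncertainty_bits(num_symbols)

# M for each device; Device #4 combines 2 letters with 4 numbers
devices = {"Device #1": 1, "Device #2": 2, "Device #3": 4, "Device #4": 2 * 4}
for name, m in devices.items():
    print(f"{name}: M = {m}, uncertainty = {uncertainty_bits(m):.0f} bits")

# The same uncertainty for Device #3 in the other units named in the notes
digits = math.log10(4)   # log base 10 gives digits
nats = math.log(4)       # log base e gives nats

# A sequence of 10 symbols from Device #3 carries at most 10 x 2 = 20 bits
print(max_information_bits(10, 4))
```

Device #1 comes out at 0 bits, matching the statement that a device with a single possible output conveys no information.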
-For DNA, we know that there are four possible symbols (A, C, G, T), which gives us an uncertainty
of log2(4) = 2 bits per symbol. If we take a gene like insulin (1789 base-pairs in length), we find
it to contain 1789 x [log2(4)] = 3578 bits of information. Note that these calculations assume that
each symbol has an equal chance (0.25) of appearing in the sequence. Remember to use the formula
logb(X) = log(X) / log(b), as calculators use a logarithmic base of 10 instead of 2.
E.g. log2(5) = log10(5) / log10(2).
The Reason For Using Bits:
-We use bits for quantifying the information contained in DNA because if one were to unambiguously
convert a DNA sequence into a string of 1’s and 0’s (binary code), one would need on average 2
bits/symbol. A code of 1 bit/symbol is not enough: with only two code words for four bases it is
ambiguous (each bit string represents more than one possible sequence).
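The point about ambiguity can be illustrated with a small Python sketch. The specific bit assignments in `TWO_BIT` and `ONE_BIT` and the example sequences are assumptions made for illustration; any one-to-one assignment of the four 2-bit words to the four bases would work equally well.

```python
# Assumed 2-bit code: one distinct code word per base
TWO_BIT = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: base for base, bits in TWO_BIT.items()}

def encode(seq):
    """Convert a DNA string into a binary string, 2 bits per base."""
    return "".join(TWO_BIT[base] for base in seq)

def decode(bits):
    """Recover the DNA string by reading the binary string 2 bits at a time."""
    return "".join(DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

# Assumed 1-bit code: two bases must share each bit, so it cannot be inverted
ONE_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}

seq = "GATTACA"
encoded = encode(seq)
assert decode(encoded) == seq          # the 2-bit code round-trips

# Two different sequences collapse to the same 1-bit string
lossy_a = "".join(ONE_BIT[base] for base in "GATTACA")
lossy_b = "".join(ONE_BIT[base] for base in "TCGGCAC")
print(lossy_a == lossy_b)              # True: the 1-bit code is ambiguous

# A 1789-base sequence (the insulin gene length quoted in the notes)
print(len(encode("A" * 1789)))         # 3578 bits
```

With the 2-bit code, the encoded length is always twice the sequence length, which is where the 3578-bit figure for a 1789-base gene comes from.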
The Experiments Proving DNA As Carrying Biological Information:
-One of the first experiments to demonstrate the capacity of a DNA sequence to store
information was Fred Griffith’s experiment on two forms of Streptococcus pneumoniae:
smooth colony (S, wild type) and rough colony (R, mutant). S bacteria are virulent and can cause lethal
infections when injected into mice. Injections of R mutants by themselves do not cause infections that
kill mice. Similarly, injections of heat-killed S bacteria do not cause lethal infections. Lethal infection
does result, however, from injections of live R bacteria mixed with heat-killed S strains (the blood of the
dead host mouse contains living S-type bacteria). This
