Topics 1-6: Dr. Bonnie Deroo
Topic 1: The Central Dogma of Biology
The term "dogma" describes a doctrine or code of beliefs accepted as authoritative. The central
dogma of biology refers to the way that genetic information is stored and retrieved in living cells.
The classic relationship is DNA → RNA → Protein. Thus DNA functions as the information
storage molecule, and this information is "read out" into RNA molecules. Some of these RNAs
are intermediates and carry the information used to produce proteins. It is the proteins (and some
RNAs) that are the "active" workers in the cell — catalyzing reactions, moving things around,
creating structures, etc. Thus the information stored in DNA is the genotype (the sum of
inheritable potential), and when this information is expressed as RNA and protein, a phenotype
(the sum of observable characteristics) is produced.
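The flow DNA → RNA → Protein can be sketched in a few lines of Python. This is only a toy illustration: the DNA sequence is made up, the codon table lists just a handful of the 64 codons of the standard genetic code, and the function names `transcribe` and `translate` are chosen here for illustration.

```python
# Toy illustration of DNA -> RNA -> protein.
# Only a few codons of the standard genetic code are listed (assumption: toy table).
CODON_TABLE = {
    "AUG": "M",              # methionine (also the start codon)
    "GGU": "G", "GGC": "G",  # glycine
    "GCU": "A", "GCC": "A",  # alanine
    "UCU": "S",              # serine
    "AAA": "K",              # lysine
    "GAU": "D",              # aspartic acid
    "UAA": None, "UAG": None, "UGA": None,  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon or the end."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon is not in the toy table
        if amino_acid is None:  # stop codon reached
            break
        residues.append(amino_acid)
    return "".join(residues)


dna = "ATGGGTGCTTCTAAAGATTAA"  # made-up sequence for illustration
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGGUGCUUCUAAAGAUUAA
print(protein)  # MGASKD
```

The stored information (the DNA string) never changes; it is only read out, first into RNA and then into a chain of amino acids.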
The discovery of the structure of DNA by Watson and Crick in 1953 was a milestone for biology,
leading to a molecular understanding of how the sequence of nucleotides making up the DNA
molecule encodes information.
Historically, much of our knowledge of reactions occurring in cells has come from isolating and
studying individual types of protein molecules. This resulted in the delineation of various
metabolic pathways, signaling events, structural elements, etc. and eventually to tools for
manipulating DNA itself. You will learn about these later in the course.
These methods have now allowed access to vast stores of genetic information. The Human
Genome Project (begun in 1990, with a working draft completed 10 years later) led to the
development of fast and accurate DNA sequence determination techniques. Over the past 20
years a huge quantity of sequence information has been generated. In 1996, scientists completed
the total nucleotide sequence of DNA from yeast: about 12 million base pairs of DNA
representing over 6000 genes were identified. Since then, the chromosomal DNA of many
microbes has been sequenced (now over 400 organisms). A virtually complete sequence of
human DNA was completed in 2003. The human genome consists of approximately 3.2 billion
base pairs and encodes approximately 25,000 genes. However, we do not yet know the function
of many of these genes. The power to manipulate DNA sequences gives us new ways to probe
these functions and to answer questions about how cells work.
Since proteins are the active molecules in the cell and some of the key reagents in the technology
behind molecular biology, we will begin the course with a discussion of their general properties.
Proteins are made from building blocks called amino acids, which are strung together in long
polymer chains. The chains fold and coil in three dimensions to achieve a structure with a
biological function. Understanding this critical process requires an in-depth discussion of the
various forces that stabilize a protein into a given conformation (shape).
Note: Chapter 2 of the textbook, "Essential Cell Biology" (3rd edition) by Alberts et al., entitled
"Chemical Components of Cells", provides a review of material covered in OAC Chemistry,
Chem 1050 and Bio 1222. This chapter deals with atoms, electron shells, chemical bonding
(ionic, covalent, polar covalent, H-bonding, etc.), major chemical components of cells, water,
weak acids and bases, amino acids, etc. You are expected to understand this basic chemistry
because it is important to protein structure, which will be discussed in the next few lectures.
Proteins are macromolecules with molecular weights ranging from about 5 kilodaltons (kDa) to
several thousand kDa. A simple cell such as yeast contains about 6,000 different proteins. Many
of these proteins are biological catalysts (enzymes), which catalyze a single chemical reaction in
the cell, but others serve a range of functions, as you will see (see Panel 4-1, p. 120). In fact,
proteins are the most diverse class of macromolecules, with a huge range of sizes, shapes, copy
number, solubility, etc. as well as function (Fig. 4-9, p. 127), but underlying this complexity is a
very simple fundamental structure. All proteins are synthesized from combinations of some or all of the
20 amino acids. Basically, proteins are linear polymers of amino acids. To understand protein
structure, we have to start with the amino acids.
Topic 2: Amino Acids
Readings: p. 72-73
Amino acids are more properly known as alpha-amino acids. Their general structure is H₂N-CH(R)-COOH.
Nineteen of the twenty amino acids have the same arrangement around the
central alpha-carbon: a. an amino group, b. a carboxyl group, c. a hydrogen,
and d. an R group (called the "side chain") which differs for each amino acid.
Recall that when 4 different groups are attached to a carbon atom,
stereoisomers are possible. Therefore, amino acids are designated as D- and
L-amino acids. Not all the amino acids have D- and L- isomers, but for those
that do, only the L- forms are incorporated into proteins.
An important property of amino acids is their net charge, derived from the ionization of weakly
acidic or basic groups. The net charge on a group changes as the hydrogen ion concentration
(pH) changes because of the association of hydrogen ions with the groups.
Recall that for a weak acid:
RCOOH ⇌ H⁺ + RCOO⁻
This equilibrium is characterized by a constant, Ka, for each weak acid: Ka = [H⁺][RCOO⁻]/[RCOOH].
This equilibrium shows that lowering the pH (increasing [H⁺]) will drive the equilibrium to the
left, as written, resulting in decreased RCOO⁻ and increased RCOOH. In other words, the
fraction of the molecules that are ionized will decrease. Thus the net (negative) charge on the group will
decrease in magnitude. For basic groups such as amino groups, the effect is the opposite. That is, the fraction
of the molecules that are ionized increases with decreasing pH.
As we will see, proteins have multiple ionizable groups, so their net charge depends on the sum
of the charges from all groups.
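The dependence of charge on pH can be made concrete with the Henderson-Hasselbalch relation, which follows from the Ka equilibrium: the deprotonated fraction of a group is 1 / (1 + 10^(pKa - pH)). The sketch below sums the charges of the two ionizable groups of a free amino acid with a non-ionizable side chain. The pKa values (2.3 and 9.6) are approximate textbook-style figures assumed for illustration, and the function names are hypothetical.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Fraction of a group that has lost its proton (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (pKa - pH))


def group_charge(pH: float, pKa: float, acidic: bool) -> float:
    """Average charge of one ionizable group at the given pH."""
    f = fraction_deprotonated(pH, pKa)
    # Acidic groups (-COOH) are neutral when protonated and -1 when deprotonated.
    # Basic groups (-NH3+) are +1 when protonated and neutral when deprotonated.
    return -f if acidic else 1.0 - f


# Assumed, approximate pKa values for the alpha-carboxyl and alpha-amino groups
PKA_CARBOXYL = 2.3
PKA_AMINO = 9.6


def net_charge(pH: float) -> float:
    """Net charge = sum of the charges of all ionizable groups."""
    return (group_charge(pH, PKA_CARBOXYL, acidic=True)
            + group_charge(pH, PKA_AMINO, acidic=False))


for pH in (1.0, 7.0, 12.0):
    print(f"pH {pH:4.1f}: net charge = {net_charge(pH):+.2f}")
```

At low pH the molecule carries a net positive charge, near neutral pH the two charges cancel, and at high pH the net charge is negative. A protein is handled the same way, with one term per ionizable group.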
Categories of Amino Acids
Every amino acid has a 3-letter abbreviation and a one-letter code. I don’t expect you to
memorize all the codes, but you will have to memorize a few (see below). Amino acids are
classified according to the properties of their side chains.
1. The largest group has non-polar side chains:
a. Some have only H or hydrocarbon groups (such as CH₃) in their side chains: glycine (gly, G), alanine (ala, A), valine (val, V),
leucine (leu, L) and isoleucine (ile, I)
b. Some contain a sulfur atom: cysteine (cys, C), methionine (met, M)
c. Two are aromatic: phenylalanine (phe, F), tryptophan (trp, W)
d. Finally, the one odd one is actually an imino acid, meaning that its immediate synthetic
precursor was an imino acid (i.e. it contained an imine, or C=NH group): proline (pro, P)
2. Charged side chains: basic or acidic
a. contain carboxyl groups: glutamic acid (glu, E), aspartic acid (asp, D)
b. contain basic groups: lysine (lys, K), arginine (arg, R), histidine (his, H)
3. Uncharged polar side chains:
a. contain hydroxyl groups: serine (ser, S), threonine (thr, T), tyrosine (tyr, Y)
b. contain amide groups: glutamine (gln, Q), asparagine (asn, N)
Note: You will be expected to know the structure of the following 8 amino acids: glycine (G),
alanine (A), cysteine (C), serine (S), proline (P), lysine (K), aspartic acid (D), phenylalanine (F).
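The classification above maps naturally onto a lookup table. The following is a minimal sketch that stores each one-letter code with its three-letter abbreviation and side-chain class, using the grouping given in these notes (other textbooks place some residues, such as cysteine or tyrosine, differently). The names `AMINO_ACIDS`, `classify` and `class_counts` are chosen here for illustration.

```python
from collections import Counter

# One-letter code -> (three-letter abbreviation, side-chain class),
# following the grouping used in these notes.
AMINO_ACIDS = {
    "G": ("gly", "nonpolar"), "A": ("ala", "nonpolar"), "V": ("val", "nonpolar"),
    "L": ("leu", "nonpolar"), "I": ("ile", "nonpolar"), "C": ("cys", "nonpolar"),
    "M": ("met", "nonpolar"), "F": ("phe", "nonpolar"), "W": ("trp", "nonpolar"),
    "P": ("pro", "nonpolar"),
    "E": ("glu", "acidic"), "D": ("asp", "acidic"),
    "K": ("lys", "basic"), "R": ("arg", "basic"), "H": ("his", "basic"),
    "S": ("ser", "polar uncharged"), "T": ("thr", "polar uncharged"),
    "Y": ("tyr", "polar uncharged"), "Q": ("gln", "polar uncharged"),
    "N": ("asn", "polar uncharged"),
}


def classify(sequence: str):
    """Return (one-letter code, abbreviation, class) for each residue."""
    return [(r, *AMINO_ACIDS[r]) for r in sequence.upper()]


def class_counts(sequence: str) -> Counter:
    """Count how many residues of each side-chain class a sequence contains."""
    return Counter(AMINO_ACIDS[r][1] for r in sequence.upper())


for entry in classify("GACSPKDF"):  # the eight residues whose structures you must know
    print(entry)
print(class_counts("GACSPKDF"))
```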
While only these 20 amino acids are used to make proteins, other amino acids can be found in
proteins due to modifications that happen after the protein is made. This allows the introduction
of specialized groups for specific purposes, and often changes the properties of the protein. A
common example is phosphorylation of the hydroxyl-containing amino acids ser, thr and tyr.
These phosphoamino acids have a phosphate esterified on the hydroxyl group of their side chain.
You will come across a variety of other modifications as you study Biochemistry.
Topic 2 Review Questions
In the following questions mark the one best answer.
2-1. Which of these amino acids contains a sulfur atom?
a) S b) C c) K d) D e) P
2-2. Which one of these statements about amino acids is true?
a) twenty-two amino acids are commonly found in proteins
b) most of the amino acids found in proteins have charged side chains
c) not all amino acids have stereoisomers
d) both D- and L- amino acids are found in proteins
e) polar amino acids are considered hydrophobic
In the next 4 questions, match the property with the one most appropriate amino acid
shown in the list. An amino acid may be used more than once or not at all.
b) lysine
c) aspartic acid
d) proline
e) none of the above
2-3. The smallest of the four amino acids listed
2-4. Fits in the nonpolar class
2-5. Can form disulfide bonds
2-6. Has an aromatic group
Topic 3: Protein Structure
Readings: p. 121-140
Proteins are the largest and most varied class of biological molecules, and they show the greatest
variety of structures. Many have intricate three-dimensional folding patterns that result in a
compact form, but others do not fold up at all ("natively unstructured proteins") and exist in
random conformations. The function of proteins depends on their structure, and defining the
structure of individual proteins is a large part of modern Biochemistry and Molecular Biology.
To understand how proteins fold, we will start with the basics of structure, and progress through
to structures of increasing complexity.
To make a protein, amino acids are connected together by a type of amide bond called a "peptide
bond". This bond is formed between the alpha amino group of one amino acid and the carboxyl
group of another in a condensation reaction. When two amino acids join, the result is called a
dipeptide, three gives a tripeptide, etc. Multiple amino acids result in a polypeptide (often
shortened to "peptide"). Because water is lost in the course of creating the peptide bond,
individual amino acids are referred to as "amino acid residues" once they are incorporated.
Another property of peptides is polarity: the two ends are different. One end has a free amino
group (called the "N-terminal") and the other has a free carboxyl group ("C-terminal").
In the natural course of making a protein, polypeptides are elongated by the addition of amino
acids to the C-terminal end of the growing chain. Conventionally, peptides are written
N-terminal first; therefore gly-ser is not the same as ser-gly (or, in one-letter code, GS is not the same as SG). The
connection gives rise to a repeating pattern of "NCC-NCC-NCC…" atoms along the length of
the molecule. This is referred to as the "backbone" of the peptide. If stretched out, the side chains
of the individual residues project outwards from this backbone.
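The conventions in this paragraph (N-terminal first, a repeating N-C-C backbone, one peptide bond between each pair of neighbours) can be sketched as follows. The function names are hypothetical and the representation is deliberately simplified to strings.

```python
def write_peptide(residues):
    """Join residue abbreviations, N-terminal first, e.g. ['gly', 'ser'] -> 'gly-ser'."""
    return "-".join(residues)


def backbone_pattern(residues):
    """Each residue contributes N, alpha-carbon and carbonyl carbon to the backbone."""
    return "-".join("NCC" for _ in residues)


def peptide_bond_count(residues):
    """A linear chain of n residues has n - 1 peptide bonds (one water lost per bond)."""
    return max(len(residues) - 1, 0)


dipeptide = ["gly", "ser"]
reversed_dipeptide = list(reversed(dipeptide))

# The same two residues in the opposite order give a different peptide.
print(write_peptide(dipeptide), "vs", write_peptide(reversed_dipeptide))
print(backbone_pattern(["gly", "ser", "ala"]))    # NCC-NCC-NCC
print(peptide_bond_count(["gly", "ser", "ala"]))  # 2
```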
The peptide bond is written as a single bond, but it actually has some characteristics of a double
bond because of resonance between the C=O and C-N bonds. This means that the six atoms involved
(the C, O, N and H of the peptide group plus the two neighbouring alpha-carbons) are coplanar, and that
there is no free rotation around the C–N axis. This constrains the
flexibility of the chain and prevents some folding patterns.
Primary Structure of Proteins
It is convenient to discuss protein structure in terms of four levels (primary to quaternary) of
increasing complexity. Primary structure is simply the sequence of residues making up the
protein. Thus primary structure involves only the covalent bonds linking residues together.
The minimum size of a protein is defined as about 50 residues; smaller chains are referred to
simply as peptides. So the primary structure of a small protein would consist of a sequence of 50
or so residues. Even such small proteins contain hundreds of atoms and have molecular weights
of over 5000 Daltons (Da). There is no theoretical maximum size, but the largest protein so far
discovered has about 30,000 residues. Since the average molecular weight of a residue is about
110 Da, that single chain has a molecular weight of over 3 million Daltons.
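The molecular weights quoted above follow from simple arithmetic: number of residues multiplied by the average residue mass of about 110 Da. A minimal sketch (the function name is chosen here for illustration, and the result is only an estimate because real residues differ in mass):

```python
AVERAGE_RESIDUE_MASS_DA = 110  # average residue mass quoted in the notes


def estimate_mass_da(n_residues: int) -> int:
    """Rough molecular weight of a polypeptide from its length alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA


for n in (50, 30_000):
    mass = estimate_mass_da(n)
    print(f"{n} residues: about {mass:,} Da ({mass / 1000:.1f} kDa)")
```

A 50-residue protein comes out at about 5,500 Da and a 30,000-residue chain at about 3,300,000 Da, consistent with the figures in the text.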
Secondary Structure of Proteins
This level of structure describes the local folding pattern of the polypeptide backbone and is
stabilized by hydrogen bonds between N-H and C=O groups. Various types of secondary
structure have been discovered, but by far the most common are the orderly repeating forms
known as the α helix and the β sheet.
An α helix, as the name implies, is a helical arrangement of a
single polypeptide chain, like a coiled spring (see Fig. 4-10, p.
130). In this conformation, the carbonyl and N-H groups are oriented parallel to the axis. Each carbonyl is linked by a hydrogen bond to the N-H of a residue
located 4 residues further on in the sequence within the same chain. All C=O and N-H groups are
involved in hydrogen bonds, making a fairly rigid cylinder. The alpha helix has precise
dimensions: 3.6 residues per turn, 0.54 nm per turn. The side chains project outward and contact
any solvent, producing a structure something like a bottle brush or a round hair brush. An
example of a protein with many helical structures is the keratin that makes up human hair.
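The helix dimensions given above (3.6 residues per turn, 0.54 nm per turn) imply a rise of 0.15 nm and a rotation of 100 degrees per residue. The sketch below works these out and also encodes the hydrogen-bonding rule, C=O of residue i to N-H of residue i + 4. The names are hypothetical, and the length estimate assumes an ideal, straight helix.

```python
RESIDUES_PER_TURN = 3.6
PITCH_NM = 0.54  # distance along the helix axis per turn

RISE_PER_RESIDUE_NM = PITCH_NM / RESIDUES_PER_TURN   # about 0.15 nm
ROTATION_PER_RESIDUE_DEG = 360 / RESIDUES_PER_TURN   # about 100 degrees


def helix_length_nm(n_residues: int) -> float:
    """Length along the axis of an ideal helix with n residues."""
    return n_residues * RISE_PER_RESIDUE_NM


def hbond_partner(i: int) -> int:
    """Index of the residue whose N-H bonds to the C=O of residue i."""
    return i + 4


print(f"rise per residue: {RISE_PER_RESIDUE_NM:.2f} nm")
print(f"rotation per residue: {ROTATION_PER_RESIDUE_DEG:.0f} degrees")
print(f"36-residue helix: {helix_length_nm(36):.1f} nm long")
print(f"C=O of residue 1 bonds to N-H of residue {hbond_partner(1)}")
```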
The structure of a β sheet is very different from the structure of an α helix. In a β sheet, the
polypeptide chain folds back on itself so that polypeptide strands lie side by side, and are held
together by hydrogen bonds (see Fig. 4-10, p. 130), forming a very rigid structure. Again, the
polypeptide N-H and C=O groups form hydrogen bonds to stabilize the structure, but unlike the
α helix, these bonds are formed between neighbouring polypeptide (β) strands. Generally the
primary structure folds back on itself in either a parallel or antiparallel arrangement, producing a
parallel or antiparallel β sheet (see Fig. 4-14, p. 132). In this arrangement, side chains project
alternately upward and downward from the sheet (Fig. 4-10D, p. 130). The major constituent of
silk (silk fibroin) consists mainly of layers of β sheet stacked on top of one another.
Other types of secondary structure. While the α helix and β sheet are by far the most common
types of structure, many others are possible. These include various loops, helices and irregular
conformations. A single polypeptide chain may have different regions that take on different
secondary structures. In fact, many proteins have a mixture of α helices, β sheets, and other
types of folding patterns to form various overall shapes (Fig. 4-16, p. 133).
What determines whether a particular part of a sequence will fold into one or the other of these
structures? A major determinant is the interactions between side chains of the residues in the
polypeptide. Several factors come into play: steric hindrance between nearby large side chains,
charge repulsion between nearby similarly-charged side chains, and the presence of proline.
Proline contains a ring that constrains bond angles so that it will not fit exactly into an α helix or
β sheet. Further, there is no H on the peptide-bond nitrogen when proline is present, so a hydrogen bond
cannot form. Another major factor is the presence of other chemical groups that interact with
each other. This contributes to the next level of protein structure, the tertiary structure.
Tertiary Structure of Proteins
This level of structure describes how regions of secondary structure fold together - that is, the
3D arrangement of a polypeptide chain, including α helices, β sheets, and any other
loops and folds. Tertiary structure results from interactions between side chains, or between
side chains and the polypeptide backbone, which are often distant in sequence.
Every protein has a particular pattern of folding and these can
be quite complex (e.g. Panel 4-2, p. 128, right).
Whereas secondary structure is stabilized by H-bonding, all four "weak" forces contribute to
tertiary structure (p. 122). Usually, the most important force is hydrophobic interaction (or
hydrophobic bonds). Polypeptide chains generally contain both hydrophobic and hydrophilic
residues. Much like detergent micelles, proteins are most stable when their hydrophobic parts are
buried, while hydrophilic parts are on the surface, exposed to water. Thus, more hydrophobic
residues such as trp are often surrounded by other parts of the protein, excluding water, while
charged residues such as asp are more often on the surface (Fig. 4-5, p. 124).
Other forces that contribute to tertiary structure are ionic bonds between side chains, hydrogen
bonds, and van der Waals forces (Fig. 4-4, p. 123). These bonds are far weaker than covalent
bonds, and it takes multiple interactions to stabilize a structure.
There is one covalent bond that is also involved in tertiary structure, and that is the disulfide
bond that can form between cysteine residues. This bond is important only in non-cytoplasmic
proteins since there are enzyme systems present in the cytoplasm to remove disulfide bonds.
Visualization of protein structures. Because the 3D structures of proteins involve thousands of
atoms in complex arrangements, various ways of depicting them so they are understood visually
have been developed, each emphasizing a different property of the protein. Panel 4-2 (p. 128)
illustrates a few of these different ways, from a simple backbone to a space-filling representation.
Software tools have been written to depict proteins in many different ways, and have become
essential to understanding protein structure and function.
Structural Domains of Proteins
Protein structure can also be described by a level of organization that is distinct from the ones we
have just discussed. This organizational unit is the protein "domain," and the concept of domains
is extremely important for understanding tertiary structure. A domain is a distinct region
(sequence of amino acids) of a protein, while a structural domain is an independently-folded part
of a protein that folds into a stable structure. A protein may have many domains, or consist only
of a single domain. Larger proteins generally consist of connected structural domains. Domains
are often separated by a loosely folded region and may create clefts between them. Structural
domains are often functional units as well. Examples of structural domains are illustrated in Fig.
4-16, p. 133 and Fig. 4-17, p. 134.
Quaternary Structure of Proteins
Some proteins are composed of more than one polypeptide chain. In such proteins, quaternary
structure refers to the number and arrangement of the individual polypeptide chains. Each
polypeptide is referred to as a subunit of the protein. The same forces and bonds that create
tertiary structure also hold subunits together in a stable complex to form the complete protein.
Individual chains may be identical, somewhat similar, or totally different. As examples, CAP
protein (Fig. 4-19, p. 136) is a dimer with two identical subunits, whereas hemoglobin (Fig. 4-20,
p. 136) is a tetramer containing two pairs of non-identical (but similar) subunits. It has 2 α
subunits and 2 β subunits. Secreted proteins often have subunits that are held together by
disulfide bonds. Examples include tetrameric antibody molecules that commonly have two larger
subunits and two smaller subunits ("heavy chains" and "light chains") connected by disulfide
bonds and noncovalent forces (Panel 4-3, p. 144, top left).
In some proteins, intertwined α helices hold subunits together; these are called coiled-coils (Fig.
4-13, p. 132). This structure is stabilized by a hydrophobic surface on each helix that is created
by a heptad (seven-residue) repeat pattern of hydrophilic/hydrophobic residues. The sequence of the protein
can be represented as "abcdefgabcdefgabcdefg..." with positions "a" and "d" filled with
hydrophobic residues such as A, V, L etc. Each helix therefore has a hydrophobic surface that
matches the other. When the two helices coil around each other, those surfaces come together,
burying the hydrophobic side chains and forming a stable structure. An example of such a
protein is myosin, the motor protein found in muscle that allows contraction.
Protein Folding
How and why do proteins naturally form secondary, tertiary and quaternary structures? This
question is a very active area of research and is certainly not completely understood. A folded,
biologically-active protein is considered to be in its "native" state, which is generally thought to
be the conformation with the lowest free energy.
Proteins can be unfolded or "denatured" by treatments that disrupt weak bonds.
Thus organic solvents that disrupt hydrophobic interactions, high concentrations of urea or
guanidine that interfere with H-bonding, extreme pH or even high temperatures, will all cause
proteins to unfold. Denatured proteins have a random, flexible conformation and usually lack
biological activity. Because of exposed hydrophobic groups, they often aggregate and
precipitate. This is what happens when you fry an egg.
If the denaturing condition is removed, some proteins will re-fold and regain activity. This
process is called "renaturation." This implies that all the information necessary for folding is present in
the primary structure (sequence) of the protein. During renaturation, the polypeptide chain is
thought to fold up into a loose globule by hydrophobic effects, after which small regions of
secondary structure form in especially favorable sequences. These sequences then interact with
each other to stabilize intermediate structures before the final conformation is attained.
Many proteins have great difficulty renaturing, and proteins that assist other proteins to fold are
called "molecular chaperones." They are thought to act by reversibly masking exposed
hydrophobic regions to prevent aggregation during the multi-step folding process. Proteins that
must cross membranes (e.g. mitochondrial proteins) must stay unfolded until they reach their
destination, and molecular chaperones may protect and assist during this process.
Protein families/Types of proteins
Proteins are classified in a number of ways, according to structure, function, location and/or
properties. For example, many proteins combine tightly with other substances such as
carbohydrates ("glycoproteins"), lipids ("lipoproteins"), or metal ions ("metalloproteins"). The
diversity of proteins that form from the 20 amino acids is greatly increased by associations such
as these. Proteins that are tightly bound to membranes are called "membrane proteins". Proteins
with similar activities are given functional classifications. For example, proteins that break down
other proteins are called proteases.
Because almost all proteins arise by an evolutionary process, i.e. new ones are derived from old
ones, they can be classified into families by their relatedness. Proteins that derive from the same
ancestor are called "homologous proteins". Studying the sequences of homologous proteins can
give clues to the structure and function of the protein. Residues that are critical for function tend
not to change on an evolutionary timescale; they are referred to as "conserved residues".
Identifying such residues by comparing amino acid sequences often helps clarify what a protein
is doing or how it is folded. For example, the proteases trypsin and chymotrypsin are members of
the "serine protease" family, so named because of a conserved serine residue that is essential to
catalyze the reaction. Trypsin and chymotrypsin have very similar folding patterns and
reaction mechanisms. Recognizing a pattern of conserved residues in protein sequences often
allows scientists to deduce the function of a protein.
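Finding conserved residues amounts to comparing aligned sequences column by column. The sketch below assumes the sequences have already been aligned to equal length (real analyses rely on dedicated alignment software), and the three sequences are made up for illustration; they are not real trypsin or chymotrypsin data. The function name is hypothetical.

```python
def conserved_positions(aligned_sequences):
    """Return (index, residue) for every column where all sequences agree."""
    lengths = {len(s) for s in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    conserved = []
    for i, column in enumerate(zip(*aligned_sequences)):
        # A column is conserved if it holds a single residue type (gaps excluded).
        if len(set(column)) == 1 and column[0] != "-":
            conserved.append((i, column[0]))
    return conserved


# Made-up, pre-aligned toy sequences (assumption: NOT real protein data)
toy_family = [
    "MKSLAGTD",
    "MKSIAGSD",
    "MRSLAGTE",
]

for index, residue in conserved_positions(toy_family):
    print(f"position {index + 1}: {residue} is conserved")
```

In this toy family, positions 1, 3, 5 and 6 (M, S, A and G) are identical in every sequence, so they would be the first candidates for residues that matter to structure or function.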
Topic 3 Review Questions
In the following questions, mark the one best answer.
3-1. In a folded protein, most of the nonpolar amino acids are buried inside the protein fold,
while the polar and charged side chains are exposed to the components in the cytosol. This fold
is more stable because of the expulsion of nonpolar atoms from contact with water, favouring the
interaction of nonpolar atoms with each other. What is this type of non-covalent interaction
between nonpolar atoms called?
a) Apolar interaction
b) Hydrophilic interaction
c) Hydrophobic interaction
d) Hydrocarbon interaction
3-2. Which of the following statements about an α-helix is false?
a) side chains project outwards
b) 3.6 residues/turn
c) forms a rod or cylindrical shape
d) stabilized by ionic bonds
e) has a tightly packed, hydrophobic core
3-3. Which of the following statements about tertiary structure is false?
a) it involves interactions between amino acid side chains
b) charged side chains are mainly on the exterior
c) it involves multiple polypeptides
d) it is disrupted during denaturation
e) it is sometimes stabilized by disulfide bridges
3-4. The concept of domains is very important in understanding protein structures.
Which of the following statements about domains is false?
a) they are often functional units
b) they are part of quaternary structure
c) they are usually tightly folded
d) they are often separated from each other by clefts
e) they are often connected by flexible regions
3-5. Which of the following statements is true?
a) Peptide bonds are the only covalent bonds that can link together two amino
acids in proteins.
b) The polypeptide backbone of some proteins is branched.