Many proteins have a quaternary structure, which consists of several polypeptide chains that
associate into an oligomeric molecule. Each polypeptide chain in such a protein is called a
subunit. Hemoglobin, for example, consists of two α and two β subunits. Each of the four chains
has an all-α globin fold with a heme pocket.
Domain swapping is a mechanism for forming
oligomeric assemblies. In domain swapping, a secondary or tertiary element of a monomeric
protein is replaced by the same element of another protein. The swapped element can range from
a single secondary structure element to a whole structural domain. Domain swapping also represents a model of evolution
for functional adaptation by oligomerisation, e.g. oligomeric enzymes that have their active site
at subunit interfaces.
Nature is a tinkerer and not an inventor: new sequences are adapted from pre-existing sequences
rather than invented. Domains are the common material used by nature to generate new
sequences; they can be thought of as genetically mobile units, referred to as 'modules'. Often, the
C and N termini of domains are close together in space, allowing them to easily be "slotted into"
parent structures during the process of evolution. Many domain families are found in all three
forms of life: Archaea, Bacteria and Eukarya. Domains that are repeatedly found in diverse
proteins are often referred to as modules; examples can be found among extracellular proteins
associated with clotting, fibrinolysis, complement, the extracellular matrix, cell surface adhesion
molecules and cytokine receptors.
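The observation that the N and C termini of a domain are often close together in space can be checked numerically. The following is a minimal sketch, assuming the domain is given as an ordered list of C-alpha coordinates in ångströms. The function names, the toy coordinates and the 10 Å cutoff are illustrative assumptions and are not taken from the text.

```python
import math

# Hypothetical cutoff (in angstroms) for calling the termini "close";
# this value is an assumption made for illustration only.
CLOSE_CUTOFF = 10.0


def termini_distance(ca_coords):
    """Return the distance between the first and last C-alpha atoms.

    ca_coords is an ordered sequence of (x, y, z) tuples, N terminus first.
    """
    if len(ca_coords) < 2:
        raise ValueError("need at least two residues")
    return math.dist(ca_coords[0], ca_coords[-1])


def termini_are_close(ca_coords, cutoff=CLOSE_CUTOFF):
    """Return True if the N and C termini lie within the cutoff distance."""
    return termini_distance(ca_coords) <= cutoff


# Toy coordinates (not from a real structure): the chain runs away from
# the origin and then loops back towards it.
toy_domain = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(termini_distance(toy_domain))   # 5.0
print(termini_are_close(toy_domain))  # True
```

In practice the coordinates would come from a structure file, and a sensible cutoff would depend on the size of the domain.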
Molecular evolution gives rise to families of related proteins with similar sequence and structure.
However, sequence similarities can be extremely low between proteins that share the same
structure. Protein structures may be similar because proteins have diverged from a common
ancestor. Alternatively, some folds may be more favored than others as they represent stable
arrangements of secondary structures and some proteins may converge towards these folds over
the course of evolution. There are currently about 45,000 experimentally determined protein 3D
structures deposited within the Protein Data Bank (PDB). However, this set contains many
identical or very similar structures. All proteins should be classified into structural families to
understand their evolutionary relationships. Structural comparisons are best achieved at the
domain level. For this reason, many algorithms have been developed to automatically assign
domains in proteins with known 3D structure, see 'Domain definition from structural co-
The CATH domain database classifies domains into approximately 800 fold families; ten of
these folds are highly populated and are referred to as 'super-folds'. Super-folds are defined as
folds for which there are at least three structures without significant sequence similarity. The
most populated is the α/β-barrel super-fold as described previously.
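The super-fold criterion stated above (at least three structures without significant sequence similarity) can be illustrated with a short sketch. It assumes each domain has already been assigned a fold name and a sequence-family identifier, and it treats domains from different sequence families as lacking significant sequence similarity. That simplification, the input format and the function name are assumptions made here; this is not how CATH itself is implemented.

```python
from collections import defaultdict

# Threshold taken from the definition in the text: at least three
# structures with no significant sequence similarity to one another.
MIN_UNRELATED_STRUCTURES = 3


def find_super_folds(domains):
    """Return the sorted names of folds that meet the super-fold criterion.

    domains is an iterable of (fold_name, sequence_family_id) pairs.
    Assumption: two domains lack significant sequence similarity exactly
    when their sequence_family_id values differ.
    """
    families_per_fold = defaultdict(set)
    for fold, family in domains:
        families_per_fold[fold].add(family)
    return sorted(
        fold
        for fold, families in families_per_fold.items()
        if len(families) >= MIN_UNRELATED_STRUCTURES
    )


# Toy data (invented for illustration, not real CATH assignments).
example = [
    ("alpha/beta-barrel", "fam1"),
    ("alpha/beta-barrel", "fam2"),
    ("alpha/beta-barrel", "fam3"),
    ("alpha/beta-barrel", "fam3"),  # same family again: not counted twice
    ("globin-like", "fam4"),
    ("globin-like", "fam4"),
]
print(find_super_folds(example))  # ['alpha/beta-barrel']
```

Counting distinct families rather than raw structures keeps a fold from qualifying merely because one protein has been solved many times.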
The majority of genomic proteins, two-thirds in unicellular organisms and more than 80% in
metazoa, are multidomain proteins created as a result of gene duplication events. Many domains
in multidomain structures could have once existed as independent proteins. More and more
domains in eukaryotic multidomain proteins can be found as independent proteins in
prokaryotes. For example, vertebrates have a multi-enzyme polypeptide containing the GAR
synthetase, AIR synthetase and GAR transformylase modules (GARs-AIRs-GARt; GAR:
glycinamide ribonucleotide synthetase/transferase; AIR: aminoimidazole ribonu