Some topics in Molecular Biology

Formation of peptide bonds

There are plenty of websites that introduce messenger RNA, ribosomes and other components of the protein biosynthetic apparatus, but they sometimes skate over the basic chemistry.

Peptide bonds are formed by an enzyme-catalysed reaction between aminoacyl-tRNA and polypeptidyl-tRNA. Each of the 21 amino acids incorporated into nascent polypeptide chains has its own transfer RNA ("tRNA") molecules (typically a small family of these). tRNAs are typically 70-80 nucleotides in length and contain several modified nucleosides.
First we need to attach an amino acid to an appropriate (cognate) tRNA. As an example, threonine is attached by an ester link to the 3'-end of its tRNA by a reaction that in summary is:

Thr + ATP + tRNAThr → Thr-tRNAThr + AMP + PPi
The reaction takes place in two stages, both catalysed by the same enzyme, aminoacyl-tRNA synthetase (threonyl-tRNA synthetase in this case). The two stages (picture below) are ① formation of an aminoacyl-adenylate intermediate and ② the formation of aminoacyl-tRNA.
[Figure: aa-tRNA synthetase]
So now we can imagine a population of aminoacyl-tRNA molecules and a growing peptide chain attached to the "most recent tRNA to have been added". The fundamental reaction is one we have already seen:
ester + amine → amide + alcohol

[Figure: chemical model]
The protein biosynthetic equivalent is:
polypeptidyl-tRNA + aminoacyl-tRNA → polypeptide+1-tRNA + tRNA
[Figure: peptide bond formation]
The enzyme peptidyl transferase is a ribozyme, formed from RNA domains in a ribonucleoprotein particle, the ribosome. The appropriate aminoacyl-tRNA molecules are selected by interaction of a sequence of nucleotides, the anticodon, in tRNA with the corresponding codon in mRNA.

RNA and ribosomes

(J H Parish, Undergraduate Lecture on RNA structure, 2000; updated 2009)

Secondary structure elements

A-type helix: usually an intramolecular helix with loops etc.
The helix is intramolecular in most cases. The exceptions are certain double-stranded viral RNAs (most mycophages, i.e. "fungal viruses", are of this type).
Look at the 3 structures below.
                                           A A
(1)  5'....A C A A G A C A C G A C A C G A     G
     3'....U G U U C U G U G U U G U G C U     A
                                           G A

                                           A A
(2)  5'....A C A A G A C A C G A C A C G A     G
     3'....U G U U C U G U G U U G U G C U     A
                 A A                       G A
                C   C
                  A

                   G A                         A A
(3)  5'....A C A A     G A C A C G A C A C G A     G
     3'....U G U U     C U G U G U U G U G C U     A
                  A   A                        G A
                  C   C
                    A

(1) is a stable helix even though it contains a G:U base pair: G:C and A:U stabilise the helix; G:U has no stabilising or destabilising effect.
(2) and (3) are both less stable but (2) is better than (3) because in (2) the "upper strand" still has base stacking interactions (between an A and a G).
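
The pair-by-pair reasoning for structure (1) can be checked mechanically. The following is a minimal sketch in Python (not part of the original lecture): the function name classify_pairs and the category labels are hypothetical, and the two strings are simply the stem of structure (1) read left to right as drawn (upper strand 5'-to-3', lower strand 3'-to-5'), without the loop.

```python
# Tally the base pairs in the stem of structure (1).
# Assumption: G:C and A:U count as Watson-Crick pairs, G:U as a "wobble" pair,
# and anything else as a mismatch. Names here are illustrative only.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pairs(upper_5to3, lower_3to5):
    """Count pair types; both strands are read left to right as drawn."""
    counts = {"watson_crick": 0, "wobble": 0, "mismatch": 0}
    for pair in zip(upper_5to3, lower_3to5):
        if pair in WATSON_CRICK:
            counts["watson_crick"] += 1
        elif pair in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    return counts


upper = "ACAAGACACGACACGA"  # upper strand of (1), 5'-to-3'
lower = "UGUUCUGUGUUGUGCU"  # lower strand of (1), 3'-to-5'
print(classify_pairs(upper, lower))
# {'watson_crick': 15, 'wobble': 1, 'mismatch': 0}
```

The single wobble pair is the G:U in the middle of the stem, which is why (1) is still regarded as a stable helix.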

Pseudoknots

These are not really "tertiary structure" but "folds". The principle is simple: imagine that an RNA molecule can form 2 stem-loops but there is complementarity between bases in the LOOPS. Then a second piece of helix can form.

There is a good review of RNA folds by D E Draper (1996) TIBS 21, 145-149. In fact the pseudoknot illustrated there is a bit complicated and the general principles are clearer with the molecule GGCGCAGUGGGCUAGCGCCACUCAAAAGGCCCAU (complete and outline structures of this).
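
As a rough check on the principle, the sketch below tests two candidate stems in the example molecule. The stem coordinates are an assumption made by inspecting the sequence, not taken from the original figures, and the helper names (is_stem, crosses) are hypothetical. Positions are 1-based.

```python
# Check two candidate helices in the example pseudoknot sequence.
# Assumption: stem 1 pairs positions 1-5 with 19-15, and stem 2 pairs
# positions 8-12 (inside the loop of stem 1) with 33-29.
SEQ = "GGCGCAGUGGGCUAGCGCCACUCAAAAGGCCCAU"
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_stem(seq, five_start, three_start, length):
    """True if seq[five_start + i] pairs with seq[three_start - i] for all i."""
    for i in range(length):
        left = seq[five_start - 1 + i]     # convert 1-based to 0-based
        right = seq[three_start - 1 - i]
        if (left, right) not in WATSON_CRICK:
            return False
    return True


def crosses(stem_a, stem_b):
    """True if the two stems interleave (i < i' < j < j'), i.e. form a pseudoknot."""
    (i, j), (i2, j2) = stem_a, stem_b
    return i < i2 < j < j2


print(is_stem(SEQ, 1, 19, 5))      # True: stem 1
print(is_stem(SEQ, 8, 33, 5))      # True: stem 2, using bases from the loop
print(crosses((1, 19), (8, 33)))   # True: the helices interleave
```

The second helix uses bases from the loop of the first and pairs them with a region outside it, which is the interleaved arrangement that distinguishes a pseudoknot from two ordinary stem-loops.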

Transfer RNA

This figure is a generalised cartoon of the cloverleaf secondary structure of tRNA:

                                A 3'(OH)
                                C
                                C
                                73
                      (P)5' 1 . 72  acceptor
                            2 . 71  stem
                            3 . 70
                            4 . 69
                            5 . 68  T--stem-----loop
                            6 . 67
      D--loop---stem        7 . 66                C 59  
                           U      65 64 63 62 61       A
         15 A             9        .  .  .  .  .       
       16     13 12 11 10                               57
      17       .  .  .  .         49 50 51 52 53      C
       G                                          T Ψ
       G      22 23 24 25          48
         20 A            26           47
                           27 . 43 44    46
                           28 . 42    45   variable loop
            anticodon      29 . 41
               stem        30 . 40
                           31 . 39
                          32     38
            anticodon      U     37
               loop         34  36
                              35

The Ψ symbol is capital Greek Psi for pseudouridine.
tRNA has not only such a secondary structure but a characteristic fold illustrated below in 2 alternative conventions for yeast tRNAAsp.

Ribosomal RNA

Large ("23S or 28S") and small ("16S or 18S") subunit rRNAs are relatively slowly evolving macromolecules. They occur in all forms of life and fulfil the same functions in all. They are therefore good molecules for molecular taxonomy. They have similar secondary structures and these can more-or-less be mapped on to one another. Thus if we take as an example the small subunit (SSU) rRNAs from three very divergent "organisms", a very small SSU rRNA would be represented by the human mitochondrion, a smallish one would be from E. coli and a big one would be from human cytoplasm. The differences are extra domains in the larger molecules. There is now a database of rRNAs (some 8600 in the case of the SSU rRNA). There are several sites that specialise in organising these data, e.g. http://arb-silva.de. rRNA phylogeny is valuable as it is possible to span all organisms, thanks to pioneering work by C.R. Woese.
See also ribosomes


Ribozymes

There are 3 types of ribozyme, one of which requires a Guo co-factor. Their natural roles are in self-splicing (hammerhead ribozymes from plant viruses and group I introns [Guo co-factor]) and in the processing of other RNA precursors, e.g. RNase P, involved in processing bacterial pre-tRNA etc. Any recent vaguely structural molecular biology book is a good source of pictures of ribozymes, but the authors do not all put the ribozymes together. For ribozymes, RiboaptDB is a searchable structure database. A hammerhead is shown here for illustration:

Ribosomes (see below) contain a ribozyme.

Ribosomes

The most important structural work on nucleic acids recently has been the elucidation of the structure of ribosomes to high resolution. The most important single result has been the recognition of the nature of the catalytic centre of peptidyl transferase, i.e. the enzyme that catalyses the reaction:
aa-tRNA1 + pp-tRNA2 = pp-aa-tRNA1 + tRNA2
It now transpires that the enzyme is actually a ribozyme.
There are several ribosome structures in the NDB (nucleic acid database) at Rutgers (USA). For the small and large subunits follow the links to rr0001 and rr0002 respectively.

More-or-less a complete issue of Science (2000) 289: 878... is devoted to recent developments. Here is the text of the summary by T R Cech:
The amino acids we obtain by digestion of steak, salmon, or a lettuce salad are loaded onto transfer RNAs (tRNAs) and rebuilt into proteins in the ribosome, the cell's macromolecular protein-synthesis factory. The bacterial ribosome is composed of three RNA molecules and more than 50 proteins. Its key components are so highly conserved among all of Earth's species that a similar entity must have fueled protein synthesis in the common ancestor of all extant life. Although the chemical reaction catalyzed by the ribosome is simple--the joining of amino acids through amide (peptide) linkages--it performs the remarkable task of choosing the amino acids to be added to the growing polypeptide chain by reading successive messenger RNA (mRNA) codons. On page 905 of this issue, Steitz, Moore, and colleagues now provide the first atomic-resolution view of the larger of the two subunits of the ribosome. From this structure they deduce on page 920 that RNA components of the large subunit accomplish the key peptidyl transferase reaction. Thus, ribosomal RNA (rRNA) does not exist as a framework to organize catalytic proteins. Instead, the proteins are the structural units and they help to organize key ribozyme (catalytic RNA) elements, an idea long championed by Harry Noller, Carl Woese, and others.
These landmark publications are but the latest chapter in a progression of ribosome structural studies that have spanned four decades. Early electron micrographs of ribosomes in action led to immunoelectron microscopy and ultimately to cryo-electron microscopy images of about 20 Å resolution. Proteins were also located within the ribosome by neutron scattering. However, to achieve atomic resolution, x-ray crystallography is required, a daunting task given the huge size (2.6 x 10^6 daltons) and asymmetry of the ribosome. The pioneering crystallization of ribosomes from the bacterium Haloarcula marismortui in the 1980s by Ada Yonath and H. G. Wittmann provided the first rays of hope, but it is only in the past few years that crystal structures have been determined for the large subunit (5 Å resolution), the small subunit (5.5 Å resolution), and the whole ribosome complexed with tRNAs (7.8 Å resolution).
Now, at 2.4 Å, almost the entire chain of the 23S rRNA and its tiny 5S rRNA partner, totaling 3043 nucleotides, have been fitted into the electron density map of the H. marismortui large ribosomal subunit. The RNA secondary structure (intramolecular base-pairing pattern) of the large-subunit rRNA had been determined previously, and is present as predicted in the x-ray structure. In addition, a large number of unpredicted RNA tertiary structure interactions are now seen. Overall, the RNA forms a huge single mass of tightly packed helices, not six discrete domains connected by floppy linkers as a naïve observer might predict from looking at the secondary structure diagram.
Where, then, are all of the proteins, and what is their function? The globular domains of 26 proteins are found largely on the exterior of the subunit (see the figure). Twelve of these proteins have unusual snake-like extensions, devoid of tertiary structure and in some cases even secondary structure, and an additional protein is entirely extended; their shapes are molded by their interactions with the RNA. From these pictures, and from what is known about protein cofactors that facilitate the action of some other ribozymes, it is likely that these ribosomal proteins buttress, stabilize, and orient the otherwise floppy RNA into a specific, active structure.

The original contains references and links removed here. Here is the picture from Cech's article:

.... and the original text.....
A ribosome's true colors. (Top) The large subunit of the ribosome seen from the viewpoint of the small subunit, with proteins in purple, 23S rRNA in orange and white, 5S rRNA (at the top) in burgundy and white, and A-site tRNA (green) and P-site tRNA (red) docked according to. (Bottom) The peptidyl transfer mechanism catalyzed by RNA. The general base (adenine 2451 in Escherichia coli 23S rRNA) is rendered unusually basic by its environment within the folded structure; it could abstract the proton at any of several steps, one of which is shown here.

Comparable detail for the smaller 30S subunit can be found in the 21st September 2000 issue of Nature. The editorial reads:
The ribosome is the protein factory of the cell, translating the genetic information encoded in messenger RNA into proteins. In bacteria, the ribosome is made up of a large and a small subunit known as the 50S and 30S subunits respectively.
The structure of the small, 30S, ribosomal subunit of the bacteria Thermus thermophilus is revealed at a resolution of 3 Å in the journal Nature. In a second article, functional insights gained from this structure are described, and the structure of the 30S subunit interacting with antibiotics, which can interfere with ribosomal decoding and translocation, is presented.
The ribosome has been intensively studied for the last 40 years and, as James Williamson says in an accompanying News and Views article, the structure, "finally provides a face to players that have long been known by their functions."

One set of views of the 30S subunit and its legend are below.

Figure 2. Overview of the 30S structure.
a, Secondary structure diagram of 16S RNA (modified with permission from http://www.rna.icmb.utexas.edu/CSl/2STR/Schematics/e.coli16s.27.5.5.schem.ps; see also ref. 21), showing the definition of the various helical elements used throughout the text. The numbering and diagram correspond to the E. coli sequence. Red, 5' domain; green, central domain; orange, 3' major domain; cyan, 3' minor domain.
b, Stereo view of the tertiary structure of 16S RNA from our refined model, showing the 50S or 'front' view, with the same colouring for the domains. H, head; Be, beak; N, neck; P, platform; Sh, shoulder; Sp, spur; Bo, body.
c, d, Front (50S) and back sides of the 30S. Grey, RNA; blue, proteins.


Structure and function animation from MRC Cambridge including excellent animations of the phases of protein biosynthesis.


Genetic Code

The phrase "genetic code" has been misused to mean a genome (for example). Rather it is the relationship between the sequence of mRNA (or of the + or non-transcribed strand of DNA) and the amino acid sequence of the protein it encodes. The code is a non-overlapping triplet code (like a language with only 3-letter words). After the general features of the code were established, the code itself was worked out in the 1960s by groups led by H.G. Khorana and Marshall Nirenberg. They shared a Nobel prize with Robert Holley, who was the first to sequence a tRNA molecule and identify an anticodon. The general nature of the genetic code is that it is a 3D array (4 x 4 x 4, one axis per codon position), which cannot easily be represented on a flat sheet of paper (or computer screen) but is easily represented as such in computer software (this speeds up the process of "looking up the code"):

  
The single letter versions of amino acid abbreviations are fairly straightforward, e.g. A = Ala[nine], but there are some exceptions to avoid confusion:
F PHenylalanine; R aRginine; Y tYrosine; and the following to learn without mnemonics:
W tryptophan; D & E aspartate and glutamate; N & Q asparagine and glutamine; U Sel selenocysteine.
There are abbreviations for other "minor" or "ambiguous" amino acids but U (selenocysteine) is the "21st amino acid" (incorporated into nascent proteins using its own tRNA).
                         Second base
First base   U         C         A         G          Third base
U            F         S         Y         C          U
             F         S         Y         C          C
             L         S         ✳ [1]     ✳ [2]      A
             L         S         ✳ [1]     W          G
C            L [3]     P         H         R          U
             L [3]     P         H         R          C
             L [3]     P         Q         R          A
             L [3,4]   P         Q         R          G
A            I         T         N         S          U
             I         T         N         S          C
             I [5]     T         K         R [6]      A
             M         T         K         R [6]      G
G            V         A         D         G          U
             V         A         D         G          C
             V         A         E         G          A
             V         A         E         G          G
Exceptions: [1] Q in some ciliates; [2] U (selenocysteine), or W in mycoplasmas, mitos...; [3] T in yeast mitos.; [4] S in C. cyl.; [5] M in several mitos.; [6] ✳, G or S in mitos.
Abbreviations used: mitos. mitochondria; C. cyl. Candida cylindracea; ✳ chain terminating codon.
A more recent catalogue of alternative/exceptional cases of codon meaning can be found at http://prowl.rockefeller.edu/aainfo/gencode.html
Two "interesting" questions (sorry, dear reader, think about these and then look up the answer elsewhere): in cases other than mitochondria and mycoplasmas, how can a ribosome "know" whether to terminate at UGA or stick in a selenocysteine residue? How does this compare with the problem of deciding whether an AUG is N-terminal or an internal methionine (rarely isoleucine)?
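
To illustrate the remark that the code is naturally a 3D array in software, here is a minimal Python sketch. It holds only the standard code (none of the exceptions noted above, and no selenocysteine), and the names CODE, lookup and translate are hypothetical. The three axes are the first, second and third bases of the codon, each in the order U, C, A, G.

```python
# The standard genetic code as a 4 x 4 x 4 array indexed by codon position.
# Assumption: standard code only; "*" marks a chain terminating codon.
BASES = "UCAG"  # index 0..3 along each of the three axes
INDEX = {base: i for i, base in enumerate(BASES)}

AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base U
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)

# CODE[first][second][third] -> single letter amino acid
CODE = [
    [[AMINO_ACIDS[16 * i + 4 * j + k] for k in range(4)] for j in range(4)]
    for i in range(4)
]


def lookup(codon):
    """Look up one codon by three array indices."""
    i, j, k = (INDEX[base] for base in codon)
    return CODE[i][j][k]


def translate(mrna):
    """Read non-overlapping triplets from the start until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        residue = lookup(mrna[pos:pos + 3])
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(lookup("AUG"))           # M
print(translate("AUGACUUAA"))  # MT
```

Each lookup is three index operations rather than a search through a flat table, which is the sense in which the array form speeds up "looking up the code".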

DNA repair

This lecture was given to the EPSRC CYTOCOM group in 2000.

It is the biological background to a possible computing metaphor. In general, if macromolecules become damaged they are destroyed and re-synthesised. There are four notable exceptions: (1) aminoacyl-tRNA (if an amino acid is joined to a non-cognate tRNA), (2) the 3'-terminus of tRNA (CCA) which becomes a bit tatty with time, (3) the bacterial cell wall and (4) DNA.

DNA background.
Here is one convention for drawing a DNA double-stranded molecule. The reason the ends are called 5'- and 3'- is unimportant (just chemistry) but the fact that they represent directions IS important: ACAT is different from TACA just as dog is different from god.
5'-...A C A T A G C A T C C A T A G T A C...-3'
3'-...T G T A T C G T A G G T A T C A T G...-5'
The relationship is that the 2 strands are antiparallel (look at the 5' and 3') and the pairing rules are A:T and C:G.
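
The antiparallel pairing rule can be expressed in a few lines of Python. This is only an illustrative sketch; the function names complement and reverse_complement are hypothetical. Note the difference between the lower strand as drawn (3'-to-5', directly under the upper strand) and the same strand written in the conventional 5'-to-3' direction.

```python
# Pairing rules A:T and C:G applied to the example duplex.
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Partner strand written in the same left-to-right order (so 3'-to-5')."""
    return strand.translate(PAIRING)


def reverse_complement(strand):
    """Partner strand written in the conventional 5'-to-3' direction."""
    return complement(strand)[::-1]


upper = "ACATAGCATCCATAGTAC"        # 5'-to-3'
print(complement(upper))            # TGTATCGTAGGTATCATG (lower strand as drawn)
print(reverse_complement(upper))    # GTACTATGGATGCTATGT (lower strand, 5'-to-3')
```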

Several things can go wrong. During DNA replication the 2 strands come apart and the new strands are synthesised 5'-to-3' by an enzyme called DNA polymerase. Here are some steps in which the lower strand above is acting as the template; I've put hyphens in to emphasise that the residues are all joined up into long chain molecules.
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'
5'-...A-C-A-T

The next letter (or 'nucleotide' as we biochemists say) is an A...
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'
5'-...A-C-A-T-A
but now the wrong letter is inserted.....
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'
5'-...A-C-A-T-A-A
DNA polymerase looks at the last 'base pair' it has made and spots it doesn't fit so the next 2 stages are:
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'
5'-...A-C-A-T-A A
and....
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'
5'-...A-C-A-T-A
so now we can try again and choose (hopefully) a G. The important point here is that error-checking and correction are parts of the activity of the polymerase enzyme. There is another biochemical example of this kind of thing but we'll leave it at that.
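
The insert, check, remove, retry cycle above is essentially an algorithm, so here is a toy Python sketch of it. It is a deliberately simplified model, not a description of the real enzyme kinetics: the function name extend_with_proofreading is hypothetical, and the sequence of nucleotides "offered" to the polymerase is supplied by hand so that the result is deterministic.

```python
# Toy model of polymerase proofreading: add a base, check the last pair,
# and remove the base again if it does not fit the template.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_with_proofreading(template_3to5, primer, candidates):
    """Return (new strand 5'-to-3', number of bases removed by proofreading)."""
    strand = list(primer)
    removed = 0
    for base in candidates:
        if len(strand) == len(template_3to5):
            break                      # template fully copied
        strand.append(base)            # insertion step
        template_base = template_3to5[len(strand) - 1]
        if PAIR[template_base] != base:
            strand.pop()               # last base pair does not fit: remove it
            removed += 1
    return "".join(strand), removed


template = "TGTATCGTAGGTATCATG"        # lower strand above, 3'-to-5'
# Offer A (correct), then A (wrong, opposite C), then G (correct).
print(extend_with_proofreading(template, "ACAT", "AAG"))
# ('ACATAG', 1)
```

The point of the sketch is the one made in the text: error-checking lives inside the same loop as synthesis, rather than being a separate pass over the finished strand.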

Watch out, you've just had a somatic mutation: as a result of an enzyme-catalysed reaction, C residues are converted to U's; we chemists refer to this as an example of a deamination but the important thing is that U behaves like T:
3'-...T-G-T-A-T-C-G-T-A...-5'
5'-...A-C-A-T-A-G-C-A-T...-3'
becomes...
3'-...T-G-T-A-T-U-G-T-A...-5'
5'-...A-C-A-T-A-G-C-A-T...-3'
so there is a danger of a mutation... if the upper strand is the template, the lower one becomes...
5'-...A-C-A-T-A-A-C-A-T...-3'
DON'T WORRY ABOUT IT! There is an enzyme constantly scanning your DNA and marking the U's by putting a cut in the strand, which is then repaired by a method similar to that of the following section. Before we look at that, remember that DNA does not look like two strings of characters: it is a lumpy molecule and the repair enzymes actually detect irregularities in a pseudo-regular structure.
[Figure: space filling model of DNA]
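
The deamination example can be followed step by step in Python. This is a string-level sketch only (the real enzymes recognise shape, as just noted), and the function names deaminate, copy_strand and repair_uracil are hypothetical. Strands are written left to right exactly as drawn above.

```python
# C-to-U deamination, its consequence on copying, and repair from the partner.
# Assumption: U pairs like T, so it templates an A.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C", "U": "A"}


def deaminate(strand, index):
    """Convert the C at the given 0-based position to U."""
    if strand[index] != "C":
        raise ValueError("only C can be deaminated to U")
    return strand[:index] + "U" + strand[index + 1:]


def copy_strand(template):
    """Partner strand made by pairing against the template, as drawn."""
    return "".join(PAIR[base] for base in template)


def repair_uracil(damaged, partner):
    """Replace each U using the base opposite it in the undamaged partner."""
    return "".join(
        PAIR[other] if base == "U" else base
        for base, other in zip(damaged, partner)
    )


upper = "TGTATCGTA"                    # 3'-to-5' as drawn
lower = "ACATAGCAT"                    # 5'-to-3' as drawn
damaged = deaminate(upper, 5)
print(damaged)                         # TGTATUGTA
print(copy_strand(damaged))            # ACATAACAT (G has become A: a mutation)
print(repair_uracil(damaged, lower))   # TGTATCGTA (restored)
```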

Now here's a different problem. As a result of a chemical attack or UV or ionising radiation or simply spontaneously, the DNA becomes damaged.... I've taken the hyphens away to give myself a bit more room. x following a letter = damaged base.
5'-...A C A T A GxC A T C C A T A G T A C...-3'
3'-...T G T A T C G T A G G T A T C A T G...-5'
There are several pathways for dealing with this but in essence they consist of 2 steps and there are 2 kinds of alternative.
1 recognition of the damage and 2 doing something about it.
1: It is the shape of the bad base pair that is recognised (Gx:C in this case).
2: One route involves simply removing or repairing the damage in situ, so an enzyme whips off the "x". More usually a cut is made (actually by 1 of 2 methods) but for simplicity let's go for the "corrigendase" mechanism. A corrigendase scans DNA for such bad shapes and would do something like this... I've stuck the hyphens back in and the x is now over the G.
                x
5'-...A-C-A-T-A-G-C-A-T-C-C-A-T-A-G-T-A-C...-3'
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'
The corrigendase puts a cut* in the region of the damage*.
*In case you talk to a real biochemist (s)he will point out that there are usually 2 cuts not quite as close to the damage as I've shown.
5'-...A-C-A-T-A-g     T-C-C-A-T-A-G-T-A-C...-3'
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'

Now there is another enzyme, called a "repair polymerase", that concurrently erodes and re-synthesises the damaged strand.
Using little letters to represent repaired strand....

5'-...A-C-A-T-A-g     T-C-C-A-T-A-G-T-A-C...-3'
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'

5'-...A-C-A-T-A-g-c-a     C-A-T-A-G-T-A-C...-3'
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'
Until the repair polymerase falls off (stochastic process... typically the repaired bit is around 200 bases not the few shown here) and we end up with:
5'-...A-C-A-T-A-g-c-a-t-c-c-a-t-a G-T-A-C...-3'
3'-...T-G-T-A-T-C-G-T-A-G-G-T-A-T-C-A-T-G...-5'
The last job is to join the a to the G (a-G).
An interesting feature of this system is that the enzymes "choose" which strand to regard as the template and which to regard as damaged. Remember it was the shape of Gx:C that was spotted as bad; how do we say, 'ah well, it is "obvious" that we must use the lower strand as the template in this case'? Well, the answer is that in reality there are more than 4 bases in DNA: there are 2 more, called (well, here at least) Am and Cm, so our original molecule (no hyphens) might actually be:
5'-...A C A T A G C A T C C A T A G T AmC...-3'
3'-...T G T AmT C G T A G G T AmT C AmT G...-5'
I've suggested that there are more m's (think of m standing for modified) in the lower strand. This means that this is the older strand because only A, C, T, G are incorporated by DNA polymerase and the m's are bodged in later. So the system has guessed that "our damaged G" is in the less modified (i.e. "younger") strand. This is correct because it is during the phase of DNA replication that DNA is most sensitive to chemical etc. damage.
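
For the computing metaphor, the strand-choice rule can be written as a simple comparison. The sketch below is a toy illustration under the convention used here (a base followed by "m" is modified); the function names and the "upper"/"lower" labels are hypothetical, and the real system is of course not counting characters.

```python
import re

# Choose the template strand as the one carrying more modified bases,
# on the assumption that the more modified strand is the older one.


def tokens(strand):
    """Split a strand such as 'TGTAmTC' into bases, keeping 'Am'/'Cm' whole."""
    return re.findall(r"[ACGT]m?", strand)


def modified_count(strand):
    """Number of modified bases (tokens ending in 'm')."""
    return sum(token.endswith("m") for token in tokens(strand))


def choose_template(upper, lower):
    """Return 'upper' or 'lower', or None if modification gives no answer."""
    n_upper, n_lower = modified_count(upper), modified_count(lower)
    if n_upper == n_lower:
        return None
    return "upper" if n_upper > n_lower else "lower"


upper = "ACATAGCATCCATAGTAmC"          # 5'-to-3', spaces removed
lower = "TGTAmTCGTAGGTAmTCAmTG"        # 3'-to-5', spaces removed
print(modified_count(upper), modified_count(lower))  # 1 3
print(choose_template(upper, lower))                 # lower
```

With the example molecule the lower strand carries three modified bases against one in the upper strand, so the lower strand is taken as the template and the damaged G in the upper strand is the one replaced.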


Last revision 08oct15: Copyright ©