Evolving a code: A molecular fossil's tale

This post was chosen as an Editor's Selection for ResearchBlogging.org
Every living cell on earth carries a molecular fossil: the ribosome. In a recent paper published in PNAS, researchers from California open the drawer and dust off this ancient molecular machine. The structure of the ribosome seems to provide hints about the origin of that universal feature of life: the genetic code.

The genetic code is life’s universal language. Some species might have different dialects, but we all share the same grammar. The genetic code specifies which triplets of DNA or RNA nucleotides (codons) get translated into which amino acids. Its universality is the reason why we can take a human insulin gene, put it into bacteria and let them produce fully functional human insulin.
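For readers who like to see it spelled out, the "which triplet gives which amino acid" lookup fits in a small Python dictionary. This is a minimal sketch of the standard code only. The names `GENETIC_CODE` and `translate` are my own, and the sketch ignores the "dialects" (the variant codes used by mitochondria and some microbes).

```python
# Minimal sketch: the standard genetic code as a Python dictionary (RNA codons).
BASES = "UCAG"
# One-letter amino acid codes in the classic UCAG table order; "*" marks a stop codon.
ONE_LETTER = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "C": "Cys", "W": "Trp",
    "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile", "M": "Met",
    "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala", "D": "Asp",
    "E": "Glu", "G": "Gly", "*": "Stop",
}
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = {codon: THREE_LETTER[aa] for codon, aa in zip(CODONS, ONE_LETTER)}


def translate(seq: str) -> list[str]:
    """Translate a DNA or RNA sequence codon by codon, stopping at the first stop codon."""
    rna = seq.upper().replace("T", "U")  # DNA writes T where RNA has U
    protein = []
    for i in range(0, len(rna) - 2, 3):  # step through complete triplets only
        residue = GENETIC_CODE[rna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


print(GENETIC_CODE["CAU"], GENETIC_CODE["CAA"], GENETIC_CODE["GAA"])  # His Gln Glu
print(translate("ATGCATGAATAA"))  # ['Met', 'His', 'Glu']
```

Because the code is the same in bacteria and humans, the same table "reads" the insulin gene correctly in both.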

This is of course good news for diabetic patients, but not so much for scientists trying to understand how the genetic code came to be. Since all life on earth seems to descend from life that had a genetic code much like ours, there are no ancestral genetic codes around to compare it to. Scientists already cracked the genetic code in the sixties (and deservedly won a Nobel Prize for doing it), but the origins of our genetic code have remained unclear for the past 50 years. Why is the genetic code the way it is? Why does GAA code for glutamic acid and not for aspartic acid?

The genetic code. Triplets of RNA correspond to different amino acids. CAU codes for histidine, while CAA codes for glutamine. Source.

One hypothesis states that the chemical affinity between codons and amino acids formed the basis of their functional coupling. If this is true, the genetic code is like a wedding vow, formalizing and reinforcing the existing bonds between amino acid and triplet (in holy matrimony). Biochemical experiments seem to support this idea, but David Johnson and Lei Wang took the analysis inside the cell, where a molecular fossil was waiting to tell its story.

That fossil is an ancient molecular machine that is at least as old as the genetic code: the ribosome. These protein assembly sites are complexes of numerous proteins interacting with RNA molecules. Ribosomes are well studied, but until now nobody had looked at what the ribosome could tell us about the genetic code through the interactions between nucleotides and amino acids.

The three-dimensional structure of a ribosomal subunit. The rRNA is shown in yellow; the other colours denote proteins. A single subunit contains many RNA-amino acid interactions. Source.

Luckily, the ribosome structures of several species are known (the scientists who solved them won a Nobel Prize last year). So Johnson and Wang could search these structures to see whether codons and their amino acids preferentially associate with each other (within a radius of 5 Å). For example, they checked whether the trinucleotides CAU and CAC, the two codons for histidine, are overrepresented in the vicinity of histidine.
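The core of such a proximity analysis can be sketched as "count contacts within 5 Å, then compare with what you'd expect by chance". This is a toy version under loud assumptions: each nucleotide triplet and each amino acid is reduced to a single made-up 3D point (a real analysis would use atom coordinates from the published ribosome structures), and the observed/expected ratio is a generic contingency-table statistic, not necessarily the one Johnson and Wang used. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

CUTOFF = 5.0  # contact radius in Å, as in the post


def contact_counts(triplets, residues, cutoff=CUTOFF):
    """Count (trinucleotide, amino acid) pairs whose points lie within `cutoff` of each other."""
    counts = Counter()
    for (tri, p), (aa, q) in product(triplets, residues):
        if math.dist(p, q) <= cutoff:
            counts[(tri, aa)] += 1
    return counts


def enrichment(counts, tri, aa):
    """Observed / expected contacts for one pair; expected comes from row and column totals."""
    total = sum(counts.values())
    tri_total = sum(n for (t, _), n in counts.items() if t == tri)
    aa_total = sum(n for (_, a), n in counts.items() if a == aa)
    if not (total and tri_total and aa_total):
        return 0.0
    expected = tri_total * aa_total / total
    return counts[(tri, aa)] / expected


# Toy, made-up coordinates (one point per triplet or residue), NOT real ribosome data.
triplets = [("CAU", (0, 0, 0)), ("CAC", (10, 0, 0)), ("GGA", (20, 0, 0)), ("GGA", (30, 0, 0))]
residues = [("His", (1, 1, 0)), ("His", (11, 0, 1)), ("Gly", (21, 0, 0)), ("His", (31, 0, 0))]

counts = contact_counts(triplets, residues)
print(round(enrichment(counts, "CAU", "His"), 2))  # 1.33
print(round(enrichment(counts, "GGA", "Gly"), 2))  # 2.0
```

A ratio above 1 means a triplet sits next to that amino acid more often than its overall contact frequency would predict; whether that excess is significant is a separate statistical question.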

They found significant overrepresentation for 11 anticodons and 8 codons near their corresponding amino acids. However, when they generated a million random codes and chose the optimal sets of amino acids to analyze, the codon-amino acid enrichment turned out to be better than only 54% of the random codes, which is roughly what chance would give. The anticodon-amino acid interactions, on the other hand, did prove to be correlated with the genetic code.
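The random-code comparison is essentially a permutation test. Here's a minimal sketch assuming one common way of building random codes (keep the block structure and the stop codons, and shuffle which amino acid owns which set of synonymous codons) plus a deliberately simple score: how many "enriched" codon-amino acid pairs a code actually assigns. The enriched pairs are invented placeholders, the paper's extra step of choosing optimal amino acid sets is left out, and the sketch defaults to fewer than a million codes for speed.

```python
import random

BASES = "UCAG"
ONE_LETTER = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CANONICAL = dict(zip(CODONS, ONE_LETTER))
AMINO_ACIDS = sorted(set(ONE_LETTER) - {"*"})  # the 20 amino acids, stops excluded

# Hypothetical "enriched" (codon, amino acid) pairs standing in for structural data.
ENRICHED = {("CAU", "H"), ("CAC", "H"), ("AAA", "K"), ("GGA", "G"), ("UUA", "F")}


def random_code(rng):
    """Shuffle which amino acid owns each synonymous codon set; stops stay where they are."""
    shuffled = AMINO_ACIDS[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(AMINO_ACIDS, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in CANONICAL.items()}


def score(code):
    """Number of enriched pairs that `code` actually assigns."""
    return sum(code[codon] == aa for codon, aa in ENRICHED)


def fraction_beaten(n_codes=10_000, seed=0):
    """Fraction of random codes that score strictly worse than the canonical code."""
    rng = random.Random(seed)
    canonical_score = score(CANONICAL)
    worse = sum(score(random_code(rng)) < canonical_score for _ in range(n_codes))
    return worse / n_codes


print(score(CANONICAL))  # 4 (UUA is Leu, not Phe, in the real code)
print(fraction_beaten())
```

A fraction near 0.5 means the real code does about as well as a typical shuffled one, which is roughly what the 54% result for codons says; a fraction close to 1 would mean the real code is unusually well matched to the structural data.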

Four amino acids, when expanded to the entire block, improved correlation with their codons. Figure from reference.

So far so good, but Johnson and Wang decided to dig deeper still. As you can see in the genetic code at the top, some amino acids share a block, differing only in the third position of the triplet. Other blocks are occupied exclusively by a single amino acid. Could it be that long ago, some of today's shared blocks were mono-blocks, until newcomer amino acids fought their way in and drove the original occupants out of part of the block?
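You can check the block structure yourself by grouping the 64 codons by their first two nucleotides. A small Python sketch; the helper names are my own, and counting the stop signal ("*") as an occupant is my choice, not something taken from the paper.

```python
from collections import defaultdict

BASES = "UCAG"
ONE_LETTER = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, ONE_LETTER))


def occupants_by_block(code):
    """Group codons by their first two nucleotides and collect who occupies each block."""
    blocks = defaultdict(set)
    for codon, aa in code.items():
        blocks[codon[:2]].add(aa)  # "*" (stop) counts as an occupant here
    return dict(blocks)


BLOCKS = occupants_by_block(CODE)
MONO = sorted(b for b, aas in BLOCKS.items() if len(aas) == 1)
SHARED = sorted(b for b, aas in BLOCKS.items() if len(aas) > 1)
print(MONO)    # ['AC', 'CC', 'CG', 'CU', 'GC', 'GG', 'GU', 'UC']
print(SHARED)  # ['AA', 'AG', 'AU', 'CA', 'GA', 'UA', 'UG', 'UU']
```

Half of the 16 blocks belong to a single amino acid; the UU block, for instance, is split between Phe and Leu.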

The researchers again used their random codes to investigate this possibility. When the amino acids Leu, Ile, His and Lys were (hypothetically) expanded to span their entire codon block, the number of random codes that scored better than the canonical one dropped by 40%! This indicates that these amino acids may once have had bigger blocks to themselves, until some of their codons were captured by other amino acids. Among other things, they predict for the first time that Phe captured the UUY codons (UUU and UUC) from Leu.
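The hypothetical expansion itself is easy to express: take the standard code and hand all four codons of a block to one amino acid. Which block each amino acid takes over (UU for Leu, AU for Ile, CA for His, AA for Lys, i.e. the shared blocks they sit in today) is my assumption for illustration, and the function names are hypothetical. To reproduce something like the paper's 40% figure you would then rerun a random-code comparison on the expanded code, which this sketch does not do.

```python
BASES = "UCAG"
ONE_LETTER = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, ONE_LETTER))


def expand_to_block(code, aa, block):
    """Return a copy of `code` in which all four codons of `block` encode `aa`."""
    expanded = dict(code)
    for third in BASES:
        expanded[block + third] = aa
    return expanded


def reassigned(before, after):
    """Codons whose amino acid differs between two codes, as codon -> (before, after)."""
    return {c: (before[c], after[c]) for c in before if before[c] != after[c]}


# Hypothetical ancestral code (assumed blocks): Leu, Ile, His and Lys own their whole block.
ancestral = CODE
for aa, block in [("L", "UU"), ("I", "AU"), ("H", "CA"), ("K", "AA")]:
    ancestral = expand_to_block(ancestral, aa, block)

print(CODE["UUU"], CODE["UUC"], "->", ancestral["UUU"], ancestral["UUC"])  # F F -> L L
print(len(reassigned(CODE, ancestral)))  # 7
```

Read backwards, the seven reassigned codons are the ones this scenario says were captured later; only the UUY capture by Phe is the authors' own prediction.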

Putative chronology of amino acids. Only the later amino acids are correlated with their anticodons, suggesting that the genetic code may have evolved in two stages. Table from reference.

By mapping their anticodon-amino acid enriched pairs onto a putative chronology of amino acids in the genetic code, they suggest that the genetic code might have evolved in two separate stages. The first stage, which includes the amino acids found in the famous Urey-Miller experiment, seems independent of anticodon interactions. Perhaps later, after translation evolved, the interactions allowed new amino acids to become part of the genetic canon.

Understanding the key events in the early evolution of life is only possible if we can somehow infer ancestral states that existed billions of years ago. Treating the ribosome as a molecular fossil is a wonderful idea and it’s a perfect example of how structural biology can contribute to our understanding of evolution. Molecular paleontology... I like the sound of that!

Molecular paleontologists should wear hats too!


Johnson, D., & Wang, L. (2010). Imprints of the genetic code in the ribosome. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1000704107

