The Big Bang of the Protein Universe

This post was chosen as an Editor's Selection for ResearchBlogging.org

You might not like it, but you are directly related to the bacteria in your kitchen sink and the grass that’s slowly growing in your garden. However, new research suggests that even the most ancient parts we share with all life on Earth are still drifting apart, with no sign of stopping…

In 1929 Edwin Hubble discovered a positive correlation between the speed at which galaxies move away from the Milky Way and their distance from us. This observation would become one of the first pieces of evidence for the Big Bang theory: if all galaxies move away from each other, we can trace back the paths they’ve taken and converge upon a single point where our known universe must have originated.

Recently, Inna Povolotskaya and Fyodor Kondrashov applied some Big Bang thinking to the sequence space of proteins, instead of the physical space of the universe. The closest thing biologists have to a Big Bang is LUCA, the last universal common ancestor of life. As LUCA's descendants evolved into different species, proteins that were once one and the same acquired different mutations in different species. In other words, the sequences slowly drifted away from each other, as galaxies have done ever since the Big Bang.

But how far does the Big Bang analogy stretch? Will the sequences of orthologous proteins forever drift apart, like galaxies do? Or are there limits to the divergence? After all, biologists assume sequence space (unlike physical space) is limited: at any given time, only a few mutations are possible without disrupting the function of the protein.

Big Bangs of physical and protein sequence space. Are the protein sequences still diverging (top)? Or has the entire protein sequence space been explored, having reached the outer limits of maximal divergence (bottom)? Figure adapted from reference.

Hubble had to use the redshift and the Doppler effect to estimate the speed at which galaxies moved away from Earth. Likewise, Povolotskaya and Kondrashov had to find a way to obtain the divergence rates of proteins. Whereas Earth was the only viewpoint available to Hubble, protein divergence rates can be calculated from the perspective of every species. By comparing the number of mutations among closely related sequences that lead away from (Na) or towards (Nt) a reference sequence, they were able to obtain a crude estimate of how quickly proteins are drifting apart.

If Nt/Na < 1, the protein sequences are drifting away from the reference sequence. On the other hand, if Nt/Na > 1, the sister sequences are evolving towards the reference. The scientists applied this method to 572 orthologous proteins which are predicted to have been present in LUCA. The exciting result is that even the most distant proteins showed Nt/Na ratios lower than 1, meaning that even the most ancient proteins are still diverging from each other!
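To make the Nt/Na idea concrete, here is a minimal Python sketch. It is not the authors' actual pipeline: it assumes we already have an aligned ancestral sequence, one descendant, and a distant reference, and it simply counts substitutions that lose a match with the reference (Na) or gain one (Nt). The function names and the toy sequences are made up for illustration.

```python
def count_toward_away(ancestor, descendant, reference):
    """Count substitutions moving toward (Nt) or away from (Na) the reference.

    Simplifying assumption: the three sequences are already aligned and
    the ancestral state is known.
    """
    if not (len(ancestor) == len(descendant) == len(reference)):
        raise ValueError("sequences must be aligned to the same length")
    n_toward = 0
    n_away = 0
    for anc, desc, ref in zip(ancestor, descendant, reference):
        if anc == desc:
            continue  # no substitution at this site
        if anc == ref:
            n_away += 1  # the site used to match the reference, now it does not
        elif desc == ref:
            n_toward += 1  # the site did not match the reference, now it does
        # Otherwise the site changed between two states that both differ
        # from the reference, so it counts for neither.
    return n_toward, n_away


def classify(n_toward, n_away):
    """Return the Nt/Na ratio and a label following the rule in the text."""
    if n_away == 0:
        raise ValueError("Na is zero, so the ratio is undefined")
    ratio = n_toward / n_away
    if ratio < 1:
        return ratio, "diverging"
    if ratio > 1:
        return ratio, "converging"
    return ratio, "balanced"  # label chosen here; the text only covers < 1 and > 1


# Toy sequences, invented for this example.
reference = "MKTAYIAKQR"
ancestor = "MKTAYLAKHR"
descendant = "MRTGYLAKQR"

n_toward, n_away = count_toward_away(ancestor, descendant, reference)
print(n_toward, n_away, classify(n_toward, n_away))
```

In this toy case two substitutions lead away from the reference and one leads towards it, so the ratio is below 1 and the descendant is drifting away.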

Epistatic fitness landscape. Some mutations confer higher or equal fitness (green) while others provide lower fitness (black). Such a landscape is considered rugged, since the order in which mutations can occur is limited. Figure adapted from reference.

The authors concluded that evolution is happening pretty slowly, since not all sequence space seems to have been covered yet. They show that epistasis is likely responsible for this slow rate of evolution. Epistasis has everything to do with so-called ‘rugged fitness landscapes’. The figure above shows such a rugged epistatic landscape, where only a few mutations are tolerated at a given time. While AT and GC are both viable options for a particular position in a gene, you cannot get from one to the other by simply mutating the A into a G and the T into a C, because the intermediate steps may not be tolerated. The number of paths to GC is limited, so proteins can meander for a long time on the rugged landscape before they reach maximal divergence.

This is not only true for the mutations in a certain position, it also works for the protein as a whole. At any given time, only a few amino acid changes are tolerated (Povolotskaya and Kondrashov estimate that at any given time only two percent of amino acids can be substituted). But as soon as a single mutation happens, some substitutions become possible while others are closed off. It’s as if the mountain ranges keep moving! Given enough time, more than 90% of the sites in ancient proteins eventually accept a mutation.
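The AT to GC example can be sketched in a few lines of Python. The set of viable genotypes below is invented for illustration and is not taken from the paper's figure; the point is only that when the direct intermediates are not tolerated, the shortest route between two viable genotypes becomes a detour.

```python
from collections import deque

ALPHABET = "ACGT"


def neighbors(genotype):
    """Yield every genotype that differs from `genotype` by one point mutation."""
    for i, current in enumerate(genotype):
        for base in ALPHABET:
            if base != current:
                yield genotype[:i] + base + genotype[i + 1:]


def shortest_viable_path(start, goal, viable):
    """Breadth-first search over single mutations, visiting only viable genotypes."""
    if start not in viable or goal not in viable:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt in viable and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the goal cannot be reached through viable intermediates


# Hypothetical rugged landscape: the direct intermediates GT and AC are
# not viable, so the two-step route from AT to GC is blocked.
VIABLE = {"AT", "AA", "GA", "GC"}

print(shortest_viable_path("AT", "GC", VIABLE))
```

On a smooth landscape where every genotype is tolerated, the same search finds a two-mutation route; on this toy rugged landscape it needs three mutations and passes through AA and GA.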

The rugged landscape of Flores, Indonesia. Going from one peak to the other is only possible via certain paths. Picture by jpslim.

The implications of the ongoing divergence of ancient proteins are pretty deep. Since it is entirely likely that some proteins have drifted apart beyond our recognition, LUCA might have had many more proteins than the 572 we can now detect. And if we fast forward a billion years, some proteins which we now consider to be homologous might no longer be recognized as such.

When I first heard that all the galaxies around us are moving away faster and faster, it made me feel pretty small and lonely. The knowledge that life itself is also ever diverging doesn’t make me feel that way, however. After all, we’re still in it together on that small blue planet in that small slice of heaven.


Povolotskaya, I., & Kondrashov, F. (2010). Sequence space and the ongoing expansion of the protein universe. Nature. DOI: 10.1038/nature09105


8 comments to The Big Bang of the Protein Universe

  • Interestingly enough, not the first time the Big Bang metaphor was used to describe protein evolution:

    http://www.pnas.org/content/99/22/14132.abstract

    • That’s pretty cool, thanks! From what I can quickly glean from the paper, they propose that all proteins may have evolved from one or a few ancestral proteins. Combined with the observation that proteins are still diverging, the Big Bang analogy seems to be really appropriate!

  • Wow, really interesting. I’d never heard divergence from LUCA compared to the Big Bang before, it’s a great analogy.

    And I’m pretty sure mountain ranges are still moving either away from or towards each other, depending on the continental drift. Just very, very slowly. And probably in slightly more predictable directions as well!

    (I am really looking forward to getting back into blogging…)

    • Thanks for pointing it out, that honestly slipped my mind! How stupid... The analogy becomes all the better though, as fitness landscapes appear to change on similar timescales to the mountains that rise and fall!

      Good to have you back in the blogosphere!

  • Great post. Very interesting. Great analogy!!!!

  • A question about your conclusion “meaning that even the most ancient proteins are still diverging from each other!”
    When comparing the two circles (big bang of life) of your first illustration, I see in the top-right diagram 5 species happily moving away from E. coli with no boundaries in sight, while in the bottom-right diagram all 5 species hit the boundary of sequence space. If you look carefully you see the blue arrows changed direction, I suppose because of hitting the boundary. So, is it correct to conclude that “even the most ancient proteins are still diverging from each other”? Especially since Salmonella and Cyanosarcina are moving back to E. coli. A boundary is a boundary, isn’t it? If the length of the line from E. coli to the blue dot is a measure of distance, then certainly Cyanosarcina has reached its maximum length and is moving back towards E. coli.
    Furthermore, in the diagram the inner circles of the top and bottom diagrams are exactly the same (unlike the big bang horizon, which is expanding), so Cyanosarcina simply has to go back. And it does, as the blue arrow shows.

    A second remark: I found it highly interesting to read that the fitness effect of a substitution is not fixed but depends on the sequence as a whole and so changes during evolution.

    • Thanks for your comment and careful scrutiny of the first figure! The two diagrams represent the two hypothetical outcomes the authors expected: either the sequences are still diverging (top), or the sequences have reached maximal divergence and are (necessarily, as you rightly noticed) moving towards each other again (bottom). This is indeed represented by the blue arrows ‘bouncing off’ the wall. The major difference between the Big Bang and protein evolution is that the boundaries are considered to be fixed in the latter situation.

      However, I wouldn’t be surprised if the hypothetical boundaries of protein space can be considered endless for all practical purposes. Given an extra billion years, it is likely that we will no longer recognize the orthology in the most ancient proteins. The boundary might be there, but it might be so far away that we’ll never detect it or proteins will never reach it.

      It’s a bit like songs and poetry: the number of songs that can be written is theoretically limited. But still songwriters string notes, chords and words together into new songs year after year, even if common patterns are repeated. Likewise, I think we’ll never have exhausted the pool of all possible songs. There’s always a route or alley that hasn’t been taken before.

      On your last point, I consider epistasis to be one of the most fascinating observations in evolutionary research! I wrote more on this topic in the context of the evolution of fluorescence in corals here.
