Paul Nelson Writes:

Count me as skeptical. Since the early 1980s, I've watched the criteria for molecular homology grow ever weaker. In 1981, Russell Doolittle noted that "clearly, there must be some point beyond which a resemblance for two historically related sequences cannot be verified statistically" (1981, 152). Now, however, the so-called "twilight zone," beyond which sequence similarity disappears and any historical signal is lost, seems to have disappeared altogether. Homology is whatever you can make a case for.

Just yesterday, for instance, I read the following, from _JMB_; the comments in brackets are mine:

Proteins are merged into the same homologous family if they fulfil one of the following conditions: (i) evidence of significant sequence similarity [the classical criterion], (ii) evidence of significant structural similarity with a weaker sequence similarity [moving now into the "twilight zone," about which Doolittle sensibly worried, because of the possibility of convergence], or (iii) in the absence of any sequence similarity, evidence of significant structural similarity, combined with a functional similarity at a co-located active site [OK, anything could be homologous]. (Thornton et al. 1999, 335)

The authors go on to admit that "as with sequence analysis, there is a 'twilight zone' in structure comparisons, where it is uncertain whether two proteins have arisen through convergent or divergent evolution" (Thornton et al. 1999, 335), but they provide no criteria for distinguishing the two possibilities.

What's plain as day is that Thornton et al. are driven by the theory of common descent in their thinking about molecular similarity. They note that homology judgments depend critically on one's method, and also that many sequences resist assignment to known protein families. Again, the comments in brackets are mine:

Attempts to cluster all proteins into families at the sequence level give results which are entirely dependent on the method used and the cut-off applied. [This is deeply unhealthy, the exact opposite of the method-independent robustness one would want.] Until recently, most used BLAST [an alignment algorithm] which is known to miss 90% of the relations in the PDB [Protein Data Bank] and all methods leave many sequences unclassified, which have been termed "singletons." (Thornton et al. 1999, 335)

Now common descent moves center stage to influence the drama:

Are these really proteins which only occur in one species? This seems unlikely and probably represents our inability to recognize distant homologues. (Thornton et al. 1999, 335)

Well, maybe not. Maybe singletons aren't homologues at all, and really do have a phylogenetically restricted distribution. But that genuine possibility sits uneasily with common descent. Thus, homology criteria must be stretched to the breaking point, or jettisoned altogether, to provide ancestry for anomalous proteins.

What makes all this troubling is the incredible shallowness of our knowledge about protein evolution -- i.e., the very real-time understanding we need about how protein sequences actually vary from one generation to the next. One would think that homology criteria should be firmly grounded on experimentally derived standards for what is possible in protein variation -- but, alas, the cart of evolutionary speculation has got so far ahead of the horse of experimental knowledge that the horse has fallen asleep under a tree while the cart has rolled off by itself over the horizon.

Think I'm exaggerating? Here's how Thornton et al. end their paper (1999, 340):

At this stage, many questions remain unanswered. How many protein families exist? How did complex pathways evolve? How do proteins fold? How were the first structures formed, before conservation of structure became such a powerful constraint? In all possible sequence space, what fraction of sequences fold into a unique native structures [sic]? Were the basic set of structures evolved before the three kingdoms of life separated? Are new folds being made today?

Gee, why stop there? Here are some more questions:

-- How much sequence variation is possible before the functional conformation is lost?

-- Is it possible to arrive convergently at the same structure via a different sequence? (in which case any structural similarity would be positively misleading as to historical homology)

Thornton et al. don't know the answer to either of these questions, nor do they seem more than dimly aware that the questions matter centrally to their enterprise of protein homology assessment.

When the basic concepts of any science grow as battered as "homology" has now become in evolutionary theory, it's time to clean house.

______________________________________________________ © 2010 Arthur V. Chadwick, Ph.D.