Count me as skeptical. Since the early 1980s,
I've watched the criteria for molecular homology grow ever weaker.
In 1981, Russell Doolittle noted that "clearly, there must be some
point beyond which a resemblance for two historically related
sequences cannot be verified statistically" (1981, 152). Now,
however, the so-called "twilight zone," beyond which sequence
similarity fades and any historical signal is lost, seems
to have vanished altogether. Homology is whatever you can
make a case for.
Just yesterday, for instance, I read the following in _JMB_;
the comments in brackets are mine:
Proteins are merged into the same homologous family if
they fulfil one of the following conditions: (i) evidence
of significant sequence similarity [the classical criterion],
(ii) evidence of significant structural similarity with a
weaker sequence similarity [moving now into the "twilight
zone," about which Doolittle sensibly worried, because of
the possibility of convergence], or (iii) in the absence of
any sequence similarity, evidence of significant structural
similarity, combined with a functional similarity at a
co-located active site [OK, anything could be homologous].
(Thornton et al. 1999, 335)
The authors go on to admit that "as with sequence analysis, there
is a 'twilight zone' in structure comparisons, where it is uncertain
whether two proteins have arisen through convergent or divergent
evolution" (Thornton et al. 1999, 335), but they provide no criteria
for distinguishing the two possibilities.
What's plain as day is that Thornton et al. are driven by the theory
of common descent in their thinking about molecular similarity. They
note that homology judgments depend critically on one's method, and
also that many sequences resist assignment to known protein families.
Again, the comments in brackets are mine:
Attempts to cluster all proteins into families at the
sequence level give results which are entirely dependent
on the method used and the cut-off applied. [This is
deeply unhealthy, the exact opposite of the method-
independent robustness one would want.] Until recently,
most used BLAST [an alignment algorithm] which is known
to miss 90% of the relations in the PDB [Protein Data
Bank] and all methods leave many sequences unclassified,
which have been termed "singletons." (Thornton et al.
1999, 335)
Now common descent moves center stage to influence the drama:
Are these really proteins which only occur in one species?
This seems unlikely and probably represents our inability
to recognize distant homologues. (Thornton et al. 1999, 335)
Well, maybe not. Maybe singletons aren't homologues at all, and
really do have a phylogenetically restricted distribution. But
that genuine possibility sits uneasily with common descent. Thus,
homology criteria must be stretched to the breaking point, or
jettisoned altogether, to provide ancestry for anomalous proteins.
What makes all this troubling is the incredible shallowness of
our knowledge about protein evolution, and specifically of the
real-time understanding we need of how protein sequences actually
vary from one generation to the next. One would think that homology
criteria should be firmly grounded in experimentally derived standards
for what is possible in protein variation. But, alas, the cart of
evolutionary speculation has got so far ahead of the horse of
experimental knowledge that the horse has fallen asleep under
a tree while the cart has rolled off by itself over the horizon.
Think I'm exaggerating? Here's how Thornton et al. end their
paper (1999, 340):
At this stage, many questions remain unanswered. How
many protein families exist? How did complex pathways
evolve? How do proteins fold? How were the first
structures formed, before conservation of structure
became such a powerful constraint? In all possible
sequence space, what fraction of sequences fold into a
unique native structures [sic]? Were the basic set
of structures evolved before the three kingdoms of
life separated? Are new folds being made today?
Gee, why stop there? Here are some more questions:
-- How much sequence variation is possible before the
functional conformation is lost?
-- Is it possible to arrive convergently at the same
structure via a different sequence? (If so, any
structural similarity would be positively misleading
as to historical homology.)
Thornton et al. don't know the answer to either of these
questions, nor do they seem more than dimly aware that
the questions are central to their enterprise of
assessing protein homology.
When the basic concepts of any science grow as battered as
"homology" has now become in evolutionary theory, it's
time to clean house.
______________________________________________________