Homology
Source -
On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence
Universal common ancestry: The qualitative evidence and need for a formal test
Universal common ancestry is the hypothesis that all extant terrestrial life shares a common genetic heritage. The classic arguments for common ancestry include many independent, converging lines of evidence from various fields, including biogeography, palaeontology, comparative morphology, developmental biology, and molecular biology. The great majority of this evidence, however, is qualitative in nature and only directly addresses the relationships of limited sets of higher taxa, such as the common ancestry of metazoans or the common ancestry of plants.
The broader question of universal common ancestry is much more ambitious and correspondingly difficult to assess. ...
Are Europeans, Euryarchaeota, Euglena, Yersinia, yew, and yeast all genetically related? Of course, biologists routinely incorporate all of these taxa into a universal phylogenetic tree, which is an explicit representation of the genealogical relationships among these diverse taxa. But any group of taxa can be connected in a tree; one can even make a phylogenetic tree from random sequences or characters. Yet is a tree itself justifiable in light of the evidence? In a paper that motivated my original test of common ancestry, Sober and Steel set out the issue very clearly:
When biologists attempt to reconstruct the phylogenetic relationships that link a set of species, they usually assume that the taxa under study are genealogically related. Whether one uses cladistic parsimony, distance measures, or maximum likelihood methods, the typical question is which tree is the best one, not whether there is a tree in the first place.
This is the question I set out to answer: Is there a universal tree — or, more broadly, a universal pattern of genetic relatedness — in the first place?
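The point that a tree can be built from any set of taxa, even random sequences, can be illustrated with a short sketch. This is not the method used in the article; it assumes a simple UPGMA clustering on the proportion of differing sites, and the function names (`random_sequences`, `p_distance`, `upgma`) and taxon labels are hypothetical, chosen only for illustration.

```python
import random
from itertools import combinations


def random_sequences(n, length, seed=0):
    """Generate n unrelated random DNA sequences (hypothetical taxa)."""
    rng = random.Random(seed)
    return {
        f"taxon{i}": "".join(rng.choice("ACGT") for _ in range(length))
        for i in range(n)
    }


def p_distance(a, b):
    """Proportion of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster sequences with UPGMA and return a Newick-style topology."""
    # Each cluster is keyed by its Newick label; the value is its leaf count.
    clusters = {name: 1 for name in seqs}
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = sorted(min(dist, key=dist.get))
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to every other cluster is the size-weighted average.
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (
                size_a * d_ac + size_b * d_bc
            ) / (size_a + size_b)
        del dist[frozenset((a, b))]
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Six sequences with no shared history still yield a fully resolved tree.
print(upgma(random_sequences(6, 200)))
```

The algorithm always returns a bifurcating tree, whether or not the input sequences share any history. The existence of the output is therefore not evidence that a tree is the right model.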
Several researchers have recently questioned the nature and status of the theory of UCA or have emphasized the difficulties in testing a theory of such broad scope. For example, Ford Doolittle has disputed whether objective evidence for UCA, as described by a universal tree, is possible even in principle:
Indeed, one is hard pressed to find some theory-free body of evidence that such a single universal pattern relating all life forms exists independently of our habit of thinking that it should.
This sentiment was also echoed by K&W, who concluded that a "formal demonstration of UCA … remains elusive and might not be feasible in principle." Such criticisms of UCA point to a need for a formal test, similar to the formal tests of fundamental physical theories like general relativity and quantum mechanics.
Darwin originally proposed UCA in 1859, yet was characteristically circumspect, only committing to the view that "animals are descended from at most only four or five progenitors, and plants from an equal or lesser number". The hypothesis of UCA was evidently an open question at least until the mid-1960s, when a debate about UCA and the universality of the genetic code (then as yet undeciphered) played out in the pages of Science. One of the most celebrated arguments for UCA is based on the fact that the genetic code is identical, or nearly so, in all known life. The argument had been circulating informally for some years before Hinegardner and Engelberg first presented it in detail:
Because the genetic code should remain invariant, its constancy can be used to establish the number of primordial ancestors from which all (present) organisms are derived. If, for example, the code is universal … then all existing organisms would be descendants of a single organism or species. If the code is not universal, the number of different codes should represent the number of different primordial ancestors …
Hinegardner and Engelberg's reasoning hinges on the assumption that the genetic code is so important for fundamental genetic processes that any mutations in the code would be lethal. Carl Woese criticized this argument, noting its dependence on the assumption that the genetic code is a "historical accident" and must not be "chemically determined". Woese was a proponent of the "stereochemical hypothesis", which holds that the association between a certain codon and its respective amino acid is dictated by chemical phenomena — that is, the observed code is required somehow by the laws of physics, perhaps by binding affinity of the nucleic acid codon to its corresponding amino acid. Woese was also sceptical that the code was "frozen", and he postulated plausible mechanisms by which a degenerate code could evolve. If the code were somehow determined by physicochemical principles and evolvable, then multiple origins of life could conceivably converge independently on the same code. However, the stereochemical hypothesis was considered and largely disregarded by most researchers, including Francis Crick, due to a lack of evidence and difficulty in imagining a possible mechanism.
In 1968, Crick still presented the "frozen accident" argument for UCA with some reservation. But by 1973, in his famous essay on the explanatory power of evolutionary theory, Theodosius Dobzhansky laid out the existing evidence for UCA as if it were beyond dispute. According to Dobzhansky, the primary support for UCA is given by several key molecular similarities shared by all known life: (1) the "universal" genetic code, (2) nucleic acid as the genetic material, (3) shared polymers such as proteins, RNAs, lipids, and carbohydrates, and (4) core metabolism. These are today still the main arguments for UCA.
The standard presentation of this evidence is, however, strictly qualitative; it does not quantitatively assess the likelihood that these commonalities could be arrived at independently from multiple origins. Each of Dobzhansky's arguments for UCA has its weaknesses, and Sober and Steel provide several criticisms of these standard arguments [11]. While a detailed analysis of these lines of evidence for UCA is beyond the scope of this article, as a case study let us briefly revisit the "universal" genetic code, widely considered the most persuasive evidence for UCA.
Sequence similarity and homology are not equivalent
One common thread among the various arguments for common ancestry is the inference from certain biological similarities to homology. However, with apologies to Fisher, similarity is not homology. It is widely assumed that strong sequence similarity indicates genetic kinship. Nonetheless, as I and many others have argued, sequence similarity is strictly an empirical observation; homology, on the other hand, is a hypothesis intended to explain the similarity. Common ancestry is only one possible mechanism that results in similarity between sequences. In a landmark paper on the inference of homology from sequence similarity, the late Walter Fitch presented the problem as follows:
Now two proteins may appear similar because they descend with divergence from a common ancestral gene (i.e., are homologous in a time-honoured meaning dating back at the least to Darwin's Origin of Species) or because they descend with convergence from separate ancestral genes (i.e., are analogous). It is nevertheless possible that the restrictions imposed by a functional fitness may cause sufficient convergence to produce an apparent genetic relatedness. Therefore, the demonstration that two present-day sequences are significantly similar, by either chemical or genetic criteria, still must necessarily leave undecided the question whether their similarity is the result of a convergent process or all that remains from a divergent process. For example, it is at least philosophically possible to argue that fungal cytochromes c are not truly homologous to the metazoan cytochromes c, i.e., they just look homologous.
Colin Patterson made a similar argument, explicitly pointing out that statistically significant sequence similarity does not necessarily force the conclusion of homology:
… given that homologies are hypothetical, how do we test them? How do we decide that an observed similarity is a valid inference of common ancestry? If similarity must be discriminated from homology, its assessment (statistically significant or not, for example) is not necessarily synonymous with testing a hypothesis of homology.
How, then, would we know if highly similar biological sequences had independent origins or not? In all but the most trivial cases we do not have direct, independent evidence for homology — rather, we conventionally infer the answer based on some qualitative argument, often involving sequence similarity as a premise.
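To make the distinction concrete, here is a minimal sketch of what a conventional significance assessment of similarity measures. It assumes two pre-aligned sequences of equal length and a simple shuffling null model; the function names (`identity`, `shuffle_p_value`) and the toy sequences are hypothetical and do not come from the article.

```python
import random


def identity(a, b):
    """Fraction of aligned positions at which two sequences match."""
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))


def shuffle_p_value(a, b, n_shuffles=1000, seed=0):
    """Estimate how often a shuffled copy of b matches a at least as
    well as the real b does (a simple permutation test)."""
    rng = random.Random(seed)
    observed = identity(a, b)
    b_chars = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(b_chars)
        if identity(a, b_chars) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Toy, hypothetical sequences that differ at two positions.
seq_a = "ACGTTGCAACGTTGCAACGTTGCAACGTTGCA"
seq_b = "ACGTTGCAACGATGCAACGTTGCAACGTTGGA"
observed, p = shuffle_p_value(seq_a, seq_b)
print(f"identity = {observed:.2f}, shuffle p-value = {p:.4f}")
```

A small p-value here only says that the observed similarity is unlikely under random shuffling of one sequence. It does not distinguish divergence from a common ancestor from convergence under functional constraint, which is the gap Fitch and Patterson describe.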
@shunyadragon where is the objective verifiable evidence? We are still waiting.