13. Skeptical Conclusions

Alfred North Whitehead, in his 1911 An Introduction to Mathematics, observed:

It is a profoundly erroneous truism, repeated by all copy-books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. . . .

From a slightly different perspective, this is precisely a formula for the stagnation of civilization, for someone at some time must do the hard thinking upon which such important operations are built, especially if civilization is to advance by the overthrow of some thoughtless dogma or mindless ritual.

Nathan Divinsky in The Chess Encyclopedia [D] calls the Elo System "a mathematically sound and universally accepted (1970) rating system for chess players." The year refers to the adoption of the Elo System by FIDE. Aside from a 1965 contribution by Elo to The Journal of Gerontology, there has been virtually no peer review of the system beyond the world of organized chess. One of the few external references is found in The Mathematics of Games, by J. D. Beasley [B]. Beasley offers this scathing footnote on the work of the late Professor Elo:

His statistical testing is unsatisfactory to the point of being meaningless; he calculates standard deviations without allowing for draws, he does not always appear to allow for the extent to which his test results have contributed to the ratings which they purport to be testing, and he fails to make the important distinction between proving a proposition true and merely failing to prove it false.

Beasley nevertheless accepts the premise of the Percentage Expectancy Curve. His basic concern is the difficulty of demonstrating any suitable probability function for that role. Again, the essential incoherence of Elo's position seems to have escaped notice.
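The objection about draws in the footnote quoted above can be made concrete with a little arithmetic. The sketch below is one reading of that objection, not a reconstruction of Elo's or Beasley's own calculations. It assumes independent games scored 1, 1/2, and 0, and the function names score_sd and binomial_sd are hypothetical. Under those assumptions, a two-outcome (binomial) formula overstates the per-game variance of the score by one quarter of the draw probability.

```python
import math

def score_sd(p_win, p_draw, n_games):
    """Standard deviation of total score over n independent games,
    scoring win = 1, draw = 0.5, loss = 0 (illustrative assumption)."""
    p_loss = 1.0 - p_win - p_draw
    if min(p_win, p_draw, p_loss) < -1e-12:
        raise ValueError("probabilities must be non-negative and sum to 1")
    mean = p_win + 0.5 * p_draw                 # expected score per game
    var = p_win + 0.25 * p_draw - mean ** 2     # E[X^2] - (E[X])^2
    return math.sqrt(n_games * var)

def binomial_sd(expected_score, n_games):
    """Standard deviation if every game were a win or a loss (no draws)."""
    return math.sqrt(n_games * expected_score * (1.0 - expected_score))

# Evenly matched players over 100 games, with 40% of games drawn.
with_draws = score_sd(p_win=0.3, p_draw=0.4, n_games=100)
without_draws = binomial_sd(expected_score=0.5, n_games=100)
print(f"allowing for draws: {with_draws:.3f}")
print(f"ignoring draws:     {without_draws:.3f}")
```

The expected score is the same in both cases; only the spread differs, which is why a significance test built on the binomial figure would be too lenient whenever draws are common.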

The proof of the pudding, it has been said, is the actual operation of a rating system, and the Elo System has been grinding out chess ratings for over four decades now with hardly a grumble from the rating pool.  One is tempted to say that the system works despite its theory rather than because of it.  The reputation of the Elo System, on the other hand, rests largely on its supposed ability to predict chess outcomes.  There is even the occasional inquiry as to whether the system can predict outcomes in sports such as basketball, football, golf and soccer.  As this treatise has attempted to show, the predictive powers of the Elo System are not due to its application of probability theory, which in the final analysis must be characterized as a misapplication, but rather to principles of averaging which have hardly been articulated elsewhere.

Probability theory, as it happens, does explain much of the success of the Elo System, but theory of a different sort from the one its author took for granted. If Elo misapplied theory, he also made considerable use of the mathematical intuition that he disparaged in other contexts. The result is a system that is a marked improvement over those that preceded it, but one that falls short of the scientific rigor that Elo envisioned for it. If there is any lesson to be learned from his celebrated work, it is that no single system is likely to satisfy the requirements of statistical precision. Rating systems in the past, as Elo notes, "received acceptance because they produced ranking lists which agreed generally with the personal estimates of rankings made by knowledgeable chess players" [E2]. Even now popular taste may have a role in deciding which system is to be sanctioned by organized chess and how it is to be administered.

The principles of rating theory undoubtedly have applications beyond chess. As Elo said of his own system, it is "applicable to any type of competitive activity in which individuals or teams engage in pairwise competition" [E1, preface]. To this may be added applications for noncompetitive pairwise comparisons, such as opinion sampling for marketing research. One would hope that the current controversy is resolved before such wholesale applications are attempted. For some, however, the allure of rating theory lies in the controversy itself. It is a controversy that has not yet been played out in organized chess and a cautionary tale for all involved.