Back to Basics in Rating Theory

1. Beyond the Elo System


Basics

Probability in Rating Systems
Ordinal Ratings
Interval Ratings
Ratio Ratings

Applications
Sequential vs. Simultaneous
Consistency
Established Ratings

Attenuation
Percentage Expectancy
A Revised Elo System
Cumulative Averaging
The Berkin System
Progressive Ratings


Conclusions
Tests
Skeptical Conclusions



The first attempt at a mathematical rating system for chess has been credited to the Correspondence Chess League of America in 1939. With a world war intervening, the influential Ingo System of West Germany followed in 1948, named by its originator Anton Hoesslinger (1875-1959) for his hometown, Ingolstadt in Bavaria [HW, "rating"]. The Ingo System establishes the basics of ratings with a simple formula,

[1.1]                    R  =  ERc - (Pct - 50) ,

where R is the player's resulting rating, ERc is the arithmetic average of the opposition ratings, and Pct is the player's score in percentage points. (A peculiarity here, from the standpoint of subsequent systems, is that lower ratings represent greater playing strength.) The Ingo formula represents a notable advance in the rating of chess players, though it appears to have sprung primarily from Hoesslinger's intuition.

The formal development of rating theory began about 1960 with the introduction of probability formulas. The originator of this idea was Arpad E. Elo (1903-1992), one of the founders of the United States Chess Federation (USCF), whose system was adopted in 1970 by the International Chess Federation (FIDE). The paradigm shift proved irresistible to mathematicians. About the same time that Elo was developing his system, similar ideas were afloat in Australia [E2, Part 1].
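As a worked illustration of formula [1.1], the following Python sketch computes an Ingo rating from a list of opposition ratings and a score. The function name, its parameters, and the sample event are hypothetical and chosen only for this example; they do not come from Hoesslinger's own presentation.

```python
def ingo_rating(opponent_ratings, points, games):
    """Ingo rating per formula [1.1]: R = ERc - (Pct - 50)."""
    if games <= 0 or len(opponent_ratings) != games:
        raise ValueError("need one opponent rating per game played")
    er_c = sum(opponent_ratings) / games   # arithmetic average of opposition ratings
    pct = 100.0 * points / games           # score in percentage points
    return er_c - (pct - 50.0)


# Hypothetical event: four opponents, 3 points out of 4 (75 percent).
rating = ingo_rating([120, 110, 130, 100], points=3.0, games=4)
print(rating)  # 115 - (75 - 50) = 90.0
```

The result, 90, is lower than the opposition average of 115, which in the Ingo convention marks the player as stronger than the field. An even score of 50 percent returns the opposition average unchanged.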

The new formal theory of ratings was based on an implicit measurement of playing strength, necessarily implicit since there is no clear measurement of playing strength beyond the obvious facts of winning, losing, and drawing.  The measurement would become explicit with the development of ratings, just as the notion of gravity had become explicit with Newton's formulas.  As a professor of physics, Elo would no doubt have found this analogy appealing.  It will be the burden of this treatise to show that ratings are measurements of playing strength in a figurative sense only.  The simple fact is that ratings are statistics.  The information they convey is based solely on the data provided by pairings and outcomes.  To imagine that they represent some other dimension of playing strength, if only hypothetically, is to invite premature speculations about probability distributions, leading by a circular route to arguments for probability treatments based on the same distributions. 

On the strength of probability theory, Elo judged the Ingo and similar systems to be deficient because they were unwittingly based on a rectangular (uniform) distribution as a consequence of their linear formulas. The implication is that every rating system is based on a probability distribution and that the accuracy of a system is to be judged by the suitability of this distribution. Elo offered two complete systems, one based on the normal curve, another on the logistic. Apologists are quick to point out that there is little practical difference between the two systems, although the proposal of alternatives seems problematic by Elo's own standard. On the view that ratings are statistics, we can hardly call any rating system invalid. We shall have occasion to call the Elo System cumbersome and not entirely coherent, but a judgment of invalidity would admit the mistaken standard it adopts.
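The remark that the normal and logistic versions differ little in practice can be checked numerically. The sketch below is an illustration only: it assumes the commonly published Elo conventions, which are not stated in the text above, namely a standard deviation of 200 rating points per player for the normal curve (hence 200 times the square root of 2 for the difference between two players) and a base-10 logistic with a divisor of 400. The function names are hypothetical.

```python
import math

# Assumption: sigma of 200 per player, so the difference of two players
# has a standard deviation of 200 * sqrt(2).
SIGMA_DIFF = 200.0 * math.sqrt(2.0)


def expected_normal(diff):
    """Expected score from the normal CDF of the rating difference."""
    return 0.5 * (1.0 + math.erf(diff / (SIGMA_DIFF * math.sqrt(2.0))))


def expected_logistic(diff):
    """Expected score from the base-10 logistic curve with divisor 400 (assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Compare the two curves at a few rating differences (in Elo-style points).
for diff in (0, 100, 200, 400):
    n, l = expected_normal(diff), expected_logistic(diff)
    print(f"{diff:4d}  normal={n:.3f}  logistic={l:.3f}  gap={abs(n - l):.3f}")
```

Under these assumed parameters the two expectancies stay within two percentage points of each other for differences up to 400 points, with the gap widest toward the upper end of that range.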

By analogy with the commonly accepted scales of measurement, Elo distinguished three types of rating systems: ordinal, interval, and ratio.  Since game scores in aggregate lend themselves to these scales of measurement, the classification is convenient for describing the various statistical methods that arise from rating theory and will be utilized in the following pages.  But first the issue of probability will be revisited.