Back to Basics in Rating Theory
1. Beyond the Elo System
Probability in Rating Systems
The first attempt at a mathematical rating system for chess has been credited to the Correspondence Chess League of America in 1939. With a world war intervening, the influential Ingo System of West Germany followed in 1948, named by its originator Anton Hoesslinger (1875-1959) for his home town, Ingolstadt in Bavaria [HW, "rating"]. It establishes the basics of ratings with a simple formula,
[1.1] R = ERc - (Pct - 50),
where ERc is the arithmetic average of the opposition ratings and Pct is the player's score in percentage points. (A peculiarity here, from the standpoint of subsequent systems, is that lower ratings represent greater playing strength.) The Ingo formula represents a notable advance in the rating of chess players, though it appears to have sprung primarily from Hoesslinger's intuition.
The formal development of rating theory began about 1960 with the introduction of probability formulas. The originator of this idea was Arpad E. Elo (1903-1992), one of the founders of the United States Chess Federation (USCF), whose system was adopted in 1970 by the International Chess Federation (FIDE). The paradigm proved irresistible to mathematicians.
About the same time that Elo was developing his system, similar ideas
were afloat in Australia [E2, Part 1].
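As a small illustration of the Ingo formula [1.1], the following Python sketch computes a rating from a list of opposition ratings and a percentage score. The function name ingo_rating, the input checks, and the sample figures are assumptions made for illustration only; they are not taken from Hoesslinger's system or from any published rating list.

```python
def ingo_rating(opposition_ratings, score_pct):
    """Apply formula [1.1]: R = ERc - (Pct - 50).

    opposition_ratings: ratings of the opponents faced (hypothetical input)
    score_pct: the player's score in percentage points, 0 to 100
    """
    if not opposition_ratings:
        raise ValueError("at least one opposition rating is required")
    if not 0 <= score_pct <= 100:
        raise ValueError("score_pct must lie between 0 and 100")
    # ERc: arithmetic average of the opposition ratings
    erc = sum(opposition_ratings) / len(opposition_ratings)
    # Scoring above 50 percent lowers the number, and on the Ingo
    # scale a lower number represents greater playing strength.
    return erc - (score_pct - 50)


# Invented example: three opponents averaging 120, player scores 60 percent
print(ingo_rating([100, 120, 140], 60))  # 110.0
```

With these invented figures the player comes out 10 points below the opposition average, which on the Ingo scale means 10 points stronger. A score of exactly 50 percent returns the opposition average unchanged.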
The new formal theory of ratings was based on an implicit measurement of playing strength, necessarily implicit since there is no clear measurement of playing strength beyond the obvious facts of winning, losing, and drawing. The measurement would become explicit with the development of ratings, just as the notion of gravity had become explicit with Newton's formulas. As a professor of physics, Elo would no doubt have found this analogy appealing. It will be the burden of this treatise to show that ratings are measurements of playing strength in a figurative sense only. The simple fact is that ratings are statistics. The information they convey is based solely on the data provided by pairings and outcomes. To imagine that they represent some other dimension of playing strength, if only hypothetically, is to invite premature speculations about probability distributions, leading by a circular route to arguments for probability treatments based on the same distributions.
By analogy with the commonly accepted scales of measurement, Elo distinguished three types of rating systems: ordinal, interval, and ratio. Since game scores in aggregate lend themselves to these scales of measurement, the classification is convenient for describing the various statistical methods that arise from rating theory, and it will be used in the following pages. But first the issue of probability will be revisited.