Back to Basics in Rating Theory

1. Beyond the Elo System

Rating systems in their modern mathematical form first appeared in 1939 in a system used by the Correspondence Chess League of America.  The influential Ingo System of West Germany followed in 1948, named by its originator Anton Hoesslinger (1875-1959) for his home town, Ingolstadt in Bavaria [HW, "rating"].  It establishes the basics of ratings in a remarkably simple formula,

[1.1]                    R  =  ERc - (Pct - 50) ,

where ERc is the arithmetic average of the opposition ratings and Pct is the player's score in percentage points.  A peculiarity here, from the standpoint of subsequent systems, is that lower ratings represent greater playing strength. Hoesslinger appears to have relied largely on intuition in developing his system, which manages nevertheless to be theoretically provocative.  The actual development of rating theory took a different tack about 1960 with the introduction of probability formulas.  The main proponent of this idea was Arpad E. Elo, one of the founders of the United States Chess Federation (USCF), whose system was subsequently adopted by the International Chess Federation (FIDE).  The paradigm shift that brought the application of probability theory to the rating of chessplayers proved irresistible to mathematicians.  About the same time that Elo was developing his system, similar ideas were afloat in Australia [E2, Part 1]. 
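As a minimal sketch of formula [1.1], the calculation can be written out in a few lines of Python. The function name `ingo_rating` and its arguments are illustrative only and do not come from Hoesslinger's system; the sketch assumes one opposition rating per game and a score counted in game points (1 for a win, 0.5 for a draw).

```python
from statistics import mean


def ingo_rating(opponent_ratings, points):
    """Rating by formula [1.1]: R = ERc - (Pct - 50).

    opponent_ratings: one Ingo rating per game played (hypothetical input layout).
    points: total score in game points (win = 1, draw = 0.5, loss = 0).
    """
    if not opponent_ratings:
        raise ValueError("at least one game is required")
    games = len(opponent_ratings)
    if not 0 <= points <= games:
        raise ValueError("points must lie between 0 and the number of games")
    erc = mean(opponent_ratings)      # ERc: arithmetic average of the opposition
    pct = 100.0 * points / games      # Pct: score in percentage points
    return erc - (pct - 50.0)


# Four games against opposition averaging 120, scoring 3 points (75 percent):
# R = 120 - (75 - 50) = 95, a lower (stronger) rating than the opposition average.
print(ingo_rating([100, 120, 140, 120], 3.0))
```

An even score returns the opposition average unchanged, and each percentage point above 50 lowers the rating by one point, which is the linearity at issue in the discussion that follows.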

Mathematicians should keep in mind, however, the essential nature of chess ratings.  One is tempted to think of ratings as measurements of performance, in the same sense as measurements of physical phenomena.  As a trained physicist Elo was especially prone to this interpretation.  The simple fact is that ratings are statistics.  The information they convey is based solely on the data provided by pairings and outcomes.  To imagine that they represent some other dimension of playing strength, if only hypothetically, is to invite premature speculations about probability distributions.  Such speculations lead by a circular route to arguments for probability treatments based on the same distributions.

On the strength of probability theory Elo judged the Ingo and similar systems to be deficient because they were unwittingly based on a rectangular (uniform) distribution as a consequence of their linear formulas.  The implication is that every rating system is based on a probability distribution and that the accuracy of a system is to be judged by the suitability of this distribution.  Elo offered two complete systems, one based on the normal curve, another on the logistic.  Apologists are quick to point out that there is little practical difference between the two systems, though the existence of alternatives seems problematic by Elo's own standard.  By analogy with scales of measurement, Elo distinguished three types of rating systems: ordinal, interval, and ratio.  This classification is convenient enough for describing the different statistical methods that arise from rating theory and will be utilized in the following pages.  But first the issue of probability will be revisited.
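The claim that the normal and logistic versions differ little in practice can be checked numerically. The sketch below is an illustration under stated assumptions, not a reproduction of Elo's tables: it assumes the conventional parameters, which are not given in this section, namely a standard deviation of 200 points per player for the normal version (so that a rating difference has standard deviation 200√2) and the base-10, 400-point form for the logistic version. The function names are hypothetical.

```python
import math


def expectancy_logistic(diff):
    """Expected score for a rating difference, logistic version.

    Assumes the conventional base-10 form with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def expectancy_normal(diff, sigma=200.0):
    """Expected score for a rating difference, normal-curve version.

    Assumes each player's performance has standard deviation `sigma`,
    so the difference of two performances has standard deviation sigma * sqrt(2).
    """
    z = diff / (sigma * math.sqrt(2.0))
    # Standard normal cumulative distribution, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Tabulate both curves and record the largest gap between them.
largest_gap = 0.0
for diff in range(0, 801, 100):
    p_log = expectancy_logistic(diff)
    p_norm = expectancy_normal(diff)
    largest_gap = max(largest_gap, abs(p_log - p_norm))
    print(f"{diff:4d}  logistic={p_log:.3f}  normal={p_norm:.3f}")
print(f"largest gap on this grid: {largest_gap:.3f}")
```

Under these assumptions the two curves agree at a difference of zero and stay within about two percentage points of each other across the tabulated range, with the gap appearing mainly at large rating differences.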