2. Probability in Rating Systems

The central concept of the Elo system is the Percentage Expectancy Curve, which relates percentage score to rating difference.  This function is patterned on well-known probability functions, either the normal or the logistic.  Percentage score, considered apart from rating difference, may be treated in various ways by probability theory.  As Elo points out [E1, 8.9], a percentage score may be regarded as a sample from a known trinomial distribution of wins, losses, and draws.  The probability of a particular outcome in N games, made up of W wins, L losses, and D draws (so that N = W + L + D), can be calculated exactly as

[2.1]     P(W, L, D)  =  [ N! / (W! L! D!) ] · P(win)^W · P(loss)^L · P(draw)^D .
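As a minimal sketch, equation [2.1] can be evaluated directly in Python. The function name and the sample proportions (40% wins, 30% losses, 30% draws) are hypothetical and chosen only for illustration; they do not come from Elo's text.

```python
from math import factorial


def trinomial_probability(wins, losses, draws, p_win, p_loss, p_draw):
    """Probability of exactly this win/loss/draw split, following equation [2.1]."""
    if abs(p_win + p_loss + p_draw - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    n = wins + losses + draws
    # Multinomial coefficient N! / (W! L! D!), computed with exact integers.
    coefficient = factorial(n) // (
        factorial(wins) * factorial(losses) * factorial(draws)
    )
    return coefficient * p_win**wins * p_loss**losses * p_draw**draws


# Hypothetical parent population: 40% wins, 30% losses, 30% draws.
# Probability of scoring exactly 4 wins, 3 losses, and 3 draws in 10 games.
print(trinomial_probability(4, 3, 3, 0.4, 0.3, 0.3))
```

Note that the calculation requires the parent proportions to be known in advance, which is rarely the case in practice.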

The quantities P(win), P(loss), and P(draw) on the right side of equation [2.1] represent the proportions of wins, losses, and draws in the parent population.  More typically, the distribution from which samples are taken is unknown, and we wish to estimate its mean from a sampling distribution of means, that is, from the means of an indefinitely large number of samples.  This distribution was famously shown to approach normal as sample size increases, regardless of the shape of the parent population, and Elo seized on this Central Limit Theorem as justification for the normal version of his Percentage Expectancy Curve [E1, 8.22].  He ignored the fact that the variability of this distribution, its standard error, becomes increasingly small as sample size increases, and chose instead an arbitrary constant of 200 rating points to represent this value.  In the final analysis, this imprecision is of no consequence, since it is the entire treatment of percentage score that is in question.  Interesting as the probability implications may be, they have no bearing whatever on the relation of percentage score to rating difference.
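The shrinking variability of the sampling distribution can be illustrated with a short calculation. This is a sketch only: it assumes the conventional chess scoring of 1 for a win, 1/2 for a draw, and 0 for a loss, and the function name and parent proportions are hypothetical.

```python
from math import sqrt


def score_standard_error(n_games, p_win, p_draw):
    """Standard error of the mean percentage score over n_games.

    Assumes each game scores 1 (win), 0.5 (draw), or 0 (loss) and that
    games are independent draws from the same parent population.
    """
    mean = p_win + 0.5 * p_draw
    # Variance of a single game's score: E[X^2] - (E[X])^2.
    variance = p_win + 0.25 * p_draw - mean**2
    return sqrt(variance / n_games)


# Hypothetical parent population: 40% wins, 30% draws, 30% losses.
for n in (10, 100, 1000):
    print(n, score_standard_error(n, 0.4, 0.3))
```

The standard error falls in proportion to the square root of the number of games, so it cannot be represented by any single fixed value.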

For Elo the Percentage Expectancy Curve was patently a probability function, and he could make no sense of the objection that it was not.  In hindsight, the objection might better have been raised as a distinction of terms.  Since a percentage score may be thought of as an estimate of probability, defined as a long-term percentage, there is reason enough to regard the Percentage Expectancy Curve as a function that relates probability to rating difference.  In this broad sense, it is a probability function.  But there is another sense of the term that is restricted to those functions that arise in probability theory from a mathematical analysis of variability, such as the normal curve or the logistic, and these we may call true probability functions.  A function that merely maps probability to another variable without some justification based on variability analysis would thus be called an arbitrary probability function.  Such functions include those based arbitrarily on a true probability function, which is the case of the Percentage Expectancy Curve.  If the Percentage Expectancy Curve itself were a true probability function, it would be derived independently from distributions of rating difference, however these might arise, but these can hardly be known without a pre-established definition of ratings.
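For reference, the two forms on which the Percentage Expectancy Curve is usually patterned can be sketched as follows. The parameters are assumptions based on the conventional statement of the Elo system rather than on anything given in this section: the normal version treats a rating difference as having a standard deviation of 200 times the square root of 2, and the logistic version uses base 10 with a scale of 400 points. The function names are hypothetical.

```python
from math import erf


def expected_score_normal(rating_diff, sigma=200.0):
    """Normal-curve version of the expectancy function.

    Assumes each player's performance has standard deviation sigma, so the
    difference of two performances has standard deviation sigma * sqrt(2).
    The normal CDF then reduces to 0.5 * (1 + erf(diff / (2 * sigma))).
    """
    return 0.5 * (1.0 + erf(rating_diff / (2.0 * sigma)))


def expected_score_logistic(rating_diff, scale=400.0):
    """Logistic version of the expectancy function (base 10, assumed scale 400)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


# Compare the two curves at a few rating differences.
for diff in (0, 100, 200, 400):
    print(diff, expected_score_normal(diff), expected_score_logistic(diff))
```

Either function simply assigns an expected percentage score to a rating difference; nothing in the calculation derives that assignment from an observed distribution of rating differences.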

It need hardly be said that an arbitrary probability function may take virtually any form, including the linear form deprecated by Elo.  Since the function is by definition arbitrary, it cannot be improved by mimicking true probability functions. Choosing the best statistic for a rating system will depend on criteria other than variability analysis, such as simplicity and practicality.  Unfortunately, the complexities of probability theory have captured the imagination of chess, and the bell curve has become an icon of chess ratings.  What follows is a belated attempt to demonstrate a more reasonable theoretical basis.