The basic assumption of the Elo System is that the
chess performance of an individual player is a random variable that can be
described by the normal curve [E1, 1.31]. But what exactly
is varying in this random variable?
As applied to a single game, performance
is an abstraction which cannot be measured objectively.
It consists of all the judgments, decisions, and actions of the
contestant in the course of the game.
Perhaps a panel of experts in the art of the game could evaluate
each move on some arbitrary scale and crudely express the total
performance numerically, even as is done in boxing and gymnastics [E1, 1.32].
Performance in this light does not seem a very promising
candidate for mathematical treatment.
Fortunately, there is a simple definition of performance that is as
old as the game. If a player
outperforms the opponent, he/she wins and scores the point; if he/she
loses, the opponent scores the point; and if a draw occurs, the point is
divided. This simple
definition need not lead to a simple-minded treatment.
Under the heading "Sundry Theoretical Topics" [E1, 8.9], Elo pointed
out that the probability of a specific outcome in terms of wins W, losses
L, and draws D can be calculated precisely as
[2.1]  $P(W, L, D) = \dfrac{N!}{W!\,L!\,D!}\; P(\mathrm{win})^{W}\, P(\mathrm{loss})^{L}\, P(\mathrm{draw})^{D}$
if we know the probabilities P(win), P(loss), and P(draw). Let us consider the
outcomes for ten games (N = 10), where P(win) = .5, P(loss) = .2, and
P(draw) = .3, and let us express the results in terms of points scored.
Since [2.1] must be evaluated for each of the 66 ways of dividing 10 games
among wins, losses, and draws, this is best done with a computer program (Downloads).
The results are as follows:

Although Elo presented [2.1] almost as an
afterthought, it provides a convincing demonstration of the "first and
basic assumption" of his system [E2, "Form Varies"].
Unfortunately, it leads no further.
Variation of performance in this sense yields no useful
definition of probability since the concern of rating theory is to
relate percentage scores to ratings.
The outcome here merely confirms the percentage score .65 as the
mean, which already follows from P(win) + P(draw)/2 = .5 + .15.
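The calculation behind [2.1] can be sketched in a few lines of Python. This is a minimal illustration, not the program offered in the downloads; the function names `outcome_probability` and `score_distribution` are hypothetical. It enumerates every split of N = 10 games into wins, losses, and draws, applies [2.1] to each, and accumulates the probability of each point total.

```python
from math import factorial


def outcome_probability(w, l, d, p_win, p_loss, p_draw):
    """Probability of exactly w wins, l losses, and d draws, per [2.1]."""
    n = w + l + d
    coeff = factorial(n) // (factorial(w) * factorial(l) * factorial(d))
    return coeff * p_win**w * p_loss**l * p_draw**d


def score_distribution(n, p_win, p_loss, p_draw):
    """Map each point total (win = 1, draw = 1/2) to its probability.

    Also returns the number of (W, L, D) splits that were visited.
    """
    dist = {}
    splits = 0
    for w in range(n + 1):
        for d in range(n - w + 1):
            l = n - w - d
            points = w + d / 2
            p = outcome_probability(w, l, d, p_win, p_loss, p_draw)
            dist[points] = dist.get(points, 0.0) + p
            splits += 1
    return dist, splits


# Values taken from the text: N = 10, P(win) = .5, P(loss) = .2, P(draw) = .3
dist, splits = score_distribution(10, 0.5, 0.2, 0.3)
mean_points = sum(points * p for points, p in dist.items())

for points in sorted(dist):
    print(f"{points:4.1f}  {dist[points]:.4f}")
print("splits visited:", splits)
print("mean points:", round(mean_points, 6))
```

With these inputs the sketch visits 66 splits, the probabilities sum to 1, and the mean is 6.5 points in 10 games, that is, the percentage score .65.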
Elo's rather nebulous definition of
performance leads to the central concept of his system: the Percentage
Expectancy Curve, which is patterned on a well-known probability function,
either the normal or the logistic.
The Percentage Expectancy Curve relates percentage score
(equivalent in the long run to probability) to rating difference.
How it does this is a crucial question for the system.
We are shown two overlapping distributions [E1, 8.23] and are told that the shaded portion of one "represents the
probability that the lower rated player will outperform the higher."
Apparently, if a player's performance is greater than that of the
opponent, he/she wins; if the performance is lower, he/she loses.
This argument, aside from the objection that it leaves draws
completely out of account, is a ponderous burden for such a tenuous
concept to bear. Outperforming
the opponent seems equivalent in every respect to winning; yet this would
suggest a binomial distribution, if not trinomial.
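To make the overlapping-distributions argument concrete, the following sketch computes the probability that one player "outperforms" another under each of the two candidate curves. The constants are assumptions taken from the conventional presentation of the Elo system, not from this section: a standard deviation of 200 points per player for the normal model (so the difference of two performances has standard deviation 200√2), and a scale of 400 points for the base-10 logistic. The function names are hypothetical.

```python
from math import erf, sqrt


def expected_score_normal(diff, sd_player=200.0):
    """P(performance A > performance B) when each performance is normal.

    diff is rating A minus rating B. sd_player = 200 is an assumed,
    conventional value; the difference of two independent performances
    then has standard deviation sd_player * sqrt(2).
    """
    sd_diff = sd_player * sqrt(2.0)
    # Standard normal CDF evaluated at diff / sd_diff
    return 0.5 * (1.0 + erf(diff / (sd_diff * sqrt(2.0))))


def expected_score_logistic(diff, scale=400.0):
    """Base-10 logistic curve; scale = 400 is an assumed, conventional value."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


for diff in (0, 100, 200, 400):
    n = expected_score_normal(diff)
    g = expected_score_logistic(diff)
    print(f"diff {diff:4d}: normal {n:.3f}  logistic {g:.3f}")
```

Either function divides the whole probability between the two players, so that the value for A and the value for B always sum to 1. Nothing in the construction assigns a probability to a draw, which is the objection raised above.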
For Elo the Percentage Expectancy Curve
was patently a probability function, and he could make no sense of the
objection that it was not. In hindsight, the objection might better
have been raised as a distinction of terms. Since a percentage score
may be thought of as an estimate of probability, defined as a long-term
percentage, there is reason enough to regard the Percentage Expectancy
Curve as a function that relates probability to rating difference.
In this broad sense, it is a probability function. But there is
another sense of the term that is restricted to those functions that arise
in probability theory from a mathematical analysis of variability, such as
the normal curve or the logistic, and these we may call true
probability functions. A function that merely maps probability to
another variable without some justification based on variability analysis
would thus be called an arbitrary probability function. Such
functions include those based arbitrarily on a true probability function,
which is the case of the Percentage Expectancy Curve. If the
Percentage Expectancy Curve itself were a true probability function, it
would be derived independently from distributions of rating difference,
however these might arise, but these can hardly be known without a
pre-established definition of ratings.
It need hardly be
said that an arbitrary probability function may take virtually any form,
including the linear form deprecated by Elo. Since the function is
by definition arbitrary, it cannot be improved by mimicking true
probability functions. Choosing the best statistic for a rating system
will depend on criteria other than variability analysis, including but not
limited to simplicity and practicality. Unfortunately, the
complexities of probability theory have captured the imagination of chess,
and the bell curve has become an icon of chess ratings. What follows
is a belated attempt to demonstrate a more reasonable theoretical basis.