9. Percentage Expectancy

Attempts to apply probability theory to the rating process must deal with two equally plausible definitions of probability.  The first definition invokes a sample space divided into n subsets of equally likely outcomes.  If an event is associated with r of these outcomes, then its probability is r / n.  This definition is at the heart of probability distributions, which do not fare well in the realm of chess ratings.  The Percentage Expectancy Curve, patterned after the normal or Gaussian distribution or the similar logistic distribution, is not really a probability distribution at all.  Instead, it arbitrarily assigns probability values to rating differences in imitation of these important distributions.  The point is made clear by consideration of an actual distribution of rating pairs in a competing field, ordered by their algebraic difference.  Each pairing is included twice:  first,  for the difference from the first player's standpoint; and second, for the difference from the second player's standpoint, making a symmetrical distribution.  Can percentage expectancies be deduced from the parameters of the distribution?  We could at best come up with probabilities for rating differences falling above or below specific values, which would tell us nothing about the relative performance associated with specific differences.

The second definition of probability is more promising in this context.  This definition invokes the long-term limit of the relative frequency of an event.  If an event occurs r times in n trials, then its probability is the limit of r / n as n goes to infinity.  The percentage scores encountered in rating systems may be regarded as estimates of this probability.  Rating systems are conservative in assuming, even in the face of evident changes in playing strength, that percentage scores tend toward a long-term limit.  For a given pair of ratings, percentage expectancy is the hypothetical result that produces no change in the ratings.  It is calculated in any rating system as the inverse of its basic formula, solving for percentage score.

Probabilities in a rating system are relative to pairs of contestants in the competing field.  Rating systems generally assign ratings on the basis of relative performance as an estimate of these probabilities, either as differences in percentage score or ratios of percentage score.  We may speculate that unease with the derivation of interval ratings from the normal curve led Elo to his logistic system, which takes a completely different approach.  He began with the premise that the odds of player x to score over player z are

          (Pxy / Pyx) (Pyz / Pzy)  =  Pxz / Pzx

where Pxy is the probability of x scoring over y, etc. [E1, 8.33].  This is based on the clearly false notion that probabilities are transitive.  It is a commonplace observation in chess and other games that results are not transitive.  If x defeats y, and y defeats z, it does not follow that x defeats z, even though the latter result may be in some sense expected.  A similar observation holds for probabilities.  Martin Gardner in one of his mathematical sketches [G, "Nontransitive Dice and Other Paradoxes"] reports on a set of four dice cleverly constructed to demonstrate this.  In a game using these dice, a player selects a die with the idea of maximizing his chances.  The second player is then able to select one of the remaining dice such that his odds of winning any roll-off against the first player are 2:1 in his favor.  This is because the probabilities involved are not transitive.  There is no "best" die among the four.
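
The dice argument can be checked by direct enumeration.  The sketch below is illustrative only:  the face values are an assumption, being the four-dice set usually attributed to Bradley Efron rather than anything specified in the text above, and the names DICE, win_probability and best_reply are hypothetical.  It counts the 36 equally likely face pairs for each match-up and finds, for whichever die the first player selects, the reply that beats it most often.

```python
from fractions import Fraction
from itertools import product

# Assumed face values (the set usually attributed to Bradley Efron);
# the text above does not list the faces of the dice it describes.
DICE = {
    "A": (0, 0, 4, 4, 4, 4),
    "B": (3, 3, 3, 3, 3, 3),
    "C": (2, 2, 2, 2, 6, 6),
    "D": (1, 1, 1, 5, 5, 5),
}


def win_probability(first, second):
    """Exact probability that die `first` rolls higher than die `second`."""
    wins = sum(1 for a, b in product(DICE[first], DICE[second]) if a > b)
    return Fraction(wins, len(DICE[first]) * len(DICE[second]))


def best_reply(chosen):
    """Die that gives the second player the best chance against `chosen`."""
    others = [name for name in DICE if name != chosen]
    return max(others, key=lambda name: win_probability(name, chosen))


if __name__ == "__main__":
    for chosen in DICE:
        reply = best_reply(chosen)
        print(chosen, "is beaten by", reply,
              "with probability", win_probability(reply, chosen))
```

With these assumed faces every die has a reply that wins with probability 2/3, that is, odds of 2:1, so the relation "beats" runs in a circle and no die is best.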

It could be argued that Elo was postulating the odds that would hold if probabilities were transitive.  The point, in any case, is largely moot in view of the fact that his logistic system can be derived from the basic ratio formula [5.1].  The inverse of this formula is percentage expectancy,

[9.1]        Pe  =  R / (R + ERc) .

In a logarithmic system

[9.2]       Pe  =   b^R / (b^R + b^Rc)

for its base b.  In Elo's logistic system, as we saw in Ratio Systems,

[9.3]       b  =  10^(1/400) .

Consequently,

[9.4]       Pe  =   10^(R/400) / (10^(R/400) + 10^(Rc/400)) .

Dividing top and bottom by  10^(R/400),

[9.5]       Pe  =   1 / (1 + 10^((Rc - R)/400)) .

Substituting the variables C = 200 and D = R - Rc,

[9.6]       Pe  =   1 / (1 + 10^(-D/(2C))) ,

which is the logistic formula for Elo's Percentage Expectancy Curve [E1, 8.43].
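
As a numerical check on the algebra, the following minimal sketch evaluates the ratio-of-powers form and the logistic form in D and confirms that they agree.  The function names are hypothetical, chosen only for illustration.  It also shows that the odds Pe / (1 - Pe) implied by the curve multiply along a chain of three players, which is the premise Elo began with.

```python
import math


def expectancy_from_ratings(r, rc):
    """Ratio form: 10^(R/400) / (10^(R/400) + 10^(Rc/400))."""
    own = 10 ** (r / 400)
    opp = 10 ** (rc / 400)
    return own / (own + opp)


def expectancy_from_difference(d, c=200):
    """Logistic form: 1 / (1 + 10^(-D/(2C))), with D = R - Rc and C = 200."""
    return 1 / (1 + 10 ** (-d / (2 * c)))


def odds(p):
    """Odds corresponding to a probability p, i.e. p / (1 - p)."""
    return p / (1 - p)


if __name__ == "__main__":
    # Illustrative ratings only; not taken from any real rating list.
    x, y, z = 1900, 1750, 1600
    print(expectancy_from_ratings(x, y), expectancy_from_difference(x - y))
    # Under the logistic curve the odds multiply: (x over y)(y over z) = (x over z).
    chained = odds(expectancy_from_difference(x - y)) * odds(expectancy_from_difference(y - z))
    print(math.isclose(chained, odds(expectancy_from_difference(x - z))))
```

Equal ratings give an expectancy of 0.5, and a 400-point advantage gives 10/11, or about 0.909.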

It was previously remarked that the advantages of a ratio rating system over an interval system are not pronounced in a statistical setting that does not involve physical measurements.  A ratio system nevertheless has its advantages, most notably in the fact that its rating scale is unlimited.  It is sometimes objected that the zero point on an interval scale is unrealistic because an upset in any pairing is possible.  In theory, if an upset is impossible, then the probability of the weaker player winning is zero, but the converse is not true.  By the frequency definition of probability, a probability of zero means a relative frequency that tends to zero as a limit, which does not exclude the possibility of an upset. 

Elo's speculation that prolonged use of an interval system "draws the players in the pool together, eventually into a 4C range, filling out [a rectangular pattern]" need not be taken seriously.  There is a tendency, as Elo himself noted, for averaging to counteract the effect by the Central Limit Theorem [E1, 8.57].  His speculation does suggest the interesting possibility of replacing ratings with quantiles in the basic linear formulas.  Quantiles, by their very nature, are uniformly distributed in a random population, which would allow undistorted averaging.  Manipulation of quantiles by linear formulas would act directly on the parameters of their distribution, although a continuous updating of their values would then be necessary.
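
One way the conversion from ratings to quantiles might look is sketched below.  This is an assumption-laden illustration, not a procedure given in the text:  the function name is hypothetical, and the midpoint convention (rank - 0.5) / n is only one of several common choices.  Whatever the shape of the rating distribution, the resulting values are evenly spaced between 0 and 1.

```python
def empirical_quantiles(ratings):
    """Map each rating to its midpoint quantile (rank - 0.5) / n within the pool.

    Tied ratings receive adjacent, distinct quantiles in input order;
    a fuller treatment might average them instead.
    """
    n = len(ratings)
    # Indices of the players, ordered from lowest to highest rating.
    order = sorted(range(n), key=lambda i: ratings[i])
    quantiles = [0.0] * n
    for rank, index in enumerate(order, start=1):
        quantiles[index] = (rank - 0.5) / n
    return quantiles


if __name__ == "__main__":
    # Illustrative pool only.
    pool = [1500, 1200, 1800, 2100]
    print(empirical_quantiles(pool))
```

Because each quantile depends on the whole pool, any change in one rating can shift the quantiles of the others, which is why continuous updating would be needed.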