10. Percentage Expectancy
Attempts to apply probability theory to the rating process must deal with two equally plausible definitions of probability.

The first definition invokes a sample space divided into n subsets of equally likely outcomes. If an event is associated with r of these outcomes, then its probability is r / n. This definition is at the heart of probability distributions, such as the normal or logistic. The Percentage Expectancy Curve, which is patterned after these distributions, is not really a probability distribution in the usual sense. Instead, it arbitrarily assigns probability values to rating differences in imitation of these important distributions. In an actual distribution of rating differences, the differences above or below a certain percentile tell us nothing about percentage expectancies.

The second definition of probability invokes the long-term limit of the relative frequency of an event. If an event occurs r times in n trials, then its probability is the limit of r / n as n goes to infinity. The percentage scores encountered in rating systems represent estimates of this probability relative to rating differences or ratios. Rating systems are conservative in assuming, even in the face of evident changes in playing strength, that percentage scores tend toward a long-term limit.

For a given pair of ratings, percentage expectancy is the hypothetical result that produces no change in the ratings. It is calculated in any rating system as the inverse of its basic formula, solving for percentage score.

The development of Elo's ratio system is very different from that of his interval system, but the emphasis in each case is on a probability distribution. We have seen that there are problems with deriving percentage expectancies from the normal curve. It is possible that unease with the derivation led Elo to his ratio system, both to set out in a new direction and to confirm his earlier work.
In any case, he went on to develop a ratio version of his rating system which was eventually implemented by the USCF and FIDE. The development of the system begins with a disarmingly simple premise. Given the odds of Player x defeating Player y and the odds of Player y defeating Player z, the odds of Player x defeating Player z are given by

[10.1] $\dfrac{P_{xy}}{P_{yx}} \cdot \dfrac{P_{yz}}{P_{zy}} = \dfrac{P_{xz}}{P_{zx}}$

where $P_{xy}$ is the probability of x defeating y, etc. [E1, 8.33].

This neat little result should immediately raise red flags. Although the result is given in the form of odds, it implies that if we know the probabilities of x defeating y and of y defeating z, then the probability of x defeating z follows as a logical consequence. We know from experience that even if x defeats y and y defeats z, it is by no means certain that x will defeat z. Does expressing this in terms of odds make the conclusion any more certain? Formula [10.1] posits a transitive relationship among the probabilities.

An amusing counterexample was provided by Martin Gardner (1914-2010) in one of his mathematical sketches [G, "Nontransitive Dice and Other Paradoxes"]. He reported on a set of four dice cleverly constructed to demonstrate nontransitive probabilities. In a game that makes use of these dice, the first player selects a die with the idea of maximizing his chances in a roll-off against the second player. The second player is always able to select one of the remaining dice such that the odds of winning the roll-off are 2:1 in his favor. It would seem that there must be one die among the four, call it x, such that against any of the three remaining dice, call it y, the odds of the first player winning with x against y are at least even, but this is not the case, because the probabilities involved are not transitive. Before accepting [10.1] at face value, we are inclined to doubt the propriety of multiplying odds in this fashion.
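The nontransitivity described above can be checked by direct enumeration. The following is a minimal sketch, not taken from the source: the text does not list the faces of the dice, so the face values below are an assumption (the set usually attributed to Bradley Efron in accounts of Gardner's column), and the names `DICE`, `win_probability`, and `odds` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed face values (not given in the text): the four dice commonly
# attributed to Bradley Efron. No face value appears on two different dice,
# so a roll-off between two different dice can never end in a tie.
DICE = {
    "A": (4, 4, 4, 4, 0, 0),
    "B": (3, 3, 3, 3, 3, 3),
    "C": (6, 6, 2, 2, 2, 2),
    "D": (5, 5, 5, 1, 1, 1),
}


def win_probability(x, y):
    """Exact probability that die x rolls higher than die y."""
    wins = sum(1 for a, b in product(DICE[x], DICE[y]) if a > b)
    return Fraction(wins, len(DICE[x]) * len(DICE[y]))


def odds(x, y):
    """Odds of x beating y; valid here because ties are impossible."""
    p = win_probability(x, y)
    return p / (1 - p)


if __name__ == "__main__":
    # Each die in the cycle A > B > C > D > A beats the next one.
    for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
        print(f"P({x} beats {y}) = {win_probability(x, y)}")

    # Compare the product of odds suggested by [10.1] with the actual odds.
    print("odds(A,B) * odds(B,C) =", odds("A", "B") * odds("B", "C"))
    print("odds(A,C) actual      =", odds("A", "C"))
```

With these assumed faces, each die beats the next in the cycle with probability 2/3, so whichever die the first player picks, the second player has a reply at odds of 2:1. Multiplying the odds of A over B and of B over C as in [10.1] would suggest odds of 4:1 for A over C, whereas enumeration gives 4:5.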
Formula [10.1] does not figure directly in the development of the logistic function, the basis of Elo's ratio system. It does affect our interpretation of the logistic function as a probability distribution based on what Elo called a logarithmic interval scale. The development of the logistic function begins with his formula (39), which happens to be equivalent to our formula [5.1]. It is not surprising, then, that we can derive the logistic function from the latter. Solving [5.1] for P gives a formula for percentage expectancy as

[10.2] $P_e = \dfrac{R}{R + \mathrm{E}R_c}$ .

Dropping the mean indication, in a logarithmic system

[10.3] $P_e = \dfrac{b^{R}}{b^{R} + b^{R_c}}$

for its base b. In Elo's logistic system, as we saw in Ratio Systems,

[10.4] $b = 10^{1/400}$ .

Consequently,

[10.5] $P_e = \dfrac{10^{R/400}}{10^{R/400} + 10^{R_c/400}}$ .

Dividing numerator and denominator by $10^{R/400}$,

[10.6] $P_e = \dfrac{1}{1 + 10^{(R_c - R)/400}}$ .

Substituting the variables $C = 200$ and $D = R - R_c$,

[10.7] $P_e = \dfrac{1}{1 + 10^{-D/(2C)}}$ ,

which is the logistic formula for Elo's Percentage Expectancy Curve [E1, 8.43]. The significance of this derivation is that it does not postulate the logistic as a probability function.

A ratio system has the advantage of an unlimited rating scale, although the extremes of this scale are of limited interest. It is sometimes objected that the zero point on an interval scale is unrealistic because an upset in any pairing is always possible. In theory, if an upset is impossible, then the probability of the weaker player winning is zero, but the converse is not true. By the frequency definition of probability, a probability of zero means a relative frequency that tends to zero as a limit, which does not exclude the possibility of an upset.

Elo's speculation that prolonged use of an interval system "draws the players in the pool together, eventually into a 4C range, filling out [a rectangular pattern]" is based on his view of rating formulas as probability distributions.
There is a tendency, as Elo himself noted, for averaging to counteract the effect by the Central Limit Theorem [E1, 8.57].
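Returning to the derivation of [10.7], the algebra can be checked numerically. The sketch below is illustrative only: the function names `expectancy_ratio`, `expectancy_logistic`, and `odds` are hypothetical, and the ratings 1800, 1600, and 1500 are arbitrary sample values, not data from the source. It evaluates the ratio form [10.5] and the logistic form [10.7] and confirms that they agree.

```python
import math

# Base of Elo's logistic system, formula [10.4]: b = 10^(1/400).
BASE = 10 ** (1 / 400)


def expectancy_ratio(r, rc, base=BASE):
    """Ratio form [10.3]/[10.5]: b^R / (b^R + b^Rc)."""
    return base ** r / (base ** r + base ** rc)


def expectancy_logistic(d, c=200):
    """Logistic form [10.7]: 1 / (1 + 10^(-D / 2C)), with D = R - Rc."""
    return 1 / (1 + 10 ** (-d / (2 * c)))


def odds(p):
    """Convert a percentage expectancy into odds, P / (1 - P)."""
    return p / (1 - p)


if __name__ == "__main__":
    # Arbitrary sample ratings chosen for illustration.
    rx, ry, rz = 1800, 1600, 1500

    # The two forms of the expectancy should agree up to rounding error.
    print(expectancy_ratio(rx, ry), expectancy_logistic(rx - ry))

    # Under [10.7] the odds reduce to 10^(D/400), so they multiply along a
    # chain of players in the manner of [10.1].
    chained = odds(expectancy_logistic(rx - ry)) * odds(expectancy_logistic(ry - rz))
    direct = odds(expectancy_logistic(rx - rz))
    print(math.isclose(chained, direct))
```

A rating difference of 200 points gives an expectancy of about 0.76 under either form. The odds chain multiplicatively here only because the formula is built that way; the check says nothing about whether actual results between players are transitive.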