16. Rating by Single Games

The nonlinear character of ratio systems poses problems for averaging.  It is not hard to construct examples of inconsistency in the application of ratio formulas.  We begin with an expression for the percentage expectancy of a player rated R against an average opposition rating ERc, which follows algebraically from [5.1] as

[16.1]        $P(ER_c) = \frac{R}{R + ER_c}$ .

The formula also applies to each individual opponent; for an opponent rated Rc it reads

[16.2]        $P(R_c) = \frac{R}{R + R_c}$ .

Is it true for the N opponents of [16.1] that

[16.3]        $N \cdot P(ER_c) = \sum P(R_c)$ ?

Generally, no.  Perhaps this is the fault of the method of averaging chosen for [16.1], but the inaccuracy arises for any of the common means: arithmetic, geometric, or harmonic.  For the arithmetic mean the error is one-sided: because R/(R + x) is convex in x, Jensen's inequality makes the sum in [16.3] at least N·P(ERc), with equality only when all opponents are rated alike.  It would seem preferable to avoid averaging altogether in the application of ratio ratings, a strategy that has been followed by practitioners of the Elo System.  A simple improvement is to calculate P(D) by [5.3] for each of the opposition ratings, taking the sum of the results as We in the standard formula

[16.4]        $\Delta R = K(W - W_e)$ .
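The discrepancy behind [16.3] is easy to see numerically.  The sketch below is a minimal illustration, not taken from the original: it uses the ratio expectancy [16.2] for each opponent as a stand-in for P(D) from [5.3], which is not reproduced here, and the ratings, the score W and the constant K are hypothetical values.  The function names are introduced here for illustration only.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def expectancy(r: float, rc: float) -> float:
    """Ratio expectancy of a player rated r against one opponent rated rc ([16.2])."""
    return r / (r + rc)


def averaged_totals(r: float, opponents: list[float]) -> dict[str, float]:
    """N * P(average opponent) for each common mean: the left side of [16.3]."""
    n = len(opponents)
    means = {
        "arithmetic": fmean(opponents),
        "geometric": geometric_mean(opponents),
        "harmonic": harmonic_mean(opponents),
    }
    return {name: n * expectancy(r, m) for name, m in means.items()}


def summed_expectancy(r: float, opponents: list[float]) -> float:
    """We as the sum of per-opponent expectancies: the right side of [16.3]."""
    return sum(expectancy(r, rc) for rc in opponents)


def rating_change(w: float, we: float, k: float) -> float:
    """The standard formula [16.4]: K(W - We)."""
    return k * (w - we)


R = 1000.0
OPPONENTS = [500.0, 1000.0, 4000.0]  # hypothetical ratio ratings
K = 16.0                             # hypothetical constant, for illustration only

we = summed_expectancy(R, OPPONENTS)
for name, total in averaged_totals(R, OPPONENTS).items():
    print(f"{name:>10} mean: N*P = {total:.4f}   sum of P = {we:.4f}")
print("rating change for W = 2:", rating_change(2.0, we, K))
```

With these numbers the arithmetic and geometric means understate the summed expectancy and the harmonic mean overstates it.  A particular field can make one mean agree by coincidence: opponents rated 250, 1000 and 4000 against R = 1000 are symmetric in ratio about R, and the geometric mean then reproduces the sum exactly.  That is why the answer to [16.3] is "generally" rather than "always" no.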

A more general solution for ratio ratings of single-game events begins with the description of cumulative averaging given in Methods of Calculation.  Adapting [11.4] to a single result gives

[16.5]        $\Delta P_e = \frac{S - P_e}{N + 1}$ ,

which folds a score S into the running average.  The subscript on N has been dropped to avoid confusion with Elo's constant.  Here Pe, the original percentage expectancy, refers to the pregame average, but it could equally plausibly refer to the percentage expectancy against the new opponent.
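Read as a recursion, [16.5] reproduces an ordinary average one result at a time.  The sketch below is illustrative, with hypothetical scores; it assumes N counts the results already in the average before S is added, and the function names are introduced here for illustration.

```python
def update_average(pe: float, s: float, n: int) -> float:
    """Fold score s into a running average pe of n earlier results ([16.5])."""
    return pe + (s - pe) / (n + 1)


def running_average(scores: list[float]) -> float:
    """Apply [16.5] one result at a time, starting from an empty average."""
    pe = 0.0
    for n, s in enumerate(scores):  # n = number of results already averaged
        pe = update_average(pe, s, n)
    return pe


SCORES = [1.0, 0.5, 0.0, 1.0, 1.0]  # hypothetical single-game scores
print(running_average(SCORES), sum(SCORES) / len(SCORES))
```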

Another expression for the change in percentage expectancy begins with the expectancy of the original rating R against a new opponent rated Rc,

[16.6]        $P_e = \frac{R}{R + R_c}$ ,

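which can be rearranged as Pe(R + Rc) = R.  Adding Pe·ΔR to both sides and dividing by (R + Rc + ΔR) yields the equivalent expression

              $P_e = \frac{R + P_e\,\Delta R}{R + R_c + \Delta R}$ .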

The new percentage expectancy, Pne, after the rating changes from R to R + ΔR is

[16.7]        $P_{ne} = \frac{R + \Delta R}{R + R_c + \Delta R}$ .

Subtracting the equivalent form of [16.6] from [16.7], whose denominators now agree, gives the new expression

[16.8]        $\Delta P_e = \frac{(1 - P_e)\,\Delta R}{R + R_c + \Delta R}$ .
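The chain from [16.6] to [16.8] can be spot-checked numerically.  The sketch below assumes nothing beyond the formulas themselves; the (R, Rc, ΔR) triples are hypothetical, one of them a rating decrease, and the function names are introduced here for illustration.

```python
import math


def pe(r: float, rc: float) -> float:
    """Expectancy of rating r against a new opponent rated rc ([16.6])."""
    return r / (r + rc)


def pe_rewritten(r: float, rc: float, dr: float) -> float:
    """The equivalent form of [16.6]: (R + Pe*dR) / (R + Rc + dR)."""
    return (r + pe(r, rc) * dr) / (r + rc + dr)


def pne(r: float, rc: float, dr: float) -> float:
    """New expectancy after the rating changes by dr ([16.7])."""
    return (r + dr) / (r + rc + dr)


def delta_pe(r: float, rc: float, dr: float) -> float:
    """Change in expectancy from [16.8]."""
    return (1 - pe(r, rc)) * dr / (r + rc + dr)


# Hypothetical (R, Rc, dR) triples, including a rating decrease.
CASES = [(1000.0, 1500.0, 120.0), (800.0, 200.0, -50.0), (1.0, 3.0, 0.25)]
for r, rc, dr in CASES:
    assert math.isclose(pe_rewritten(r, rc, dr), pe(r, rc))
    assert math.isclose(pne(r, rc, dr) - pe(r, rc), delta_pe(r, rc, dr))
print("identities hold for", len(CASES), "cases")
```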

We now have two formulas for change in percentage expectancy, [16.5] and [16.8].  Setting the two right sides equal,

[16.9]        $\frac{S - P_e}{N + 1} = \frac{(1 - P_e)\,\Delta R}{R + R_c + \Delta R}$ .

Solving [16.9] for ΔR, with Pe given by [16.6], gives

[16.10]       $\Delta R = \frac{[S(R + R_c) - R]\,(R + R_c)}{(1 - S)(R + R_c) + N R_c}$ .
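For readers who want the intermediate algebra, [16.10] follows from [16.9] in a few steps.  Writing T = R + Rc (an abbreviation introduced only for this derivation), [16.6] gives Pe = R/T and 1 − Pe = Rc/T:

```latex
\begin{aligned}
\frac{S - R/T}{N + 1} &= \frac{(R_c/T)\,\Delta R}{T + \Delta R} \\
(ST - R)(T + \Delta R) &= (N + 1)\,R_c\,\Delta R \\
(ST - R)\,T &= \Delta R\,\bigl[(N + 1)R_c - (ST - R)\bigr] \\
\Delta R &= \frac{(ST - R)\,T}{(1 - S)\,T + N R_c}
\end{aligned}
```

The last step uses (N + 1)Rc − ST + R = N·Rc + (1 − S)T, since R + Rc = T.  Substituting T = R + Rc back gives [16.10].  Note that the denominator vanishes when S = 1 and N = 0: no finite rating has an expectancy of exactly 1.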

If N in [16.5] is held constant instead of being incremented recursively, so that the divisor N + 1 becomes a fixed No, the solution becomes

[16.11]       $\Delta R = \frac{[S(R + R_c) - R]\,(R + R_c)}{(1 - S)(R + R_c) + (N_o - 1) R_c}$ .
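Both closed forms can be checked against the relations they were solved from.  The sketch below is a hypothetical implementation, not the program from Downloads: `ratio_change` evaluates [16.10], `ratio_change_fixed` evaluates [16.11], and the ratings, scores and No = 20 are illustrative values.  It confirms that [16.10] satisfies [16.9], and that under [16.11] the expectancy against the same opponent moves exactly 1/No of the way from Pe toward S, which is [16.5] with N + 1 fixed at No.

```python
import math


def ratio_change(r: float, rc: float, s: float, n: float) -> float:
    """Rating change from [16.10]: score s against rc, after n earlier results.

    The denominator vanishes when s == 1 and n == 0 (no finite rating has an
    expectancy of exactly 1), so that case is rejected.
    """
    t = r + rc
    denom = (1 - s) * t + n * rc
    if denom <= 0:
        raise ValueError("need n > 0 or s < 1")
    return (s * t - r) * t / denom


def ratio_change_fixed(r: float, rc: float, s: float, n0: float) -> float:
    """[16.11]: the same update with the divisor N + 1 of [16.5] fixed at n0."""
    if n0 <= 1:
        raise ValueError("n0 must exceed 1")
    t = r + rc
    return (s * t - r) * t / ((1 - s) * t + (n0 - 1) * rc)


def check(r: float, rc: float, s: float, n: float) -> tuple[float, float]:
    """Return both sides of [16.9] for the change given by [16.10]."""
    dr = ratio_change(r, rc, s, n)
    pe = r / (r + rc)
    return (s - pe) / (n + 1), (1 - pe) * dr / (r + rc + dr)


# Hypothetical game: R = 1000 beats Rc = 1500 after 9 earlier results.
lhs, rhs = check(1000.0, 1500.0, 1.0, 9)
print(lhs, rhs)

# Hypothetical series of single games with the constant No = 20.
r = 1000.0
for rc, s in [(1500.0, 1.0), (900.0, 0.5), (1200.0, 0.0)]:
    pe_old = r / (r + rc)
    r += ratio_change_fixed(r, rc, s, 20)
    # New expectancy against the same opponent moves 1/No of the way to S.
    assert math.isclose(r / (r + rc), pe_old + (s - pe_old) / 20)
print("rating after series:", r)
```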

This is the ratio analogue of Elo's established formula as applied to single games.  Some idea of its practical value can be gained from a simulation (Downloads).  The simulation uses random pairings among 800 players.  The predefined outcome of each pairing, S, is compared with the expected score, Pe, computed from the ratings produced by each of the simulated systems.  Games are played in sequences of 1600; after each sequence, the root mean square of S − Pe over its 1600 results is computed, as shown in Table 10.  Unless probability functions are substituted, the predefined outcome is always a win for the higher-ranked player; the curious user may substitute them by uncommenting the appropriate lines in the playing field constructor.  The constants for the number of players, number of games played, and so on may also be changed.
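A heavily simplified version of such a simulation can be sketched for the ratio system alone.  It is not the Downloads program and does not reproduce Table 10: it tracks only the update [16.11], and the constant No = 20, the common starting rating of 1, the number of sequences and the random seed are assumptions chosen for illustration.  Player indices stand for the predefined ranking, so the lower index always wins.

```python
import math
import random

N_PLAYERS = 800       # as in the text
GAMES_PER_SEQ = 1600  # as in the text
N_SEQUENCES = 10      # assumption
N0 = 20               # assumption: constant No in [16.11]


def ratio_change_fixed(r: float, rc: float, s: float, n0: float) -> float:
    """Single-game change in a ratio rating, per [16.11]."""
    t = r + rc
    return (s * t - r) * t / ((1 - s) * t + (n0 - 1) * rc)


def simulate(seed: int = 1) -> list[float]:
    """Return the RMS of S - Pe for each sequence of games."""
    rng = random.Random(seed)
    ratings = [1.0] * N_PLAYERS  # assumption: everyone starts equal
    rms_by_sequence = []
    for _ in range(N_SEQUENCES):
        total_sq = 0.0
        for _ in range(GAMES_PER_SEQ):
            a, b = rng.sample(range(N_PLAYERS), 2)  # random pairing
            s = 1.0 if a < b else 0.0  # predefined: higher-ranked (lower index) wins
            ra, rb = ratings[a], ratings[b]
            pe = ra / (ra + rb)  # expected score before the game
            total_sq += (s - pe) ** 2
            # Update both players from their pregame ratings.
            ratings[a] = ra + ratio_change_fixed(ra, rb, s, N0)
            ratings[b] = rb + ratio_change_fixed(rb, ra, 1.0 - s, N0)
        rms_by_sequence.append(math.sqrt(total_sq / GAMES_PER_SEQ))
    return rms_by_sequence


for i, rms in enumerate(simulate(), start=1):
    print(f"sequence {i}: RMS(S - Pe) = {rms:.4f}")
```

Replacing the `a < b` rule with a random draw is the rough analogue of the probability functions mentioned above.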

One must be cautious about the conclusions drawn from such a simulation, as was forcefully argued in A Test.  It is safe to say that the simulated systems behave in a similar fashion.  Indeed, graphed results are often indistinguishable when probability functions are used.  The significance of comparisons between the simulated systems is less certain.  For one thing, minor changes in the underlying data can alter outcomes dramatically.  More fundamental questions are raised by the practice of comparing statistical results.