16. Rating by Single Games

The nonlinear character of ratio systems poses problems for averaging, and it is not hard to find examples of inconsistency in the application of ratio formulas. We begin with the percentage expectancy of a player rated R against an average opposition rating ERc, which follows algebraically from [5.1] as

[16.1]        P(ERc)  =  R / (R + ERc)  .

The same formula applies to each individual opposition rating Rc:

[16.2]        P(Rc)  =  R / (R + Rc) .

Is it true for the N opponents of [16.1] that

[16.3]        N · P(ERc)  =  Σ P(Rc)  ?

Generally, no. Perhaps this is the fault of the method of averaging chosen for [16.1], but the discrepancy arises for any of the common means: arithmetic, geometric, or harmonic. It therefore seems preferable to avoid averaging altogether when applying ratio ratings, a strategy that practitioners of the Elo System have followed. A simple improvement is to calculate P(D) by [5.3] for each opposition rating separately, taking the sum of the results as We in the standard formula

[16.4]        ΔR   =   K(W - We) .
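The failure of [16.3] is easy to check numerically. The following minimal Python sketch compares N · P(mean) with the sum of the individual expectancies for each of the three common means, and then uses that sum as We in [16.4]. It computes each opponent's expectancy with [16.2] rather than [5.3], whose exact form is not reproduced here; the function names, the opponent ratings, W, and K are hypothetical values chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence


def expectancy(r: float, rc: float) -> float:
    """Percentage expectancy of a player rated r against rc, as in [16.2]."""
    return r / (r + rc)


def arithmetic_mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def geometric_mean(xs: Sequence[float]) -> float:
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs: Sequence[float]) -> float:
    return len(xs) / sum(1.0 / x for x in xs)


def averaged_expectancy(r: float, opponents: Sequence[float],
                        mean: Callable[[Sequence[float]], float]) -> float:
    """Left side of [16.3]: N * P(average opposition rating)."""
    return len(opponents) * expectancy(r, mean(opponents))


def summed_expectancy(r: float, opponents: Sequence[float]) -> float:
    """Right side of [16.3]: sum of per-opponent expectancies, used as We."""
    return sum(expectancy(r, rc) for rc in opponents)


def rating_change(r: float, opponents: Sequence[float], w: float, k: float) -> float:
    """[16.4]: delta R = K(W - We), with We summed game by game (no averaging)."""
    return k * (w - summed_expectancy(r, opponents))


if __name__ == "__main__":
    r = 100.0
    opponents: List[float] = [50.0, 100.0, 400.0]  # hypothetical ratio ratings
    print("sum of P(Rc):", round(summed_expectancy(r, opponents), 4))
    for name, mean in [("arithmetic", arithmetic_mean),
                       ("geometric", geometric_mean),
                       ("harmonic", harmonic_mean)]:
        print(name, round(averaged_expectancy(r, opponents, mean), 4))
    # Hypothetical event: score W = 2 out of 3 with K = 10.
    print("delta R:", round(rating_change(r, opponents, w=2.0, k=10.0), 4))
```

With opponents of unequal strength, none of the three averaged values matches the summed expectancy; they agree only when all opponents share the same rating.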

A more general solution for ratio ratings of single-game events begins with the description of cumulative averaging given in Methods of Calculation. Adapting [11.4] to a single result gives

[16.5]         ΔPe  =  (S - Pe) / (No + 1) ,

which describes the incorporation of a score S into the running average. The original sample size No should not be confused with Elo's constant. In this case the original percentage expectancy Pe refers to the pregame percentage score, but it could with equal plausibility refer to the percentage expectancy against a new opponent.
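As a minimal sketch, the update [16.5] can be written as a single function; folding a sequence of scores in one at a time, with No incremented after each, reproduces the ordinary arithmetic mean of the scores. The function names and the sample scores are illustrative assumptions.

```python
from typing import Iterable, Tuple


def fold_score(pe: float, s: float, n0: int) -> Tuple[float, int]:
    """[16.5]: incorporate one score s into a running average pe of n0 results.

    delta Pe = (S - Pe) / (No + 1); returns the updated average and sample size.
    """
    delta_pe = (s - pe) / (n0 + 1)
    return pe + delta_pe, n0 + 1


def running_average(scores: Iterable[float]) -> float:
    """Apply [16.5] recursively, incrementing No after each game."""
    pe, n0 = 0.0, 0  # empty sample: the first score simply becomes the average
    for s in scores:
        pe, n0 = fold_score(pe, s, n0)
    return pe


if __name__ == "__main__":
    scores = [1.0, 0.0, 0.5, 1.0, 1.0]  # hypothetical single-game results
    print(running_average(scores), sum(scores) / len(scores))
```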

Another expression for the change in percentage expectancy begins with the percentage expectancy of the original rating Ro against a new opponent rated Rc:

[16.6]             Pe   =   Ro /  (Ro + Rc) ,

which, for any change ΔR, can be rewritten by multiplying numerator and denominator by (Ro + Rc + ΔR) / (Ro + Rc) as

                   Pe   =   (Ro + ΔR·Pe) / (Ro + Rc + ΔR) .

The new percentage expectancy resulting from a change in R would be

[16.7]             Pne   =    (Ro + ΔR) / (Ro + Rc + ΔR) .

Subtracting [16.6], in its rewritten form, from [16.7] gives

[16.8]            ΔPe   =   [(1 - Pe)ΔR] / (Ro + Rc + ΔR) .
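The algebra of [16.6] through [16.8] can be spot-checked numerically. This sketch, with hypothetical function names, confirms for a few arbitrary ratings and changes that the rewritten form of [16.6] still equals Ro / (Ro + Rc), and that [16.7] minus [16.6] matches [16.8].

```python
def pe_original(ro: float, rc: float) -> float:
    """[16.6]: Pe = Ro / (Ro + Rc)."""
    return ro / (ro + rc)


def pe_rewritten(ro: float, rc: float, dr: float) -> float:
    """Rewritten form of [16.6]: (Ro + dR * Pe) / (Ro + Rc + dR)."""
    pe = pe_original(ro, rc)
    return (ro + dr * pe) / (ro + rc + dr)


def pe_new(ro: float, rc: float, dr: float) -> float:
    """[16.7]: Pne = (Ro + dR) / (Ro + Rc + dR)."""
    return (ro + dr) / (ro + rc + dr)


def delta_pe(ro: float, rc: float, dr: float) -> float:
    """[16.8]: delta Pe = (1 - Pe) * dR / (Ro + Rc + dR)."""
    pe = pe_original(ro, rc)
    return (1 - pe) * dr / (ro + rc + dr)


if __name__ == "__main__":
    # Arbitrary positive ratings; dR may be negative as long as Ro + dR > 0.
    for ro, rc, dr in [(120.0, 80.0, 15.0), (50.0, 400.0, -20.0), (300.0, 30.0, 90.0)]:
        assert abs(pe_rewritten(ro, rc, dr) - pe_original(ro, rc)) < 1e-12
        assert abs((pe_new(ro, rc, dr) - pe_original(ro, rc)) - delta_pe(ro, rc, dr)) < 1e-12
    print("[16.6]-[16.8] consistent on sample values")
```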

We now have two formulas for the change in percentage expectancy, [16.5] and [16.8]. Setting their right-hand sides equal gives

[16.9]          (S - Pe) / (No + 1)  =  [(1 - Pe)ΔR] / (Ro + Rc + ΔR)  .

Substituting Pe = Ro / (Ro + Rc) and solving for the change in R gives

[16.10]       ΔR  =  [S(Ro + Rc) - Ro] · (Ro + Rc) / [(1 - S)(Ro + Rc) + No·Rc] .
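A direct transcription of [16.10], together with a check that the resulting ΔR satisfies [16.9], might look like the following sketch. It takes Pe from [16.6]; the function names and the sample values (Ro = 120, Rc = 80, No = 20) are arbitrary assumptions.

```python
def rating_change_16_10(ro: float, rc: float, s: float, n0: float) -> float:
    """[16.10]: delta R = [S(Ro+Rc) - Ro](Ro+Rc) / [(1-S)(Ro+Rc) + No*Rc]."""
    t = ro + rc
    return (s * t - ro) * t / ((1 - s) * t + n0 * rc)


def sides_of_16_9(ro: float, rc: float, s: float, n0: float):
    """Evaluate both sides of [16.9] at the delta R given by [16.10]."""
    dr = rating_change_16_10(ro, rc, s, n0)
    pe = ro / (ro + rc)                    # [16.6]
    lhs = (s - pe) / (n0 + 1)              # right side of [16.5]
    rhs = (1 - pe) * dr / (ro + rc + dr)   # right side of [16.8]
    return lhs, rhs


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        lhs, rhs = sides_of_16_9(120.0, 80.0, s, 20)
        print(s, rating_change_16_10(120.0, 80.0, s, 20), lhs, rhs)
```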

If No in [16.5] is held constant instead of being incremented recursively, so that the divisor in [16.5] is No rather than No + 1, the same sequence of calculations yields

[16.11]       ΔR  =  [S(Ro + Rc) - Ro] · (Ro + Rc) / [(1 - S)(Ro + Rc) + (No - 1)Rc] .
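[16.11] can be transcribed in the same way. In the sketch below, a hypothetical event-level helper sums the single-game changes, each computed against the pre-event rating Ro, and applies the total once. That reading of the event procedure, the helper names, and the requirement No > 1 (which keeps the denominator positive when S = 1) are assumptions of the sketch.

```python
from typing import Iterable, Tuple


def rating_change_16_11(ro: float, rc: float, s: float, n0: float) -> float:
    """[16.11]: single-game change with No held constant (assumes No > 1)."""
    t = ro + rc
    return (s * t - ro) * t / ((1 - s) * t + (n0 - 1) * rc)


def rate_event(ro: float, games: Iterable[Tuple[float, float]], n0: float) -> float:
    """Hypothetical event update: sum the [16.11] changes for (Rc, S) pairs,
    each computed against the pre-event rating ro, then apply the total once."""
    return ro + sum(rating_change_16_11(ro, rc, s, n0) for rc, s in games)


if __name__ == "__main__":
    # One win and one loss against opponents rated equal to the player, No = 21.
    print(rate_event(100.0, [(100.0, 1.0), (100.0, 0.0)], 21))
```

Solving [16.9] with the divisor No in place of No + 1 gives back exactly this ΔR, which is a convenient way to test any transcription of [16.11].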

This is the ratio analogue of Elo's established formula applied to single games. For a multiple-game event, the changes would be summed and applied to Ro, the pre-event rating. We can get some idea of the practical value of this formula from a simulation (Downloads) using random pairings among 800 players. The predefined outcome S of each pairing is compared with the expected score Pe given by the ratings calculated under each of the simulated systems. Games are played in sequences of 1600, and after each sequence the root mean square of S - Pe is computed over the 1600 results, as shown in Table 10 and its corresponding chart. The predefined outcomes in the sequences of Table 10 are always a win for the higher-ranked player, but the user may substitute probability functions by uncommenting the appropriate lines in the playing field constructor. The constants for the number of players, the number of games played, and so on may also be changed.
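The simulation itself is available only as a download, so the following is not that program but a small, hypothetical re-creation of its outline: random pairings, a predefined outcome in which the higher-ranked player always wins, sequential single-game updates by [16.11], and the root mean square of S - Pe over each sequence. The player count, sequence length, No = 20, the common starting rating of 100, and the use of pregame ratings for Pe are all assumptions; the sizes are scaled down from 800 players and 1600 games, and no figures from Table 10 are reproduced.

```python
import math
import random
from typing import List


def rating_change(ro: float, rc: float, s: float, n0: float) -> float:
    """[16.11]: single-game change with No held constant (assumes No > 1)."""
    t = ro + rc
    return (s * t - ro) * t / ((1 - s) * t + (n0 - 1) * rc)


def simulate(n_players: int = 100, games_per_sequence: int = 200,
             sequences: int = 10, n0: float = 20, seed: int = 1) -> List[float]:
    """Return the RMS of S - Pe for each sequence of games.

    Hypothetical setup: player index i is also its hidden strength rank, the
    higher-ranked player always wins, and everyone starts at rating 100.
    """
    rng = random.Random(seed)
    ratings = [100.0] * n_players
    history = []
    for _ in range(sequences):
        sum_sq = 0.0
        for _ in range(games_per_sequence):
            a, b = rng.sample(range(n_players), 2)
            s = 1.0 if a > b else 0.0                  # predefined outcome for player a
            pe = ratings[a] / (ratings[a] + ratings[b])  # pregame expectancy of a
            sum_sq += (s - pe) ** 2
            # Both changes are computed from the pregame ratings before either is applied.
            da = rating_change(ratings[a], ratings[b], s, n0)
            db = rating_change(ratings[b], ratings[a], 1.0 - s, n0)
            ratings[a] += da
            ratings[b] += db
        history.append(math.sqrt(sum_sq / games_per_sequence))
    return history


if __name__ == "__main__":
    for i, rms in enumerate(simulate(), start=1):
        print(i, round(rms, 4))
```

Raising n_players and games_per_sequence to 800 and 1600 brings the sketch to the scale described above, though its results need not match Table 10, which came from the author's own program.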

As was forcefully argued in A Test, one must be cautious about the conclusions drawn from such a simulation. Precise simultaneous calculations, it will be recalled, can be made for the Berkin System and for linear systems. For these systems at least, the different rates of approximation observed in Table 10 seem to be an artifact of sequential calculation. Such differences can be overcome by a feedback process such as the one proposed by Elo [E1, 3.75]. Elo intended his feedback for rating what he called "exceptional performers," whose results arise from changes in their underlying playing strength, but the process works equally well in the present context. The simulation provided as a download has a feedback loop in the subroutine RateBerkin, and the feedback results of Table 10 can be obtained by uncommenting this loop. Exceptional performances are defined here as those producing an absolute difference greater than 0.4 between expected and actual scores. As with the other constants in the simulation, this threshold may be changed by the user for experimental purposes.