10. Consistency

Ratings are typically calculated event by event in sequential fashion, but there are distinct advantages to calculating them simultaneously over a defined period of time for a defined competing field.  The simultaneous approach, while considerably more onerous, generates ratings that are mathematically consistent.  The implicit assumption of sequential ratings is that the active rating pool will eventually reach a similar state of consistency, but this is hardly more than wishful thinking.  Elo's terminology with regard to this distinction can be confusing.  He essentially divided sequential ratings into two types: continuous ratings, which are calculated event by event, and periodic ratings, which are calculated for calendar periods [E1, 1.5-6].  For the first International Rating List (see Sequential Ratings) he employed simultaneous ratings in the form of iterative calculations on a computer.  Calculations for this list were "continued until successive values of the differences showed little or no significant change," eight iterations in all [E2, Part 3].  This produced a more or less self-consistent set of ratings.

Consistency for a small data set can be studied with matrix manipulation.  Let us test the systems under discussion using the following hypothetical tournament:

Table 1.

Single Round Robin for Four Players

Players   A     B     C     D     wins   pct.
A         x     1     1     0     2      .667
B         0     x     1     1     2      .667
C         0     0     x     1/2   .5     .167
D         1     0     1/2   x     1.5    .500

 

As seen from its matrix representation, the system of linear formulas for this tournament has an infinite number of solutions, with the rating of player D as a free variable.  A unique solution is reached by assigning D a rating and calculating the other ratings accordingly.  For example, with D rated .5, the solution is (.75, .75, 0, .5).

Systems of ratio formulas, including the Berkin formula [14.2], are homogeneous, allowing only the zero solution (0, 0, 0, 0).  For basic ratio ratings [5.1] there is a strategy for finding nonzero solutions where there is an undefeated player, whose rating is ordinarily undefined because of division by zero.  Consider, for example, the addition of Player E to the above round robin with a single win against Player A (Table 4).  A matrix for the corresponding ratio formulas, with Player E assigned an arbitrary rating of 1, is given in Table 5.  Mathematical software gives the solution (.6571, .9143, .1429, .5714) for Players A through D.  There does not seem to be a corresponding strategy for Berkin ratings, but the overdetermined matrix in Table 6 gives the solution (.3684, .3158, .0526, .2632), with the notable advantage that the solution set can be assigned an arbitrary mean (here .25).

Finally, there is the matrix representation of the system of Elo formulas, calculated by [5.5].  Mathematical software indicates no solution, and manipulation does not proceed beyond the echelon form of Table 8.
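The free-variable behavior of the linear system can be checked numerically.  The sketch below assumes that the linear formula takes the form R_i = (mean rating of opponents) + 2 p_i - 1, where p_i is the percentage score from Table 1.  That form is an assumption made here because it reproduces the solution (.75, .75, 0, .5) quoted for D rated .5; it is not necessarily the exact formula defined elsewhere in this article, and the names RESULTS and residuals are illustrative only.  Exact fractions are used so that a zero residual means an exact solution.

```python
from fractions import Fraction as F

# Scores from Table 1: RESULTS[i][j] is the score of player i against player j.
# Player order is A, B, C, D; None marks the diagonal.
RESULTS = [
    [None, F(1), F(1), F(0)],
    [F(0), None, F(1), F(1)],
    [F(0), F(0), None, F(1, 2)],
    [F(1), F(0), F(1, 2), None],
]


def residuals(ratings):
    """Return R_i - (mean opponent rating + 2*p_i - 1) for each player.

    The linear form used here is an assumption (see the lead-in).
    A list of zeros means the ratings satisfy every formula exactly.
    """
    n = len(ratings)
    out = []
    for i in range(n):
        opponents = [j for j in range(n) if j != i]
        pct = sum(RESULTS[i][j] for j in opponents) / len(opponents)
        mean_opp = sum(ratings[j] for j in opponents) / len(opponents)
        out.append(ratings[i] - (mean_opp + 2 * pct - 1))
    return out


# The solution quoted in the text, with D assigned .5.
quoted = [F(3, 4), F(3, 4), F(0), F(1, 2)]

# Shifting every rating by the same amount gives another exact solution,
# which is why one rating has to be assigned before the rest are fixed.
shifted = [r + F(1, 10) for r in quoted]

print(residuals(quoted))
print(residuals(shifted))
```

Under this assumed form, both the quoted solution and any uniformly shifted copy of it leave zero residuals, which is the sense in which one rating is free.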

These results are confirmed by iterative calculations on the round robin results of Table 1 starting with a single arbitrary rating.  As seen in Table 9, ratings converge rapidly to their final values by the linear formula.  The Berkin formula also shows convergence, though at a slower rate.  Neither the Elo formula nor the general ratio formula yields convergent ratings.
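The convergence of the linear formula can be illustrated with a few lines of code.  This is a minimal sketch under stated assumptions, not a reconstruction of Table 9: it assumes the linear form R_i = (mean rating of opponents) + 2 p_i - 1, it starts every player at the same arbitrary rating of .5, and it updates all four ratings simultaneously in each cycle.  The names PCT, one_cycle, and run are hypothetical.

```python
# Percentage scores of players A, B, C, D from Table 1.
PCT = [2 / 3, 2 / 3, 0.5 / 3, 1.5 / 3]


def one_cycle(ratings):
    """Recompute every rating from the previous cycle's ratings.

    In a single round robin each player's opponents are all the other
    players, so the mean opponent rating is (total - own) / (n - 1).
    """
    n = len(ratings)
    total = sum(ratings)
    return [(total - r) / (n - 1) + 2 * p - 1 for r, p in zip(ratings, PCT)]


def run(cycles, start=0.5):
    """Iterate from a common starting rating and return the final ratings."""
    ratings = [start] * len(PCT)
    for cycle in range(1, cycles + 1):
        ratings = one_cycle(ratings)
        print(cycle, [round(r, 4) for r in ratings])
    return ratings


final = run(20)
```

Because the adjustments 2 p_i - 1 sum to zero over the field, each cycle of this sketch preserves the mean rating, so the starting value determines which member of the infinite solution family the iteration approaches.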

Large-scale applications of simultaneous ratings require iterative calculations, such as were used by Elo in compiling the initial International Rating List.  A similar application was tried recently by this author using Microsoft Excel spreadsheets to see whether available commercial software could handle the task (Downloads).  The spreadsheets exploit Excel's handling of "circular references," normally considered errors, to produce self-consistent ratings.  The mode of calculation has been set to manual, and the "calculate on save" option has been turned off.  Initially only the results of the first cycle of calculations are shown.  The buttons under Calculation on the Formulas tab (or F9) produce subsequent cycles.

The spreadsheet for simultaneous linear ratings (CrossTable2007) uses 265 USCF-rated games played by the 42 members of the Cranston-Warwick Chess Club (RI) in the calendar year 2007.  Again, the function key F9 causes the ratings to converge in stepwise fashion.  To the far right of the spreadsheet, ratings are rendered in more familiar formats.  Format 1 is based on a fractional scale from 0 to 1.  The original ratings R are converted using minimum and maximum values, as

[10.1]        R_f  =  (R - R_min) / (R_max - R_min) .

The resulting fractional rating is converted in turn to a kind of Elo rating based on a linear scale (Format 2):

[10.2]        R_E  =  Elo_min + R_f (Elo_max - Elo_min) .

Since the high Elo rating in the club was close to 2000, and the low Elo rating close to 1000, it was decided to use the interval 1000 to 2000 for the unofficial 2007 Elo club ratings. 
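The two conversions [10.1] and [10.2] can be sketched as follows.  The function names and the sample raw ratings are hypothetical and are not taken from the CrossTable2007 spreadsheet; the default interval of 1000 to 2000 is the one chosen for the club ratings.

```python
def fractional(r, r_min, r_max):
    """Format 1, formula [10.1]: map a raw rating onto the scale 0 to 1."""
    return (r - r_min) / (r_max - r_min)


def linear_elo(r_f, elo_min=1000.0, elo_max=2000.0):
    """Format 2, formula [10.2]: map a fractional rating onto an Elo-like scale."""
    return elo_min + r_f * (elo_max - elo_min)


# Hypothetical raw ratings, for illustration only.
raw = [-1.2, 0.0, 0.3, 1.8]
lowest, highest = min(raw), max(raw)

club = [linear_elo(fractional(r, lowest, highest)) for r in raw]
for r, elo in zip(raw, club):
    print(r, round(elo))
```

By construction the lowest raw rating maps to the bottom of the interval and the highest to the top, with the others spaced linearly between them.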

Another spreadsheet (Champs2008) is for simultaneous Berkin ratings, using data from the 2008 club championship of the Cranston-Warwick Chess Club.  Convergence by the function key F9 in this case is quite rapid.  On the far right of the spreadsheet, the ratings are rendered as natural logarithms and as pseudo-Elo ratings.  The Berkin ratings were initialized so as to produce plausible Elo ratings as a function of 400 times the common logarithm.
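The logarithmic renderings can be sketched in the same way.  The sketch assumes the pseudo-Elo rating is simply 400 times the common logarithm of the Berkin rating, with no added offset; the Champs2008 spreadsheet may differ in detail, and the function names and sample value are hypothetical.

```python
import math


def natural_log_rating(r):
    """Render a positive ratio-type rating as its natural logarithm."""
    return math.log(r)


def pseudo_elo(r):
    """Assumed rendering: 400 times the common (base-10) logarithm."""
    return 400.0 * math.log10(r)


def pseudo_elo_from_ln(ln_r):
    """Same quantity computed from the natural-log rendering."""
    return 400.0 * ln_r / math.log(10)


# Hypothetical rating, chosen so the pseudo-Elo value comes out near 1500.
sample = 10 ** 3.75
print(natural_log_rating(sample), pseudo_elo(sample))
```

On this assumed scale a tenfold ratio between two Berkin ratings corresponds to a difference of 400 pseudo-Elo points.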