10. Consistency
Ratings are typically calculated event by event in sequential fashion, but there are distinct advantages to calculating them simultaneously over a defined period of time for a defined competing field. The simultaneous approach, while considerably more onerous, generates ratings that are mathematically consistent. The implicit assumption of sequential ratings is that the active rating pool will eventually reach a similar state of consistency, but this is hardly more than wishful thinking.

Elo's terminology with regard to this distinction can be confusing. He essentially divided sequential ratings into two types: continuous ratings, which are calculated event by event, and periodic ratings, which are calculated for calendar periods [E1, 1.5-6]. For the first International Rating List (see Section 6) he employed simultaneous ratings in the form of iterative recursive calculations on a computer. Calculations for this list were "continued until successive values of the differences showed little or no significant change," eight iterations in all [E2]. This produced a more or less self-consistent set of ratings.

Consistency for a small data set can be studied with matrix manipulation. Let us test the systems under discussion using the following hypothetical tournament:

Table 1. Single Round Robin for Four Players
As seen from its matrix representation, the system of linear formulas for this tournament has an infinite number of solutions, with the rating of player D as a free variable. A unique solution is reached by assigning D a rating and calculating the other ratings accordingly. For example, with D rated .5, the solution is (.75, .75, 0, .5).

Systems of ratio formulas, including the Berkin formula [5.6], are homogeneous, allowing only the zero solution (0, 0, 0, 0). For basic ratio ratings [5.1] there is a strategy for finding nonzero solutions where there is an undefeated player, whose rating is ordinarily undefined because of division by zero. Consider, for example, the addition of Player E to the above round robin with a single win against Player A (Table 1a). The corresponding ratio formulas, with player E assigned an arbitrary rating of 1, are given in Matrix 2. A solution is provided by mathematical software as (.6571, .9143, .1429, .5714). Unfortunately, there does not seem to be a corresponding strategy for Berkin ratings.

Finally, there is the matrix representation of the system of Elo formulas, calculated by [5.4]. Mathematical software indicates no solution, and manipulation does not proceed beyond the echelon form of Matrix 3a.

These results were confirmed by recursive calculations applied to the same hypothetical tournament (Table 2). Calculations were allowed to continue until float values either blew up or showed no change from one iteration to the next. Convergence for the linear formula was rapid, stabilizing in fewer than 20 iterations. For the Berkin formula it was considerably slower, requiring about 150 iterations. Calculations for the Elo formula were discontinued after 200 iterations with little or no convergence and with the system rapidly losing rating points. The ratio formula behaved in similar fashion.

The issue of consistency has been illustrated by toy examples.
It remains to be seen whether simultaneous methods can practically be applied at scale. We have seen Elo's early computer application applied to 210 players over a three-year period. A similar application was tried recently by this author using a Microsoft Excel spreadsheet to see if available commercial software could handle the task. The data set consisted of the 265 USCF-rated games played by the 42 members of the Cranston-Warwick Chess Club (RI) in the calendar year 2007. The spreadsheet, inoperative as presented here, exploits Excel's handling of "circular references," normally considered errors. The result was a set of precisely consistent linear ratings.

Column seven in the spreadsheet lists what appear to be Elo ratings. These are actually the result of adapting the original linear ratings to a more familiar scale. With the help of an arbitrary constant K in the basic linear formula, the properties of linear ratings are preserved over linear transformations. A useful transformation involves the minimum and maximum values of the original set of values:

[10.1]  R' = (R - min) / (max - min)

This yields a set with a maximum of 1 and a minimum of 0. The scale can be further transformed by looking up the Elo ratings of the players rated 1 and 0. Call these Elo ratings high and low:

[10.2]  R'' = low + R'(high - low)

Now if R' = 1, then R'' is the high Elo rating; and if R' = 0, then R'' is the low Elo rating. Since max, min, high and low are all constants with respect to the original set of ratings, R'' is a linear transformation of R, yielding a set of pseudo-Elo ratings which retain the properties of linear ratings. Since the high Elo rating in the club was close to 2000, and the low Elo rating was close to 1000, it was decided to use the interval 1000 to 2000 for the unofficial club ratings.
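The two-step rescaling in [10.1] and [10.2] can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet's actual implementation: the function name `rescale_to_elo` and the dictionary layout are assumptions made here. The raw values are the toy solution (.75, .75, 0, .5) quoted earlier for players A through D, and the interval 1000 to 2000 is the one chosen for the club.

```python
def rescale_to_elo(raw, low, high):
    """Map raw linear ratings onto the interval [low, high].

    Hypothetical helper: applies [10.1] (min-max normalization)
    followed by [10.2] (stretch onto the Elo-like scale).
    """
    r_min = min(raw.values())
    r_max = max(raw.values())
    if r_max == r_min:
        # [10.1] divides by (max - min), so identical ratings cannot be rescaled.
        raise ValueError("all raw ratings are equal; [10.1] is undefined")
    span = r_max - r_min
    scaled = {}
    for player, r in raw.items():
        r_prime = (r - r_min) / span                    # [10.1]
        scaled[player] = low + r_prime * (high - low)   # [10.2]
    return scaled


# Raw linear ratings from the four-player toy example (D fixed at .5).
raw = {"A": 0.75, "B": 0.75, "C": 0.0, "D": 0.5}
club = rescale_to_elo(raw, low=1000, high=2000)

for player, rating in club.items():
    print(player, round(rating))
```

Because [10.1] subtracts the minimum and divides by the range, adding the same constant to every raw rating leaves the rescaled values unchanged.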
There remain only a few words of caution for anyone dealing with an operational version of the spreadsheet: Absolute values of the original "raw" ratings are not stable. Since Excel is constantly recalculating values based on other calculations, the ratings may change with almost any other change in the spreadsheet. This does not affect relative values and, hence, does not affect the transformed values as described above. It is a good idea, however, to use the "manual" option for calculations and to turn off the "calculate on save" option. Finally, bear in mind that there is no guarantee of convergence for rating values, especially for small data sets.