6. Sequential vs. Simultaneous Ratings
Rating systems in chess typically maintain a pool of ratings and apply their formulas to contests, either individual games or tournaments, as they occur among the rated players. This straightforward approach, producing sequential ratings, is not without its problems.
The minimizing effect of the rating process will eventually make itself felt in a sequential system, but the effect on individual ratings in the meantime may be unfortunate. The Elo System attempts to keep ratings up to date by limiting sample size in its established rating, which becomes a kind of moving average, described more fully under Established Ratings. Unlike ordinary moving averages, where sample size is restricted to the last N games, established ratings are based on attenuated sample weight. The effect of a rated game, having an original sample weight of 1/N, becomes attenuated as more and more data are processed. In theory at least, the effect is never completely lost. The established rating becomes what might be called a weighted moving average. Although recent data are more heavily weighted, rating changes emerging from the averaging process generally do not keep pace with changes in playing strength. Timely adjustments, in short, do not guarantee currency of the data on which they are based.

A more rigorous application of the rating process involves simultaneous calculations for a defined data set. In 1969 such an application produced the first International Rating List [E2, Part 3]. Recursive calculations on a computer were applied to the complete interplay of 210 contestants over the previous three years. This was regarded primarily as a method for initializing the rating pool, but the effect was to produce a self-consistent set of ratings with clearly defined boundaries. Linear programming, as exemplified by this method, has the drawback of being computationally intensive, and it is an open question whether the masses of data processed by a large rating system could be handled in this manner.

As a first step toward simultaneous calculations, it seems reasonable to deal with rating adjustments between pairs of contestants. Sequential ratings typically consist of rating adjustments based on expected performance against pre-event opposition ratings.
These adjustments are mirrored in the opposition ratings. Because each adjustment is applied once to the player and once, with opposite sign, to the opponent, the rating difference between the pair shifts by twice the adjustment. Intuitively, we should be able to improve the accuracy of ratings by halving each rating adjustment. The simulation summarized in the graph below (Downloads) suggests that this idea works in the initial stages of interplay, but that the advantage is eventually dissipated.

The simulation consists of random interplay among a field of 100 players, repeated over sequences of various numbers of games: 100, 200, ..., 1000. The higher ranked player invariably wins, and results are rated sequentially by linear ratings. At the conclusion of each repetition an error statistic is calculated for all pairings in the field, namely, the root mean square of the deviation of actual relative performance, W - L, from expected relative performance based on the generated ratings.
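
The simulation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code behind the published graph. Several details are assumptions not stated in the text: the linear expectation takes the expected W - L per game to be the rating difference divided by a hypothetical `SPREAD` of 400 points (clipped to the range -1 to 1), every player starts at 1500, the adjustment coefficient `k` is 32, and `scale=0.5` stands for the halved adjustment. All function and parameter names are invented for the example.

```python
import math
import random
from itertools import combinations

SPREAD = 400.0  # assumed: rating difference at which the expected W - L reaches 1


def expected_margin(diff):
    """Expected W - L per game for a rating difference, linear and clipped to [-1, 1]."""
    return max(-1.0, min(1.0, diff / SPREAD))


def simulate(n_games, n_players=100, k=32.0, scale=1.0, seed=0):
    """Rate n_games random pairings sequentially; index 0 is the strongest player."""
    rng = random.Random(seed)
    ratings = [1500.0] * n_players  # assumed common starting rating
    for _ in range(n_games):
        a, b = rng.sample(range(n_players), 2)
        winner, loser = min(a, b), max(a, b)  # the higher ranked player always wins
        surprise = 1.0 - expected_margin(ratings[winner] - ratings[loser])
        delta = scale * k * surprise / 2.0  # scale=0.5 halves each adjustment
        ratings[winner] += delta
        ratings[loser] -= delta  # the adjustment is mirrored in the opponent's rating
    return ratings


def rms_error(ratings):
    """RMS deviation of actual W - L (always +1 for the stronger player) from expected."""
    devs = [1.0 - expected_margin(ratings[i] - ratings[j])
            for i, j in combinations(range(len(ratings)), 2)]
    return math.sqrt(sum(d * d for d in devs) / len(devs))


if __name__ == "__main__":
    for n_games in range(100, 1001, 100):
        full = rms_error(simulate(n_games, scale=1.0))
        half = rms_error(simulate(n_games, scale=0.5))
        print(f"{n_games:5d} games  full={full:.3f}  halved={half:.3f}")
```

Running the script prints the error statistic for full and halved adjustments at each sequence length. The numbers depend on the assumed constants and the random seed, so they should not be expected to match the published graph.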