6. Sequential vs. Simultaneous Ratings

Rating systems in chess typically maintain a pool of ratings and apply their formulas to contests, either individual games or tournaments, as they occur among the rated players. This straightforward approach, producing sequential ratings, is not without its problems. 

  • Although the emphasis is usually on up-to-date ratings, the underlying data are an assortment of old and new.

  • While some ratings are more or less stable, others are rising rapidly, and interaction of the two sorts causes deflation in the pool at large.

  • Data samples vary in size considerably, from the few games of the occasional player to the many games of the enthusiast, with resulting variation in sampling error.

  • Finally, the rating pool itself is changing as players come and go.

The minimizing effect of the rating process will eventually make itself felt in a sequential system, but the effect on individual ratings in the meantime may be unfortunate. The Elo System attempts to keep ratings up to date by limiting sample size in its established rating, which becomes a kind of moving average (described more fully under Established Ratings). Unlike an ordinary moving average, where the sample is restricted to the last N games, the established rating is based on attenuated sample weight: a rated game enters with a sample weight of 1/N, and that weight is attenuated as more and more data are processed. In theory at least, the effect is never completely lost, so the established rating is what might be called a weighted moving average. Although recent data are more heavily weighted, rating changes emerging from the averaging process generally do not keep pace with changes in playing strength. Timely adjustments, in short, do not guarantee currency of the data on which they are based.
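The attenuation of sample weight can be illustrated with a minimal sketch. It assumes, purely for illustration, that each game contributes a single performance value and that the rating moves 1/N of the way toward it; the function names and that update form are assumptions, not the formula given under Established Ratings.

```python
def attenuated_weights(n, games):
    """Weights carried by the last `games` results, newest first.

    Assumes each new result enters with weight 1/n and that every
    later result multiplies all earlier weights by (1 - 1/n).
    """
    return [(1.0 / n) * (1.0 - 1.0 / n) ** age for age in range(games)]


def weighted_moving_average(start, performances, n):
    """Apply the assumed update, rating += (performance - rating) / n,
    to each performance value in chronological order."""
    rating = start
    for performance in performances:
        rating += (performance - rating) / n
    return rating
```

Under these assumptions, with N = 10 the newest game carries a weight of 0.1, the one before it 0.09, the one before that 0.081, and so on. No game ever drops out entirely, in contrast to an ordinary moving average, where a game carries weight 1/N until it leaves the window and zero afterward.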

A more rigorous application of the rating process involves simultaneous calculations for a defined data set.  In 1969 such an application produced the first International Rating List [E2, Part 3].  Recursive calculations on a computer were applied to the complete interplay of 210 contestants over the previous three years.  This was regarded primarily as a method for initializing the rating pool, but the effect was to produce a self-consistent set of ratings with clearly defined boundaries. Linear programming, as exemplified by this method, has the drawback of being computationally intensive, and it is an open question whether the masses of data processed by a large rating system could be handled in this manner.
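To make the idea of a self-consistent set of ratings concrete, here is a rough sketch of one way simultaneous ratings might be iterated for a closed data set. It is not the procedure used for the 1969 list [E2]: the linear performance formula, the `scale` constant, the damping, and the zero-mean convention are all assumptions chosen to keep the example short, and draws are ignored.

```python
def simultaneous_ratings(games, scale=400.0, iterations=200, damping=0.5):
    """Iterate toward mutually consistent ratings for a closed data set.

    games: list of (winner, loser) pairs.
    Each pass moves every player toward an assumed linear performance
    rating: average opponent rating + scale * (W - L) / games played.
    All players are updated from the same previous pass, and the pool
    mean is held at zero so the ratings do not drift.
    """
    opponents = {}
    net = {}  # W - L for each player
    for winner, loser in games:
        opponents.setdefault(winner, []).append(loser)
        opponents.setdefault(loser, []).append(winner)
        net[winner] = net.get(winner, 0) + 1
        net[loser] = net.get(loser, 0) - 1

    ratings = {player: 0.0 for player in opponents}
    for _ in range(iterations):
        updated = {}
        for player, opps in opponents.items():
            field = sum(ratings[o] for o in opps) / len(opps)
            target = field + scale * net[player] / len(opps)
            updated[player] = (1 - damping) * ratings[player] + damping * target
        mean = sum(updated.values()) / len(updated)
        ratings = {player: r - mean for player, r in updated.items()}
    return ratings
```

For example, `simultaneous_ratings([("A", "B"), ("B", "C"), ("A", "C")])` rates all three players from the whole data set at once, with no dependence on the order in which the games were played. Each pass touches every game, which gives some sense of why the cost grows with the size of the pool.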

As a first step toward simultaneous calculations, it seems reasonable to deal with rating adjustments between pairs of contestants. Sequential ratings typically consist of rating adjustments based on expected performance against pre-event opposition ratings. These adjustments are mirrored in the opposition ratings, so the difference between the two ratings shifts by twice the adjustment. Intuitively, we should be able to improve the accuracy of ratings by halving each rating adjustment. The simulation summarized in the graph below (Downloads) suggests that this idea works in the initial stages of interplay, but that the advantage is eventually dissipated. The simulation consists of random interplay among a field of 100 players, which is repeated over sequences of various numbers of games: 100, 200, ..., 1000. The higher-ranked player invariably wins, and results are rated sequentially by linear ratings. At the conclusion of each repetition an error statistic is calculated for all pairings in the field, namely, the root mean square of the deviation of actual relative performance, W - L, from expected relative performance based on the generated ratings.
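A simulation of this kind might be sketched as follows. The details are assumptions made for illustration: the linear expectancy is taken to be the rating difference divided by a hypothetical `spread`, clamped to [-1, 1]; `k` is a hypothetical adjustment coefficient; all ratings start equal; and `factor` is 1.0 for full adjustments or 0.5 for halved ones. The sketch is not the author's program and makes no claim to reproduce the graph.

```python
import math
import random


def expected(diff, spread):
    """Assumed linear expectancy of W - L, clamped to [-1, 1]."""
    return max(-1.0, min(1.0, diff / spread))


def rms_error(ratings, spread):
    """RMS deviation of actual from expected W - L over all pairings.

    Players are indexed by true rank (index 0 is strongest), so the
    lower index always wins and its actual W - L is +1.
    """
    total, count = 0.0, 0
    n = len(ratings)
    for i in range(n):
        for j in range(i + 1, n):
            total += (1.0 - expected(ratings[i] - ratings[j], spread)) ** 2
            count += 1
    return math.sqrt(total / count)


def simulate(games, factor, players=100, k=32.0, spread=400.0, seed=0):
    """Rate `games` random pairings sequentially; return the RMS error."""
    rng = random.Random(seed)
    ratings = [0.0] * players
    for _ in range(games):
        a, b = rng.sample(range(players), 2)
        winner, loser = min(a, b), max(a, b)  # higher-ranked player wins
        diff = ratings[winner] - ratings[loser]
        delta = factor * k * (1.0 - expected(diff, spread))
        ratings[winner] += delta  # adjustment for the winner ...
        ratings[loser] -= delta   # ... mirrored in the opponent's rating
    return rms_error(ratings, spread)


if __name__ == "__main__":
    for games in range(100, 1001, 100):
        print(games, simulate(games, 1.0), simulate(games, 0.5))
```

Running the script prints, for each sequence length, the error statistic with full and with halved adjustments, so the two approaches can be compared on identical pairings. Results will depend on the assumed `k` and `spread`.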