6. Sequential vs. Simultaneous Ratings

Rating systems in chess typically maintain a pool of ratings and apply their formulas to contests, either individual games or tournaments, as they occur among the rated players. This straightforward approach, producing sequential ratings, is not without its problems:

  • Although the emphasis is on up-to-date ratings, the underlying data are an assortment of old and new.

  • Data samples vary in size, from the few games of the occasional player to the many games of the enthusiast, with resulting variation in sampling error.

  • Ratings may be manipulated by players using various methods, such as "sandbagging."

  • While some ratings are more or less stable, others are rising rapidly, and their interaction causes deflation in the pool at large.

  • Finally, the rating pool itself is changing as players come and go.

The minimizing effect of the rating process will eventually make itself felt in a sequential system, but the effect on individual ratings in the meantime could be calamitous.  The Elo System attempts to keep ratings up to date by limiting sample size in its established rating, which becomes a kind of moving average, described more fully under Established Ratings.  Unlike an ordinary moving average, where the sample is restricted to the last N games, an established rating is based on attenuated sample weights.  A rated event enters with an original sample weight of 1/N, and its effect is attenuated as more and more data are processed.  In theory at least, the effect of an initial sample is never completely lost.  The established rating thus becomes what might be called a weighted moving average.  Although recent data are more heavily weighted, rating changes emerging from the averaging process still lag behind changes in playing strength.  Timely adjustments, in short, do not guarantee currency of the data on which they are based.
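The contrast between an ordinary moving average and an attenuated one can be sketched in a few lines of Python. The update rule assumed here, R' = R + (P - R)/N, is an illustration only and is not necessarily the formula given under Established Ratings; the function names are hypothetical.

```python
def windowed_weights(n_events, N):
    """Weights an ordinary moving average over the last N events
    assigns to each of n_events events, oldest first."""
    window = min(n_events, N)
    return [0.0] * (n_events - window) + [1.0 / window] * window


def attenuated_weights(n_events, N):
    """Weights under the ASSUMED update R' = R + (P - R) / N, oldest first.

    Each new event enters with weight 1/N, and every earlier weight is
    multiplied by (1 - 1/N) at each later update, so old data fade but
    never reach zero. The weight remaining on the starting rating,
    (1 - 1/N) ** n_events, is not included in the returned list.
    """
    return [(1.0 / N) * (1.0 - 1.0 / N) ** (n_events - 1 - i)
            for i in range(n_events)]
```

With N = 20 and 60 events, for instance, the windowed average gives the oldest event a weight of exactly zero, while the attenuated scheme still gives it a small positive weight.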

The disadvantages of sequential ratings may be overcome by applying simultaneous calculations to a defined data set, using either matrix algebra or an iterative process.  In 1969 such an application produced the first International Rating List, using a computer to make iterative calculations on the complete interplay of 210 contestants over the previous three years [E2, Part 3].  The resulting list, a self-consistent set of ratings with clearly defined boundaries for its data, could then be used as a starting point for sequential calculations.  The drawback of this approach is that large systems demand enormous computing resources; with the computer power now available, however, it is the rating method of choice.
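An iterative simultaneous calculation might look something like the following Python sketch. It is not the 1969 procedure: the logistic expectancy curve, the scale of 400, the step size, and all function names are assumptions made for illustration. Every adjustment in a pass is computed from the same snapshot of ratings, and passes repeat until each player's expected total matches the actual total. The sketch assumes no player has a perfect or zero score, since the iteration would not settle in that case.

```python
def expected_score(diff, scale=400.0):
    """Assumed logistic expected score for a rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def simultaneous_ratings(players, games, start=1500.0, step=200.0,
                         tol=1e-6, max_iter=10000):
    """Fit one self-consistent rating per player to a fixed set of games.

    games: list of (a, b, score_a), with score_a being 1.0, 0.5 or 0.0.
    """
    ratings = {p: start for p in players}
    actual = {p: 0.0 for p in players}
    count = {p: 0 for p in players}
    for a, b, s in games:
        actual[a] += s
        actual[b] += 1.0 - s
        count[a] += 1
        count[b] += 1

    for _ in range(max_iter):
        expected = {p: 0.0 for p in players}
        for a, b, _s in games:
            e = expected_score(ratings[a] - ratings[b])
            expected[a] += e
            expected[b] += 1.0 - e
        # All adjustments come from the same snapshot of the ratings.
        delta = {p: (step * (actual[p] - expected[p]) / count[p]
                     if count[p] else 0.0)
                 for p in players}
        for p in players:
            ratings[p] += delta[p]
        # Re-center so that the pool average stays at the starting value.
        shift = start - sum(ratings.values()) / len(ratings)
        for p in players:
            ratings[p] += shift
        if max(abs(d) for d in delta.values()) < tol:
            break
    return ratings
```

Because the result depends only on the defined set of games, and not on the order in which they were played, rerunning the calculation on the same data always gives the same list.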

There is a simple modification of sequential ratings that seems intuitively to be a first step towards simultaneous calculations.  Notice that rating adjustments between pairs of contestants are based on pre-event ratings, so that the adjustment of a rating is mirrored in the opponent's rating.  Since the two adjustments are redundant, it seems logical to halve each one for linear ratings.  This suggests an experiment in which the efficiency of linear adjustments for a defined set of outcomes is compared to the efficiency of the same adjustments reduced by half.  Efficiency in this context refers to the correlation between rating differences and relative performance in the form of score differences, W - L.  The rough and ready simulation summarized in the graph below performs this experiment on a ranking of 100 players for 800 random pairings in stages of 100 (Downloads).  Results are rated sequentially, with the higher ranked player invariably winning.  After each stage an error statistic is calculated for all pairings in the field, namely, the root mean square of the deviation of relative performance, W - L, from the expected relative performance based on the generated ratings.  Initially the error statistic is lower for the half adjustments, but this advantage gradually disappears as the number of pairings increases.  It seems, then, that this approach to simultaneous calculations has at best a temporary benefit.
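For readers who want to experiment, the setup can be approximated by the Python sketch below. It is not the simulation offered under Downloads. The linear expectancy (rating difference divided by an assumed span of 400, clamped to the range -1 to 1), the coefficient k = 32, the common starting rating of zero, and the random seed are all assumptions, so the figures it produces need not match the graph.

```python
import itertools
import math
import random


def expected_margin(diff, span=400.0):
    """Assumed linear expectation of W - L for one game, clamped to [-1, 1]."""
    return max(-1.0, min(1.0, diff / span))


def rms_error(ratings):
    """RMS deviation of the true margin (+1 for the higher-ranked player)
    from the expected margin, taken over every possible pairing."""
    errs = [(1.0 - expected_margin(ratings[i] - ratings[j])) ** 2
            for i, j in itertools.combinations(range(len(ratings)), 2)]
    return math.sqrt(sum(errs) / len(errs))


def simulate(factor, n_players=100, n_pairings=800, stage=100,
             k=32.0, seed=1):
    """Rate random pairings sequentially; return the error after each stage.

    factor=1.0 applies full adjustments, factor=0.5 applies half adjustments.
    """
    rng = random.Random(seed)       # same seed gives the same pairings
    ratings = [0.0] * n_players     # index 0 is the top-ranked player
    history = []
    for g in range(1, n_pairings + 1):
        i, j = sorted(rng.sample(range(n_players), 2))  # i outranks j
        # The higher-ranked player always wins, so the actual margin is +1.
        change = factor * k * (1.0 - expected_margin(ratings[i] - ratings[j]))
        ratings[i] += change        # winner gains ...
        ratings[j] -= change        # ... what the loser gives up
        if g % stage == 0:
            history.append(rms_error(ratings))
    return history


full = simulate(factor=1.0)
half = simulate(factor=0.5)
```

Comparing `full` and `half` stage by stage gives the two error curves. Both runs use the same seed, so they rate identical pairings and differ only in the size of the adjustment.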