12. A Test

 

 

Tests of a rating system, such as those offered by Elo in his main work [E1, 2.6], tend not to be worth the paper they are written on, mainly because a rating is a measure of playing strength only in a metaphorical sense. Ratings are statistics, and the predictions they offer in this respect are self-fulfilling. By analogy, a test of arithmetic averaging to determine whether it yields central values would be pointless. Ratings do not predict changes in playing strength: whether, for instance, a particular twelve-year-old will remain a novice all his or her life or become the next Bobby Fischer. Ratings assume, rather, that the playing strength exhibited by past results will be the playing strength exhibited by future results. Their predictions are nothing more than an extrapolation of demonstrated playing strength.

Rather than pursue the topic announced in the title, it may be more profitable to look at rating statistics in the abstract. A less invidious simulation of the various rating systems is taken up under Rating by Single Games. Let us designate rating differences or ratios as d. Then for any rating system

[12.1]                     P(d)  =  f (d) ,

which is to say, percentage expectancy is a function of rating difference or ratio.  Not just any function will do if efficiency is a consideration, but we will keep our discussion general.  There is also the inverse function,

[12.2]                      d  =  f^{-1}[P(d)] .
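As a concrete illustration of [12.1] and [12.2], the following sketch assumes a logistic curve on an interval scale with a scale constant of 400 points. That choice of f is an assumption made purely for illustration; nothing in the general discussion depends on it. The names expectancy, rating_difference and SCALE are hypothetical.

```python
import math

SCALE = 400.0  # assumed scale constant, chosen for illustration only


def expectancy(d):
    """[12.1]: percentage expectancy P(d) = f(d), here an assumed logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-d / SCALE))


def rating_difference(p):
    """[12.2]: the inverse function d = f^{-1}(P), defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("percentage score must lie strictly between 0 and 1")
    return SCALE * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    d = 150.0
    p = expectancy(d)
    # Applying the inverse to P(d) should recover d, up to rounding error.
    print(p, rating_difference(p))
```

Any other monotonic f with a usable inverse could be substituted without changing the shape of the two functions.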

Ratings are determined from particular instances of d, which in turn are determined from actual percentage scores:

[12.3]                       d_a  =  f^{-1}(P) .

For an interval scale

[12.4]                        R  =  R_c + d_a ,

and for a ratio scale

[12.5]                        R  =  R_c · d_a .
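The performance formulas [12.3] through [12.5] might be sketched as follows. Two inverse functions are assumed here for illustration only: a logistic inverse with a 400-point scale constant for the interval scale, and the odds ratio P / (1 - P) for the ratio scale. R_c is read as the rating of the competition, which is also an assumption of this sketch, and all function names are hypothetical.

```python
import math


def interval_difference(p, scale=400.0):
    """[12.3] on an interval scale: d_a = f^{-1}(P), assuming a logistic f."""
    if not 0.0 < p < 1.0:
        raise ValueError("percentage score must lie strictly between 0 and 1")
    return scale * math.log10(p / (1.0 - p))


def ratio_difference(p):
    """[12.3] on a ratio scale: d_a = f^{-1}(P), assuming f(d) = d / (1 + d)."""
    if not 0.0 < p < 1.0:
        raise ValueError("percentage score must lie strictly between 0 and 1")
    return p / (1.0 - p)


def performance_interval(r_c, p):
    """[12.4]: R = R_c + d_a."""
    return r_c + interval_difference(p)


def performance_ratio(r_c, p):
    """[12.5]: R = R_c * d_a."""
    return r_c * ratio_difference(p)


if __name__ == "__main__":
    # An even score leaves the rating equal to R_c on either scale.
    print(performance_interval(1500.0, 0.5), performance_ratio(1500.0, 0.5))
```

Neither function refers to percentage expectancy, only to the actual percentage score.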

Once a rating is determined, it may be viewed in relation to another instance of d, call it d_b. The percentage expectancy associated with this new instance is determined from [12.1] as

[12.6]                        P(d_b)  =  f(d_b) .

A rating change can be determined from the change in d_b, which in turn is determined from the change in P(d_b). From [12.2] it follows that

[12.7]                         d_b + Δd_b  =  f^{-1}[P(d_b) + ΔP(d_b)]

and

[12.8]                         ΔR  =  Δd_b .
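The chain from [12.6] to [12.8] can be traced numerically. The sketch below again assumes a logistic f on an interval scale with a 400-point scale constant, purely for illustration. The function rating_change and its parameters are hypothetical names.

```python
import math

SCALE = 400.0  # assumed scale constant, chosen for illustration only


def expectancy(d):
    """[12.6]: P(d_b) = f(d_b), here an assumed logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-d / SCALE))


def rating_difference(p):
    """Inverse function f^{-1}, defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("percentage must lie strictly between 0 and 1")
    return SCALE * math.log10(p / (1.0 - p))


def rating_change(d_b, delta_p):
    """Return the rating change for a given change in percentage expectancy."""
    p_b = expectancy(d_b)                        # [12.6]
    new_d_b = rating_difference(p_b + delta_p)   # [12.7]: d_b + delta d_b
    return new_d_b - d_b                         # [12.8]: delta R = delta d_b


if __name__ == "__main__":
    # A five-point rise in expectancy for a player rated 100 points above.
    print(rating_change(100.0, 0.05))
```

Because f is nonlinear, the same change in percentage expectancy produces a different rating change at different values of d_b.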

These formulas apply to rating systems in general, though a rating system may not require all of them. A rating system can manage well enough using only [12.3] through [12.5], which express the general idea of a performance formula, thus forsaking the notion of percentage expectancy. Especially noteworthy is the argument of the inverse function in [12.7]. This new percentage expectancy comes about as a consequence of a new result. Recall the cumulative form of arithmetic averaging from Methods of Calculation,

[11.4]                         ΔP  =  P_n - P_e  =  (W - N P_o) / (N_o + N) ,

where the new result is W points out of N games and N_o is the number of games behind the original percentage score P_o. This formula in its original context serves as a description of cumulative averaging. We have made the additional assumption that percentage scores in a rating system tend toward a limit. The original percentage score P_o in this light is an estimate of probability and is consequently written as the percentage expectancy P_e. The new percentage score P_n is accordingly a revised estimate of probability.
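A small numerical check of [11.4] may help. The sketch assumes that the denominator is the total number of games, old plus new, with n_o standing for the number of games behind the original percentage score. That reading, like the function names, is an assumption of this example.

```python
def delta_p(p_o, n_o, w, n):
    """[11.4]: change in percentage score after W points out of N new games.

    p_o is the original percentage score over n_o games (an assumed reading).
    """
    return (w - n * p_o) / (n_o + n)


def revised_percentage(p_o, n_o, w, n):
    """Cumulative average over all games: (old points + new points) / total games."""
    return (n_o * p_o + w) / (n_o + n)


if __name__ == "__main__":
    p_o, n_o = 0.5, 20   # 10 points out of 20 games so far
    w, n = 3.0, 4        # new result: 3 points out of 4 games
    change = delta_p(p_o, n_o, w, n)
    # The change added to the old score should equal the revised average.
    print(p_o + change, revised_percentage(p_o, n_o, w, n))
```

The two printed values agree, which is all that [11.4] asserts: the revised estimate is the old estimate plus a correction that shrinks as the number of games on record grows.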