|
3. Ordinal Ratings
|
|
|
|
Ratings that are based on an ordinal scale typically take the
form of ranks, though other forms, such as percentiles, are possible. The process of ranking is based on
the dominance relation, indicated in a dominance graph by
an arrow or directed edge drawn from the dominant to the
subordinate player or node. Dominance in a competitive
setting means prevailing over an opponent by some established criterion,
such as a win or winning percentage. A competing field is
represented by a complete dominance graph when all pairings are
accounted for. In a partial dominance graph there are edges
missing, corresponding to pairings for which there are no decisive
results. The classic ranking problem is to assign ranks to the
nodes of a dominance graph so as to minimize violations, in this
case dominance by lower-ranking players. The problem has been shown to
be NP-complete [GJ]. That is,
barring an earth-shaking discovery, there is no ranking
algorithm that is both exact and efficient for data of
any significant quantity. Rankings of a dozen or so players can be optimized
by brute force, but as the number of ranks increases the degree of
complexity increases exponentially. Finding a ranking that
minimizes violations soon becomes impractical, even for a high-speed
computer.
For some NP-complete problems, difficulties can be overcome by restating the conditions of the problem, though at the risk of trivializing it. It turns out that there is an efficient algorithm for ranking so as to minimize the algebraic sum of directed rank differences. A directed (signed) difference in rank is taken from the rank of the subordinate player to the rank of the dominant player as rank (dominant) - rank (subordinate) . For expected results the direction is negative since the higher ranks are smaller. Such results are desirable from the standpoint of ranking because they tend to minimize directed rank differences. For upsets the direction is positive. The algorithm for minimizing the algebraic sum of directed differences in a ranking is simply to sort the players in descending order according to the difference Wd - Ld with respect to each player, where Wd is the number of opponents dominated by the player and Ld is the number of opponents who dominate the player. The notation suggests wins and losses, but only as a mnemonic device. The subscript indicates dominance. The equivalent in a dominance graph is outdegree minus indegree. The algorithm is applied recursively as a tiebreaker among players of equal rank. It is difficult to prove the efficiency of sorting by Wd - Ld to minimize directed rank differences without being long-winded. A wordy sketch will suffice for the present purpose. Note that when a player moves up in rank, the sum of his directed rank differences is decremented for each player dominated by him (that is, the sum is decreased by Wd) and is incremented for each player who dominates him (that is, the sum is increased by Ld). Similarly, when a player moves down in rank, the sum of his directed rank differences increases by Wd and decreases by Ld. For any adjacent pair of players, A and B, where A is the higher ranked, if Wd(A) - Ld(A) < Wd(B) - Ld(B) , a swap will produce an overall decrease in directed rank difference.
By an appropriate sequence of swaps, a ranking can be produced that
minimizes the algebraic sum of directed rank differences. This is
perhaps best seen by working out an actual case. Wd - Ld is a linear transformation of Pd. From the definition of dominance percentage, [3.2] Wd = nPd - .5Dd , and by complement, [3.3] Ld = n - Wd - Dd . Substituting [3.2] into [3.3] , [3.4] Ld = n - nPd - .5Dd , and subtracting [3.4] from [3.2], Sorting by Wd - Ld or Pd does not minimize violations but is a useful algorithm for reducing them. In his master's thesis the author pitted this algorithm against a brute-force method in a computer simulation. The downloadable version produces a sequence of 100 all-play-all tournaments, each consisting of ten players (Downloads). The players are all of equal strength inasmuch as their odds of winning in any pairing are equal. First, for each tournament the players are ranked by the dominance statistics developed here, using recursive tiebreakers where possible. Second, using the same outcomes, a ranking for each tournament is found by a brute-force method that minimizes violations. The rankings by dominance yielded a mean of 11.75 violations over the 100 tournaments, while the rankings by brute force yielded a mean of 9.3. The reader may judge whether the minimization of violations was worth the application of an onerous brute-force algorithm. For larger tournaments the brute-force algorithm becomes impractical, in which case ranking by dominance statistics is the only recourse. As a demonstration of persistence, the author further showed ranking by dominance statistics to be more efficient in reducing total violations than some sophisticated probability algorithms of the day [J, 2.2]. The principle of minimizing directed rank differences explains the efficiency of an
all-play-all
tournament, which has long been regarded as a rational means for
ordering performances. The percentage
scores resulting from such a tournament, including half points for draws,
are equivalent to dominance percentages, and their descending order
consequently minimizes directed rank differences.
The principle may tentatively be extended to partial tournaments. An interesting illustration is the A more sophisticated treatment anticipates the discussion of interval ratings. For the moment we simply note that a simultaneous calculation of interval ratings (K = 1) for the Mannheim data gives a rating of .3769 for Alekhine's performance, and a rating of .3994 for Vidmar's. (The Excel program, Mannheim1914 under the heading "Ordinal Ratings 2," may be downloaded from ftp.rathingtheory.com.) This anomalous result is due to the fact that the data is from a partial tournament. For a complete (all-play-all) tournament performance formulas can be calculated from the average pre-event rating of the participants as [3.6] R = ERa + K(P - Pc) [(M-1) / M] for M participants. The average opposition rating, in effect, is the same for all participants. The only variables with respect to individual ratings are the player's percentage score and its complement, the opposition percentage score. Consequently, percentage scores completely determine the ordering of performances. (Formula [3.6], along with its proof, was attributed by Elo to Nida Uzman, "young Turkish chess player and a mathematics student in Berlin University" [E3].) The dominance percentage statistic provides a formula for ordinal ratings: [3.7] R = rank (Pd), which is analogous to the basic formulas of the more advanced rating systems to be introduced under subsequent headings. Ordinal ratings, thus defined, minimize directed rank differences. We have already seen their limitations in the treatment of the Mannheim data. Formula [3.7] can be applied all at once to a competing field. As with interval and ratio ratings, there is also a sequential algorithm. The algorithm assumes a competing field that has already been assigned ordinal ratings. For each player a list of opposition ratings is maintained in ascending order, say the ratings of the last n opponents that the player has encountered. Using the player's point score against the current list of opponents, the player's ordinal rating is calculated. Suppose a player has in the list of n opposition ranks the values R1, R2, R3, . . . , Rn-1, Rn in ascending order and that the player has scored s points against the n opponents. The player's ordinal rating would fall between Rsand Rs+1. The rating is then used in the calculation of opposition ratings, and the algorithm proceeds in self-corrective fashion. This scheme is implemented in a program available by download from ftp.rathingtheory.com in the folder "Ordinal Ratings 3." The program uses a ranking of 400 players in 25 cycles of 200 games each, but users may change the experimental parameters. The pre-assigned ranks represent the "true" playing strength of each player. Players are also assigned initial quantile values. Randomly distributed quantiles were found to work best. Pairings occur at random, and the optimistic assumption is made that the stronger player invariably wins. A small list of opposition quantiles, four for each player, proved adequate. At the end of each cycle the correlation between pre-assigned ranks and ranks based on quantiles was determined, as depicted in the graph below. The neat results are obviously due to the artificial conditions of the simulation, but the primary object was to demonstrate the feasibility of the system. It is doubtful, of course, that such an algorithm could ever be more than a toy rating system. Correlation of Calculated Ranks with Predefined Ranks
|