3. Ordinal Ratings

 

     

   

Ratings that are based on an ordinal scale typically take the form of ranks, though other forms are admissible, e.g. percentiles.  The process of ranking is based on the dominance relation, indicated in a dominance graph by an arrow or directed edge drawn from the dominant to the subordinate player or node.  Dominance in a competitive setting means prevailing over an opponent by some established criterion, such as a win or winning percentage.  A competing field is represented by a complete dominance graph when all pairings are accounted for.  In a partial dominance graph there are edges missing, corresponding to pairings for which there are no decisive results.  The classic ranking problem is to assign ranks to the nodes of a dominance graph so as to minimize violations, in this case dominance by lower-ranking players. The problem has been shown to be NP-complete [GJ]. That is, barring an earth-shaking breakthrough, there is simply no ranking algorithm to be discovered that is both exact and efficient for data of significant quantity. Rankings of a dozen or so players can be optimized by brute force, but as the number of ranks increases, the degree of complexity increases exponentially.  Finding a ranking that minimizes violations soon becomes impractical, even for a high-speed computer. 

For some NP-complete problems, difficulties can be overcome by restating the conditions of the problem, though at the risk of trivializing it. It turns out that there is an efficient algorithm for ranking so as to minimize the algebraic sum of directed rank differences. A directed (signed) difference in rank is taken from the rank of the subordinate player to the rank of the dominant player as 

            rank (dominant) - rank (subordinate) .

For expected results the direction is negative since the higher ranks are smaller. Such results are desirable from the standpoint of ranking because they tend to minimize directed rank differences. For upsets the direction is positive. The algorithm for minimizing the algebraic sum of directed differences in a ranking consists simply of sorting the players in descending order according to the difference

               Wd - Ld

with respect to each player, where Wd is the number of opponents dominated by the player and Ld is the number of opponents who dominate the player. (The notation suggests "wins" and "losses," but only as a mnemonic device. The subscript indicates dominance.  The equivalent in a dominance graph is outdegree minus indegree.)  The algorithm should be applied as a tiebreaker whenever possible, in which case the dominance relations among players with equal ranks are considered.  

It is difficult to prove the efficiency of sorting by Wd - Ld to minimize directed rank differences without being long-winded.  A wordy sketch will suffice for the present purpose.  Note that when a player moves up in rank, the sum of his directed rank differences is decremented for each player dominated by him (that is, the sum is decreased by Wd) and is incremented for each player who dominates him (that is, the sum is increased by Ld).  Similarly, when a player moves down in rank, the sum of his directed rank differences increases by Wd and decreases by Ld.  For any adjacent pair of players, A and B, where A is the higher ranked, if

               Wd(A) - Ld(A)  <  Wd(B) - Ld(B) ,

a swap will produce an overall decrease in directed rank difference.  By an appropriate sequence of swaps, a ranking can be produced that minimizes the algebraic sum of directed rank differences.  This is perhaps best seen by working out an actual case.

A statistic related to Wd - Ld is dominance percentage Pd, defined as the percentage of the playing field dominated by the player in question, plus half the percentage which stands in no dominance relation with that player (either dominant or subordinate).  The definition may be generalized to include the player himself, but this serves no useful purpose.  It is convenient to think of dominance percentage as the percentage score that would result from a single game against other players in the competing field, assuming a draw against any player not actually encountered.  If n is the total number of potential opponents in the field,

[3.1]          Pd  =  (Wd + .5Dd) / n,

where Dd is the number of opponents who do not stand in a dominance relation to the player in question. 

Wd - Ld is a linear transformation of Pd, which may be demonstrated as follows:  From the definition of dominance percentage,

[3.2]         Wd  =  nPd - .5Dd

and, by complement,

[3.3]          Ld  =  n - Wd - Dd .

Substituting [3.2] into [3.3] ,

[3.4]          Ld  =  n - nPd - .5Dd

and subtracting [3.4] from [3.2],

[3.5]          Wd - Ld   =  2n (Pd) - n , 

which is the linear transformation claimed.  Sorting the field by Pd is therefore equivalent to sorting the field by Wd - L and has the same effect in minimizing directed rank differences.

Sorting by Wd - Ld and Pd does not minimize violations but is a useful algorithm for reducing violations to a near minimum..  A simulation by the present author pits this algorithm against a brute-force method (Downloads).  It produces a sequence of one hundred round-robin tournaments, each consisting of ten players.  The players may be judged to be of equal strength inasmuch as every dominance relation is decided by the equivalent of a coin toss.  For each tournament, first, the players were ranked by the dominance statistics developed here, using tiebreakers by the same method where possible, and the total number of violations produced by this ranking was noted as T1.  Second, a ranking was found by a brute-force method that minimized violations, and the resulting number of violations was noted as T2.  The upshot was an array of one hundred values for T1 and a corresponding array of one hundred values for T2.  The mean value for T1 was 11.75,  and for T2 it was 9.3.  Considering the onerous nature of the brute-force method, there was not a great deal of improvement over the method based on dominance statistics.  A method similar to the latter was shown by the author to be more efficient in reducing total violations, at least in random simulations, than more sophisticated probability algorithms [J, 2.2].        

The principle of minimizing directed rank differences explains the efficiency of a round-robin tournament, which has long been regarded as a rational means for ordering the performances of its participants.  The percentage scores resulting from a round robin, including half points for draws, are equivalent to dominance percentages, and their descending order consequently minimizes directed rank differences.  The principle may be extended to partial tournaments.  An interesting illustration is the Mannheim 1914 tournament, an 18-player round-robin that was abandoned after 11 rounds with the outbreak of the First World War [HW, "auxiliary scoring methods"].  Alekhine then had the best score with 9.5 points.  It has been argued that Vidmar with 8.5 points actually outperformed Alekhine because he had met stronger opposition, which is supported by several auxiliary scoring methods, including Neustadtl and Solkoff.  Sorting the field by Pd gives the nod to Alekhine.  The dominance percentage for Alehine was 12.5/17, and for Vidmar 11.5/17.

The dominance percentage statistic suggests a natural ordering of performances, expressed by the relation

[3.6]          Q  @  Pd ,

where Q is the quantile (e.g., percentile) position of the particular value Pd in the competing field, which is almost never exact. The relation is analogous to the basic formulas of more advanced interval and ratio rating systems, to be introduced under subsequent headings.  It motivates a ranking algorithm in which quantile estimates are adjusted as outcomes occur.  In this sequential ranking system, a list of opposition quantiles is maintained in ascending order for each player, say the quantiles of his/her last n opponents.  Using the player's point score against the current roster of opponents, a quantile rank based on [3.6] may be estimated.  Suppose a player has in the list of n opposition ranks the quantile values

             Q1, Q2, Q3, . . . , Qn-1, Q

                   such that  Qi  <=  Qi+1

and that he/she has scored s points against the n opponents.  The player's quantile would be calculated to fall above Qs, but below Qs+1, perhaps as the mean of these two values.  The calculated quantile is then used in subsequent estimates of opposition quantiles, and the algorithm proceeds in self-corrective fashion. 

This scheme is implemented in a program available by download from ftp.rathingtheory.com in the folder "Ordinal Ratings 2."  The program generates outcomes based on an arbitrary ranking of 400 players in 25 cycles of 200 games each, but users may change the experimental parameters.  Results are calculated in the form of quantiles, which are then used to calculate rankings at the end of each cycle.  The correlations between arbitrary rankings and calculated rankings are depicted in the graph below.  It appears that a small list of opposition quantiles, three or four elements for each player instance, is adequate for generating plausible correlations.  The output depicted is based on four elements in each list.  Note that generating rankings from scratch presents subtle problems for the programmer.  These are largely overcome by assigning initial quantiles as random values.  

Correlation of Calculated Ranks with Predefined Ranks
 by Spearman's Rank Correlation Coefficient