Problems
========

Let's call $f$ a ranking measure if it assigns a numeric score to each item $i$ of a database $D$, where larger scores indicate superior items. How can we say that a ranking measure $g$ is better than $f$? Consider the evaluation in the TotalRank paper (Boldi 2005), which proposes an alternative to PageRank for ranking the nodes of a directed graph. The two measures, PageRank and TotalRank, are compared via:

  i) the positions of the top pages selected by TotalRank,
 ii) the Kendall tau measure, and
iii) the intersection similarity measure (Fagin et al. 2003).

This comparison raises two questions.

How should we compare ranking measures?
----------------------------------------

As stated above, a common technique to evaluate a ranking is to examine the list of the most highly ranked items. Clearly, this measure, by itself, is insufficient. Thus, most authors also use Kendall's tau to compare the orderings induced by the two measures $f$ and $g$. Of course, there are myriad other measures too: Spearman's footrule distance, weighted measures, etc. (see Dwork et al. 2001 for a small list).

Another approach is to use the ranking measure as input to another application; improved application-specific performance is then taken to indicate that the ranking measure is better. For example:

> "Using an anonymous user-level dataset of sponsored search campaigns for eight different advertisers, we evaluate our framework with different adgraphs and adfactors in terms of their statistical fit to the data, and show its value for mining the long-term behavioral patterns in the advertising data." (Archak et al. 2010)

Also, see Liben-Nowell and Kleinberg 2006 for an analysis of ranking measures for link prediction. While it is hard to debate the application-specific value of the latter approach, it often depends on access to vast quantities of validation data -- data that may not be generally available.

Are these approaches sufficient? Can we do better? How _should_ we compare ranking measures? What properties do we want in comparisons?

As an aside, there is a litany of ``centrality'' measures on the nodes of a graph: degree, PageRank, HITS authority scores, eccentricity scores, betweenness centrality, Katz scores, etc. These may serve as a ready source of ranking measures to compare; a sketch that compares a few of them appears at the end of this note.

Is Kendall's tau a good measure for large ranking datasets?
-----------------------------------------------------------

A common objection to the Kendall tau metric is that it is inappropriate for large databases because it weights all transpositions equally. (See Boldi 2005 for an example of this comment.) While this is true, moving a single item $n$ places corresponds to $n$ adjacent transpositions -- the one visible move implies $n-1$ others -- and the measure must ``pay'' for all of them (see the discordant-pair sketch below). Thus we ask: is Kendall's tau an appropriate measure for comparing large rankings or not? Related questions are:

- When is a Kendall tau difference meaningful on a big database that is likely to have many large but non-meaningful transpositions?

Can we determine when a dataset will produce a good ranking?
------------------------------------------------------------

Given a set of pairwise comparisons, can we tell if the items are amenable to ranking? It's clear that many algorithms will produce a ranking given any dataset. The residual method proposed by Jiang et al. 2008 provides some certificate that a dataset is rankable, but it depends on a set of pairwise comparisons between items --- something that may not be available. What else can we do if the input is, for instance, a graph? Is a cycle rankable?
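Some code sketches
------------------

The sketches below are minimal Python illustrations of the ideas above, not definitive implementations; the graphs, parameters, and helper functions are invented for the examples.

First, the aside about centrality measures. This sketch treats a handful of centralities as competing ranking measures on an arbitrary random graph and compares each pair of induced orderings with Kendall's tau (using networkx and scipy):

```python
from itertools import combinations

import networkx as nx
from scipy.stats import kendalltau

# An arbitrary test graph; substitute any graph of interest.
G = nx.erdos_renyi_graph(200, 0.05, seed=1)

# A few of the centrality measures mentioned above, each one a ranking measure.
measures = {
    "degree": nx.degree_centrality(G),
    "pagerank": nx.pagerank(G),
    "betweenness": nx.betweenness_centrality(G),
    "katz": nx.katz_centrality_numpy(G, alpha=0.01),  # alpha kept below 1/lambda_max
}

nodes = sorted(G)
for f, g in combinations(measures, 2):
    f_scores = [measures[f][v] for v in nodes]
    g_scores = [measures[g][v] for v in nodes]
    tau, _ = kendalltau(f_scores, g_scores)  # tau on scores = tau on induced orderings
    print(f"{f:12s} vs {g:12s}  tau = {tau:.3f}")
```

Even this tiny experiment raises the question from above: if two measures report a tau of 0.7, say, is that agreement or disagreement? Nothing in the number itself says.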
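The intersection similarity used in the TotalRank comparison is also easy to state (Fagin et al. 2003 define it for top-$k$ lists): average, over prefixes of length $1$ through $k$, the normalized symmetric difference of the two top-$j$ sets. A minimal sketch, assuming both rankings are given as lists ordered from best to worst:

```python
def intersection_similarity(a, b, k):
    """Intersection metric between two top-k lists (after Fagin et al. 2003).

    Returns 0.0 when the prefixes agree exactly and 1.0 when every pair
    of top-j sets, j = 1..k, is disjoint.
    """
    total = 0.0
    for j in range(1, k + 1):
        top_a, top_b = set(a[:j]), set(b[:j])
        total += len(top_a ^ top_b) / (2 * j)
    return total / k

print(intersection_similarity("abcde", "abced", k=5))  # 0.05: near-identical lists
print(intersection_similarity("abcde", "vwxyz", k=5))  # 1.0: disjoint lists
```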
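To make the ``pay for all of them'' argument concrete, the next sketch counts discordant pairs directly. Moving one item $n$ places creates exactly $n$ discordant pairs: the one deliberate move plus the $n-1$ transpositions it implies. The helper here is written just for this illustration:

```python
from itertools import combinations

def kendall_distance(a, b):
    """Count discordant pairs: pairs ordered one way in a and the other in b."""
    position = {item: i for i, item in enumerate(b)}
    return sum(1 for x, y in combinations(a, 2) if position[x] > position[y])

items = list(range(10))
n = 5
moved = items[1:n + 1] + [items[0]] + items[n + 1:]  # slide item 0 down n places
print(kendall_distance(items, moved))  # 5: one move of n places costs n, not 1
```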
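Finally, the rankability question. Here is a minimal sketch of the residual idea from Jiang et al. 2008, under the simplest reading of it: fit scores $s$ so that $s_i - s_j$ matches each observed margin in the least-squares sense, and treat the relative residual as the certificate (Jiang et al. go further and decompose the residual via Hodge theory). The comparison data below is made up; a 3-cycle is the canonical unrankable input.

```python
import numpy as np

# Each comparison (i, j, y) says item i beat item j by margin y.
# A 3-cycle: no assignment of scores can explain these comparisons.
comparisons = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
n_items = 3

A = np.zeros((len(comparisons), n_items))
y = np.empty(len(comparisons))
for row, (i, j, margin) in enumerate(comparisons):
    A[row, i], A[row, j] = 1.0, -1.0  # model: s[i] - s[j] ~ margin
    y[row] = margin

s, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = np.linalg.norm(A @ s - y) / np.linalg.norm(y)
print(f"scores: {s.round(3)}, relative residual: {residual:.2f}")
# Residual near 1 means the data is essentially cyclic (not rankable);
# residual near 0 would certify a consistent, rankable dataset.
```

On the cycle the fitted scores are all equal and the residual is 1; replacing the third comparison with (0, 2, 2.0) makes the data consistent and drives the residual to 0.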
Is a clique? It seems like a tree should be rankable somehow. One idea: can we adopt any ideas from feedback arc sets to help here? (A minimal residual sketch appears below.)
References
----------

- Archak, N.; Mirrokni, V. S. & Muthukrishnan, S. Mining advertiser-specific user behavior using adfactors. WWW '10: Proceedings of the 19th International Conference on World Wide Web, ACM, 2010, 31-40. doi:10.1145/1772690.1772695
- Boldi, P. TotalRank: ranking without damping. Poster Proceedings of the 14th International Conference on the World Wide Web (WWW2005), 2005, 898-899.
- Dwork, C.; Kumar, R.; Naor, M. & Sivakumar, D. Rank aggregation methods for the Web. WWW '01: Proceedings of the 10th International Conference on World Wide Web, ACM, 2001, 613-622. doi:10.1145/371920.372165
- Fagin, R.; Kumar, R. & Sivakumar, D. Comparing top k lists. SIAM Journal on Discrete Mathematics, 2003, 17, 134-160. doi:10.1137/S0895480102412856
- Jiang, X.; Lim, L.-H.; Yao, Y. & Ye, Y. Statistical ranking and combinatorial Hodge theory. arXiv, 2008, 0811.1067.
- Liben-Nowell, D. & Kleinberg, J. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 2006, 58, 1019-1031. doi:10.1002/asi.20591