Problems
========

Let's call $f$ a ranking measure if it assigns a numeric score to each item $i$ of a database $D$, where larger scores indicate superior items. How can we say that a ranking measure $g$ is better than $f$? Consider the evaluation in the TotalRank paper (Boldi 2005), which proposes an alternative to PageRank for ranking the nodes of a directed graph. The two measures, PageRank and TotalRank, are compared via:

  i) the positions of the top pages selected by TotalRank,
 ii) the Kendall tau measure, and
iii) the intersection similarity measure (Fagin et al. 2003).

This comparison raises two questions.

How should we compare ranking measures?
----------------------------------------

As stated above, a common technique to evaluate a ranking is to examine the list of the most highly ranked items. Clearly, this measure, by itself, is insufficient. Thus, most authors also use Kendall's tau to compare the orderings induced by the two measures $f$ and $g$. Of course, there are myriad other measures too: Spearman's footrule distance, weighted measures, etc. (see Dwork et al. 2001 for a small list).

Another approach is to use the ranking measure as input to another application; improved application-specific performance is then taken to indicate that the ranking measure is better. For example:

> "Using an anonymous user-level dataset of sponsored search campaigns for eight different advertisers, we evaluate our framework with different adgraphs and adfactors in terms of their statistical fit to the data, and show its value for mining the long-term behavioral patterns in the advertising data." (Archak et al. 2010)

Also, see Liben-Nowell and Kleinberg 2006 for an analysis of ranking measures for link prediction. While it is hard to debate the application-specific value of the latter approach, it often depends on access to vast quantities of validation data -- data that may not be generally available.

Are these approaches sufficient? Can we do better? How _should_ we compare ranking measures? What properties do we want in comparisons?

As an aside, there is a litany of ``centrality'' measures on the nodes of a graph: degree, PageRank, HITS authority scores, eccentricity scores, betweenness centrality, Katz scores, etc. These may serve as a ready source of ranking measures to compare; a sketch that compares a few of them appears at the end of this note.

Is Kendall's tau a good measure for large ranking datasets?
-----------------------------------------------------------

A common objection to the Kendall tau metric is that it is inappropriate for large databases because it weights all transpositions equally. (See Boldi 2005 for an example of this comment.) While this is true, moving a single item $n$ places corresponds to $n$ adjacent transpositions -- the one visible move implies $n-1$ others -- and the measure must ``pay'' for all of them (see the discordant-pair sketch below). Thus we ask: is Kendall's tau an appropriate measure for comparing large rankings or not? Related questions are:

- When is a Kendall tau difference meaningful on a big database that is likely to have many large but non-meaningful transpositions?

Can we determine when a dataset will produce a good ranking?
------------------------------------------------------------

Given a set of pairwise comparisons, can we tell if the items are amenable to ranking? It's clear that many algorithms will produce a ranking given any dataset. The residual method proposed by Jiang et al. 2008 provides some certificate that a dataset is rankable, but it depends on a set of pairwise comparisons between items --- something that may not be available. What else can we do if the input is, for instance, a graph? Is a cycle rankable?
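Some code sketches
------------------

The sketches below are minimal Python illustrations of the ideas above, not definitive implementations; the graphs, parameters, and helper functions are invented for the examples.

First, the aside about centrality measures. This sketch treats a handful of centralities as competing ranking measures on an arbitrary random graph and compares each pair of induced orderings with Kendall's tau (using networkx and scipy):

```python
from itertools import combinations

import networkx as nx
from scipy.stats import kendalltau

# An arbitrary test graph; substitute any graph of interest.
G = nx.erdos_renyi_graph(200, 0.05, seed=1)

# A few of the centrality measures mentioned above, each one a ranking measure.
measures = {
    "degree": nx.degree_centrality(G),
    "pagerank": nx.pagerank(G),
    "betweenness": nx.betweenness_centrality(G),
    "katz": nx.katz_centrality_numpy(G, alpha=0.01),  # alpha kept below 1/lambda_max
}

nodes = sorted(G)
for f, g in combinations(measures, 2):
    f_scores = [measures[f][v] for v in nodes]
    g_scores = [measures[g][v] for v in nodes]
    tau, _ = kendalltau(f_scores, g_scores)  # tau on scores = tau on induced orderings
    print(f"{f:12s} vs {g:12s}  tau = {tau:.3f}")
```

Even this tiny experiment raises the question from above: if two measures report a tau of 0.7, say, is that agreement or disagreement? Nothing in the number itself says.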
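The intersection similarity used in the TotalRank comparison is also easy to state (Fagin et al. 2003 define it for top-$k$ lists): average, over prefixes of length $1$ through $k$, the normalized symmetric difference of the two top-$j$ sets. A minimal sketch, assuming both rankings are given as lists ordered from best to worst:

```python
def intersection_similarity(a, b, k):
    """Intersection metric between two top-k lists (after Fagin et al. 2003).

    Returns 0.0 when the prefixes agree exactly and 1.0 when every pair
    of top-j sets, j = 1..k, is disjoint.
    """
    total = 0.0
    for j in range(1, k + 1):
        top_a, top_b = set(a[:j]), set(b[:j])
        total += len(top_a ^ top_b) / (2 * j)
    return total / k

print(intersection_similarity("abcde", "abced", k=5))  # 0.05: near-identical lists
print(intersection_similarity("abcde", "vwxyz", k=5))  # 1.0: disjoint lists
```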
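To make the ``pay for all of them'' argument concrete, the next sketch counts discordant pairs directly. Moving one item $n$ places creates exactly $n$ discordant pairs: the one deliberate move plus the $n-1$ transpositions it implies. The helper here is written just for this illustration:

```python
from itertools import combinations

def kendall_distance(a, b):
    """Count discordant pairs: pairs ordered one way in a and the other in b."""
    position = {item: i for i, item in enumerate(b)}
    return sum(1 for x, y in combinations(a, 2) if position[x] > position[y])

items = list(range(10))
n = 5
moved = items[1:n + 1] + [items[0]] + items[n + 1:]  # slide item 0 down n places
print(kendall_distance(items, moved))  # 5: one move of n places costs n, not 1
```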
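Finally, the rankability question. Here is a minimal sketch of the residual idea from Jiang et al. 2008, under the simplest reading of it: fit scores $s$ so that $s_i - s_j$ matches each observed margin in the least-squares sense, and treat the relative residual as the certificate (Jiang et al. go further and decompose the residual via Hodge theory). The comparison data below is made up; a 3-cycle is the canonical unrankable input.

```python
import numpy as np

# Each comparison (i, j, y) says item i beat item j by margin y.
# A 3-cycle: no assignment of scores can explain these comparisons.
comparisons = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
n_items = 3

A = np.zeros((len(comparisons), n_items))
y = np.empty(len(comparisons))
for row, (i, j, margin) in enumerate(comparisons):
    A[row, i], A[row, j] = 1.0, -1.0  # model: s[i] - s[j] ~ margin
    y[row] = margin

s, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = np.linalg.norm(A @ s - y) / np.linalg.norm(y)
print(f"scores: {s.round(3)}, relative residual: {residual:.2f}")
# Residual near 1 means the data is essentially cyclic (not rankable);
# residual near 0 would certify a consistent, rankable dataset.
```

On the cycle the fitted scores are all equal and the residual is 1; replacing the third comparison with (0, 2, 2.0) makes the data consistent and drives the residual to 0.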
Is a clique? It seems like a tree should be rankable somehow. One idea: can we adopt any ideas from feedback arc sets to help here? (A minimal residual sketch appears below.)
References
----------

- Archak, N.; Mirrokni, V. S. & Muthukrishnan, S. Mining advertiser-specific user behavior using adfactors. WWW '10: Proceedings of the 19th International Conference on World Wide Web, ACM, 2010, 31-40. doi:10.1145/1772690.1772695
- Boldi, P. TotalRank: ranking without damping. Poster Proceedings of the 14th International Conference on the World Wide Web (WWW2005), 2005, 898-899.
- Dwork, C.; Kumar, R.; Naor, M. & Sivakumar, D. Rank aggregation methods for the Web. WWW '01: Proceedings of the 10th International Conference on World Wide Web, ACM, 2001, 613-622. doi:10.1145/371920.372165
- Fagin, R.; Kumar, R. & Sivakumar, D. Comparing top k lists. SIAM Journal on Discrete Mathematics, 2003, 17, 134-160. doi:10.1137/S0895480102412856
- Jiang, X.; Lim, L.-H.; Yao, Y. & Ye, Y. Statistical ranking and combinatorial Hodge theory. arXiv, 2008, 0811.1067.
- Liben-Nowell, D. & Kleinberg, J. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 2006, 58, 1019-1031. doi:10.1002/asi.20591