In information retrieval experiments, the indicators used to measure the effectiveness of systems or methods are important. Average precision is often used to evaluate the ranked output of documents in standard retrieval experiments. This report examines some properties of this indicator. First, we show mathematically that relevant documents at higher positions in the ranked list contribute much more to the average precision score. Second, we discuss how detecting previously unknown relevant documents influences the score. Third, we examine the statistical variation in average precision scores caused by fluctuations in the results of relevance judgments. The report also explores evaluation indicators that use multi-grade relevance data. After reviewing indicators proposed by other researchers, including the modified sliding ratio, normalized discounted cumulative gain (nDCG), and Q-measure, we develop a new indicator, generalized average precision. We compare these indicators empirically using a simple, artificial example.
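As a rough illustration of the first two properties, the sketch below computes the standard non-interpolated average precision for binary relevance. It is not taken from the report: the function name `average_precision` and the parameter `total_relevant` are hypothetical, and the formula is the commonly used textbook definition, which may differ in detail from the report's notation.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[int],
                      total_relevant: Optional[int] = None) -> float:
    """Non-interpolated average precision for a binary-relevance ranked list.

    relevance[i] is 1 if the document at rank i + 1 is relevant, else 0.
    total_relevant is the number of relevant documents in the collection;
    if omitted, it is assumed to equal the number of relevant documents
    that appear in the ranked list (an assumption of this sketch).
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0

    hits = 0      # relevant documents seen so far
    score = 0.0   # running sum of precision at each relevant rank
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / total_relevant


# The same two relevant documents, placed at different ranks.
high = average_precision([1, 0, 0, 1, 0])  # relevant at ranks 1 and 4
low = average_precision([0, 0, 1, 1, 0])   # relevant at ranks 3 and 4
print(high, low)

# One additional relevant document that was not retrieved (or was unknown
# at judgment time) enlarges the denominator and lowers the score.
print(average_precision([1, 0, 0, 1, 0], total_relevant=3))
```

Moving one relevant document from rank 3 to rank 1 raises the score more than the change in rank alone might suggest, because each relevant document contributes the precision at its own rank and also raises the precision at every later relevant rank.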
|Journal||NII Technical Reports|
|Publication status||Published - 29 Sep 2005|
ASJC Scopus subject areas
- Computer Science Applications