TY - JOUR
T1 - Property of average precision and its generalization
T2 - An examination of evaluation indicator for information retrieval experiments
AU - Kishida, Kazuaki
PY - 2005/9/29
Y1 - 2005/9/29
AB - In information retrieval experiments, indicators for measuring the effectiveness of systems or methods are important. Average precision is often used as an indicator for evaluating ranked output of documents in standard retrieval experiments. This report examines some properties of this indicator. First, we show mathematically that relevant documents at higher positions in the ranked list contribute much more to the average precision score. Second, the influence of detecting unknown relevant documents on the score is discussed. Third, we examine the statistical variation of average precision scores caused by fluctuations in the results of relevance judgments. Another aim of this report is to explore evaluation indicators that use multi-grade relevance data. After reviewing indicators proposed by other researchers, such as modified sliding ratio, normalized discounted cumulative gain (nDCG), and Q-measure, a new indicator, generalized average precision, is developed. We compare these indicators empirically using a simple, artificial example.
UR - http://www.scopus.com/inward/record.url?scp=33749530226&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749530226&partnerID=8YFLogxK
M3 - Review article
AN - SCOPUS:33749530226
SN - 1346-5597
VL - 2005
SP - 1
EP - 19
JO - NII Technical Reports
JF - NII Technical Reports
IS - 14
ER -