TY - JOUR
T1 - Translation disambiguation for cross-language information retrieval using context-based translation probability
AU - Kishida, Kazuaki
AU - Ishita, Emi
PY - 2009/8
Y1 - 2009/8
N2 - Disambiguation between multiple translation choices is very important in dictionary-based cross-language information retrieval. In prior work, disambiguation techniques have used term co-occurrence statistics from the collection being searched. Experimentally these techniques have worked well but are based upon heuristic assumptions. In this paper, a theoretically grounded alternative is proposed, one which uses sense disambiguation based upon context terms within the source text. Specifically this paper introduces the concept of translation probabilities incorporating a context term and extends the IBM Model 1 for estimating context-based translation probabilities from a sentence-aligned bilingual corpus. Experimental results in English to Italian bilingual searches show significant performance improvement of the context-based translation probabilities over the case without any disambiguation.
AB - Disambiguation between multiple translation choices is very important in dictionary-based cross-language information retrieval. In prior work, disambiguation techniques have used term co-occurrence statistics from the collection being searched. Experimentally these techniques have worked well but are based upon heuristic assumptions. In this paper, a theoretically grounded alternative is proposed, one which uses sense disambiguation based upon context terms within the source text. Specifically this paper introduces the concept of translation probabilities incorporating a context term and extends the IBM Model 1 for estimating context-based translation probabilities from a sentence-aligned bilingual corpus. Experimental results in English to Italian bilingual searches show significant performance improvement of the context-based translation probabilities over the case without any disambiguation.
KW - Cross-language information retrieval
KW - Parallel corpora
KW - Translation probability
KW - Word sense disambiguation
UR - http://www.scopus.com/inward/record.url?scp=68249147415&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=68249147415&partnerID=8YFLogxK
U2 - 10.1177/0165551509103599
DO - 10.1177/0165551509103599
M3 - Article
AN - SCOPUS:68249147415
SN - 0165-5515
VL - 35
SP - 481
EP - 495
JO - Journal of Information Science
JF - Journal of Information Science
IS - 4
ER -