TY - GEN
T1 - An automatic sameAs link discovery from Wikipedia
AU - Kagawa, Kosuke
AU - Tamagawa, Susumu
AU - Yamaguchi, Takahira
PY - 2014
Y1 - 2014
N2 - Spelling variants of words or word sense ambiguity takes many costs in such processes as Data Integration, Information Searching, data pre-processing for Data Mining, and so on. It is useful to construct relations between a word or phrases and a representative name of the entity to meet these demands. To reduce the costs, this paper discusses how to automatically discover "sameAs" and "meaningOf" links from Japanese Wikipedia. In order to do so, we gathered relevant features such as IDF, string similarity, number of hypernym, and so on. We have identified the link-based score on salient features based on SVM results with 960,000 anchor link pairs. These case studies show us that our link discovery method goes well with more than 70% precision/recall rate.
AB - Spelling variants of words or word sense ambiguity takes many costs in such processes as Data Integration, Information Searching, data pre-processing for Data Mining, and so on. It is useful to construct relations between a word or phrases and a representative name of the entity to meet these demands. To reduce the costs, this paper discusses how to automatically discover "sameAs" and "meaningOf" links from Japanese Wikipedia. In order to do so, we gathered relevant features such as IDF, string similarity, number of hypernym, and so on. We have identified the link-based score on salient features based on SVM results with 960,000 anchor link pairs. These case studies show us that our link discovery method goes well with more than 70% precision/recall rate.
KW - Disambiguation
KW - Ontology
KW - SameAs link
KW - Spelling variants
KW - Synonym
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84902586651&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84902586651&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-06826-8_29
DO - 10.1007/978-3-319-06826-8_29
M3 - Conference contribution
AN - SCOPUS:84902586651
SN - 9783319068251
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 399
EP - 413
BT - Semantic Technology - Third Joint International Conference, JIST 2013, Revised Selected Papers
PB - Springer Verlag
T2 - 3rd Joint International Semantic Technology Conference, JIST 2013
Y2 - 28 November 2013 through 30 November 2013
ER -