TY - GEN
T1 - Technical term recognition with semi-supervised learning using hierarchical Bayesian language models
AU - Fujii, Ryo
AU - Sakurai, Akito
PY - 2012
Y1 - 2012
N2 - Recognizing technical terms requires term dictionaries or tagged corpora, but compiling them is costly. Moreover, a term may have several representations and new terms continue to appear, which further complicates the problem; simply building a dictionary cannot solve it. In this research, to reduce the cost of creating dictionaries, we aimed to build a system that learns to recognize terminology from a small tagged corpus using semi-supervised learning. We addressed the problem by combining a tag-level language model and a character-level language model based on the hierarchical Pitman-Yor language model (HPYLM). We performed experiments on the recognition of biomedical terms. In supervised learning, we achieved an F-measure of 65%, which is 8 percentage points behind the best existing system, a system that relies on many domain-specific heuristics. In semi-supervised learning, our method maintained accuracy better than existing methods as the amount of supervised data was reduced.
AB - Recognizing technical terms requires term dictionaries or tagged corpora, but compiling them is costly. Moreover, a term may have several representations and new terms continue to appear, which further complicates the problem; simply building a dictionary cannot solve it. In this research, to reduce the cost of creating dictionaries, we aimed to build a system that learns to recognize terminology from a small tagged corpus using semi-supervised learning. We addressed the problem by combining a tag-level language model and a character-level language model based on the hierarchical Pitman-Yor language model (HPYLM). We performed experiments on the recognition of biomedical terms. In supervised learning, we achieved an F-measure of 65%, which is 8 percentage points behind the best existing system, a system that relies on many domain-specific heuristics. In semi-supervised learning, our method maintained accuracy better than existing methods as the amount of supervised data was reduced.
UR - http://www.scopus.com/inward/record.url?scp=84863997707&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863997707&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-31178-9_42
DO - 10.1007/978-3-642-31178-9_42
M3 - Conference contribution
AN - SCOPUS:84863997707
SN - 9783642311772
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 327
EP - 332
BT - Natural Language Processing and Information Systems - 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012, Proceedings
T2 - 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012
Y2 - 26 June 2012 through 28 June 2012
ER -