
Research On Chinese Hyponymy Relation Extraction And Application

Posted on: 2016-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: W J Song
GTID: 2308330464964480
Subject: Computer application technology
Abstract/Summary:
Knowledge acquisition is a crucial issue in artificial intelligence research, and semantic relation extraction (SRE) in particular has attracted much attention. Synonymy, hyponymy, part-whole and causality are common relations in SRE. Among them, hyponymy is basic and important, and it is routinely used in constructing dictionaries and knowledge bases. The rapid development of the Internet has brought a large volume of new vocabulary into existence, and people also tend to endow conventional expressions with novel meanings. These changes pose a serious challenge to traditional semantic dictionaries, which can no longer fully satisfy the requirements of applications in natural language processing (NLP). This thesis studies Chinese hyponymy relation extraction and its applications, with the aim of expanding the scale and improving the quality of traditional dictionaries. The ultimate goal is to strengthen the processing capability of semantic dictionaries so that they better serve a wide range of NLP tasks. The thesis covers the following three parts.

(1) Large-scale extraction of hyponymy relations. Two extraction strategies are proposed to build a rich hyponymy knowledge base: a dictionary-based strategy and an encyclopedia-based strategy. The Chinese Concept Dictionary (CCD) and the Chinese Classified Subject Thesaurus (CCST) serve as the dictionary resources. For the encyclopedia-based strategy, hand-written regular expressions extract hyponyms from Wikipedia, Baidu Baike and Hudong Baike based on how their web addresses are composed. Extensive experiments on these resources show that the strategies outperform previously reported results. In addition, linguistic facts in the corpus are analyzed using algorithms based on the "is a" pattern.

(2) Automatic verification of hyponymy relations.
First, the hyponymy relations obtained by the dictionary-based method are analyzed, and a single-character-based similarity algorithm is used to verify them. Second, an information-retrieval similarity method is discussed, motivated by the observation that hyponymy pairs frequently co-occur in search engine results. For both methods, setting a high similarity threshold to filter out potential errors greatly lowers recall, which shrinks the candidate hyponym set. To overcome this shared limitation, an algorithm that combines the two methods with a word embedding model is proposed. Experimental results show that it effectively improves both the precision and the recall of automatic hyponymy verification. Finally, the hyponymy relation set is manually annotated to further improve the quality of the semantic dictionary.

(3) Applications of the semantic dictionary. The Conference on Natural Language Processing and Chinese Computing (NLP&CC) is held annually by the Technical Committee of Chinese Information of the China Computer Federation (CCF TCCI). The Natural Language Processing Lab of Nanjing Normal University participated in the Lexical Semantic Relation Evaluation (LSRE) held at NLP&CC 2012, where it proposed a multi-strategy Chinese synonym extraction method that won first prize in the evaluation. Combining that method with the hyponymy extraction method proposed in this thesis, two semantic dictionaries are built for the noun part of the Grammatical Knowledge-base of Contemporary Chinese (GKB): GKB_SYN (synonymy) and GKB_HYP (hyponymy). Experiments on the Fudan text classification corpus show that GKB_SYN improves text categorization more effectively than Cilin. Experiments on the People's Daily corpus show that GKB_SYN and GKB_HYP yield better word sense tagging (WST) results than GKB, in terms of both tokens and word types.
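The verification idea in part (2) can be sketched as a weighted combination of three signals: character overlap between hyponym and hypernym, a co-occurrence score from search-engine hit counts, and word embedding cosine similarity. The weights, the 0.5 threshold, the Jaccard-style co-occurrence formula and the toy vectors are all assumptions for illustration; the abstract does not specify how the thesis combines these scores, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical weights for the (character, co-occurrence, embedding) scores.
WEIGHTS = (0.3, 0.3, 0.4)
THRESHOLD = 0.5  # hypothetical acceptance threshold


def char_similarity(hypo: str, hyper: str) -> float:
    """Share of the hypernym's characters that also occur in the hyponym.

    Chinese hypernyms are often the head morpheme of their hyponyms,
    e.g. 苹果树 (apple tree) -> 树 (tree).
    """
    if not hyper:
        return 0.0
    hypo_chars = set(hypo)
    return sum(ch in hypo_chars for ch in hyper) / len(hyper)


def cooccurrence_similarity(hits_a: int, hits_b: int, hits_ab: int) -> float:
    """Jaccard-style score from search-engine hit counts (assumed available)."""
    union = hits_a + hits_b - hits_ab
    return hits_ab / union if union > 0 else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def combined_score(
    hypo: str,
    hyper: str,
    embeddings: Dict[str, Sequence[float]],
    hits: Tuple[int, int, int],  # (hits for hypo, hits for hyper, hits for both)
) -> float:
    """Weighted sum of the three signals; a missing embedding scores 0."""
    s_char = char_similarity(hypo, hyper)
    s_ir = cooccurrence_similarity(*hits)
    if hypo in embeddings and hyper in embeddings:
        s_emb = cosine(embeddings[hypo], embeddings[hyper])
    else:
        s_emb = 0.0
    w_char, w_ir, w_emb = WEIGHTS
    return w_char * s_char + w_ir * s_ir + w_emb * s_emb


def is_hyponymy(hypo, hyper, embeddings, hits) -> bool:
    """Accept the candidate pair when the combined score reaches the threshold."""
    return combined_score(hypo, hyper, embeddings, hits) >= THRESHOLD
```

Raising THRESHOLD rejects more errors but also more true pairs, which is the recall trade-off the abstract describes. Because the embedding and co-occurrence terms carry weight of their own, a pair with no character overlap can still pass under these example weights.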
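For part (3), one common way to use a synonym dictionary in text categorization is to map every word to an identifier for its synonym set before building bag-of-words features, so that synonyms count as a single feature. The abstract does not say how GKB_SYN was applied in the Fudan experiments; the mapping below, its entries and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def merge_synonyms(tokens: Iterable[str], syn_group: Dict[str, str]) -> List[str]:
    """Replace each token by its synonym-set ID; unknown tokens are kept as-is."""
    return [syn_group.get(tok, tok) for tok in tokens]


def bag_of_words(tokens: Iterable[str]) -> Counter:
    """Term-frequency features that a classifier could consume."""
    return Counter(tokens)


# Hypothetical synonym sets: 电脑 and 计算机 both mean "computer".
SYN_GROUP = {"电脑": "SYN_computer", "计算机": "SYN_computer"}

if __name__ == "__main__":
    doc = ["电脑", "计算机", "价格"]
    print(bag_of_words(merge_synonyms(doc, SYN_GROUP)))
```

Merging synonyms this way reduces feature sparsity, since two documents that use different words for the same concept now share a feature.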
Keywords/Search Tags: Hyponymy, Semantic dictionary, Encyclopedia, Word Embedding, Text Classification