
The Extraction Of Synonyms And Hyponyms Based On Multi-resources And Their Application In Chinese Names Disambiguation

Posted on: 2015-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Fan
Full Text: PDF
GTID: 2298330431996181
Subject: Computer software and theory
Abstract/Summary:
Lexical semantic relations are a vital research topic in natural language processing. They are not only a basic resource for building semantic knowledge bases, but also play an important role in information retrieval, machine translation, and sentiment analysis. Lexical semantic relations establish logical relationships between words within a semantic category; the main ones are the hypernym-hyponym relation and the synonym relation. This thesis focuses on extracting synonyms and hyponyms. Synonyms are words or phrases that, setting aside emotional tone, express the same or similar meaning and can be used interchangeably. A hyponym is a word whose meaning is contained in that of another word (its hypernym); it is a special instance of the hypernym. The main work of this thesis is as follows:

1) Starting from the semantic relations between Chinese words, a method for extracting synonyms and hyponyms is proposed that combines multiple resources, such as semantic dictionaries and network resources. For synonym extraction, synonyms are first extracted with the Jaccard algorithm, based on the structural characteristics of a Chinese semantic dictionary; they are then extracted from encyclopedia entries, translations, and other network resources using a rule-based approach; finally, they are extracted based on the compositional structure of the terms themselves. For hypernym-hyponym extraction, the concepts of a Chinese dictionary are used first, followed by network resources: the "open category" of Baidu Encyclopedia, the "classification" of Wikipedia and Hudong Encyclopedia, other encyclopedia resources, and Baidu search terms. By analyzing part of the extracted synonyms and hyponyms, a set of noise data for synonyms and hyponyms is established and used to filter out noise, generating the candidate sets of synonyms and hyponyms.

2) The candidate sets of synonyms and hyponyms are filtered and optimized. This filtering is treated as a text classification problem, for which feature extraction is the primary task. Statistical methods are used to extract the classification features: Jaccard similarity, mutual information, the chi-square test, the number of co-occurrences of the two words, the minimum distance between the two words, and the number of key feature words between them. A support vector machine (SVM) and a maximum entropy model are then used to optimize the synonyms and hyponyms. Experimental results show that mutual information, the chi-square test, the number of co-occurrences, the minimum distance between word pairs, and the number of feature words between the words each have a greater advantage as individual features than literal similarity, and that the support vector machine performs better than the maximum entropy model for synonym and hyponym extraction.

3) Finally, this thesis describes the application of synonyms and hyponyms to Chinese name disambiguation. Identity and alias information is extracted for each person, together with features such as the person's work, place of study or employment, organizations and groups, proper nouns, and place of residence. The identity and alias features are then weighted separately, and Chinese names are disambiguated using vector cosine similarity and specific parameters. Experimental results show that synonyms and hyponyms are effective for Chinese name disambiguation.
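The statistical features named in item 2) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that literal similarity is a Jaccard coefficient over the characters of the two words, that "mutual information" means pointwise mutual information, and that the counts (`n_xy`, `n_x`, `n_y`, `n`) are document frequencies from some corpus. All function and parameter names are hypothetical.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Literal similarity: Jaccard coefficient over the character sets of two words."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def pmi(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))).

    n_xy: documents containing both words; n_x, n_y: documents containing
    each word; n: total number of documents (all assumed inputs).
    """
    if n_xy == 0:
        return float("-inf")
    return math.log2(n_xy * n / (n_x * n_y))


def chi_square(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Chi-square statistic of the 2x2 co-occurrence table for words x and y."""
    a = n_xy                   # documents with both x and y
    b = n_x - n_xy             # documents with x only
    c = n_y - n_xy             # documents with y only
    d = n - n_x - n_y + n_xy   # documents with neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def pair_features(x: str, y: str, n_xy: int, n_x: int, n_y: int, n: int) -> list:
    """Feature vector for one candidate word pair, ready for a classifier."""
    return [
        jaccard(x, y),
        pmi(n_xy, n_x, n_y, n),
        chi_square(n_xy, n_x, n_y, n),
        float(n_xy),  # raw co-occurrence count
    ]
```

A vector such as `pair_features("计算机", "计算器", 50, 50, 50, 100)` would then be passed, with a label, to a classifier such as an SVM or a maximum entropy model. The distance-based features from the abstract are omitted here because they require positional corpus data.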
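The weighted cosine comparison described in item 3) can be sketched as follows. This is an illustration under stated assumptions, not the method of the thesis: the group weights in `WEIGHTS`, the synonym map, and the example persons are all hypothetical, and the abstract does not say how synonyms are applied, so mapping each term to a canonical synonym before comparison is only one plausible reading.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical weights: the abstract says identity and alias features are
# weighted separately but gives no values.
WEIGHTS = {"identity": 2.0, "alias": 2.0, "other": 1.0}

Vector = Dict[Tuple[str, str], float]


def weighted_vector(
    features: Dict[str, List[str]],
    synonyms: Optional[Dict[str, str]] = None,
) -> Vector:
    """Build a weighted term vector from feature groups for one name mention.

    `synonyms` (an assumption) maps a term to its canonical synonym so that
    equivalent terms fall on the same vector dimension.
    """
    synonyms = synonyms or {}
    vec: Vector = {}
    for group, terms in features.items():
        weight = WEIGHTS.get(group, 1.0)
        for term in terms:
            key = (group, synonyms.get(term, term))
            vec[key] = vec.get(key, 0.0) + weight
    return vec


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two hypothetical mentions of the same personal name.
mention_a = {"identity": ["teacher"], "other": ["Beijing"]}
mention_b = {"identity": ["instructor"], "other": ["Shanghai"]}
synonym_map = {"instructor": "teacher"}  # hypothetical synonym pair

sim_plain = cosine(weighted_vector(mention_a), weighted_vector(mention_b))
sim_syn = cosine(
    weighted_vector(mention_a, synonym_map),
    weighted_vector(mention_b, synonym_map),
)
```

In this toy case the two mentions share no dimension until the synonym map is applied, after which the identity terms align. Two mentions would be treated as the same person when their similarity exceeds a threshold, which would have to be tuned on real data.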
Keywords/Search Tags:Baidu Encyclopedia, Synonyms, Hyponym, Names Disambiguation, Support Vector Machines, Maximum Entropy