
Word Sense Disambiguation And Recognition Of Bilingual Word Collocation Based On Predicate Compatibility

Posted on: 2019-06-08 | Degree: Master | Type: Thesis
Country: China | Candidate: T T Zhu | Full Text: PDF
GTID: 2428330545497832 | Subject: Computer Science and Technology
Abstract/Summary:
A word collocation is a habitual combination of words that exhibits a certain co-occurrence frequency, grammatical structure, and semantic transparency. Word collocations are an important part of human linguistic knowledge and occupy a central position in natural language processing (NLP). Research on word collocation extraction and recognition benefits machine translation, information retrieval, syntactic analysis, word sense disambiguation, and many other NLP tasks. Existing methods for word collocation extraction and recognition fall into four types: rule-based methods, statistical methods, machine-learning methods, and hybrid methods that combine the strengths of the other three.

This thesis presents a bilingual word collocation disambiguation algorithm based on the predicate collocation frequency matrix and a bilingual word collocation recognition algorithm based on the predicate compatibility matrix. According to the semantic information they carry, collocations are divided into three types: word collocations, conceptual collocations, and predicate collocations. One word collocation can generate one or more conceptual collocations, and one conceptual collocation can in turn generate one or more predicate collocations. Both the disambiguation and the recognition experiments make full use of the relationships among these three types of collocation.

First, Chinese and English word collocations are extracted from a large-scale corpus. With the help of HowNet, these word collocations yield a large set of conceptual collocations, from which the predicate collocation frequency matrix is constructed. The co-occurrence frequencies of the word collocations are superposed in the predicate collocation frequency matrix, and the score of every conceptual collocation generated by a word collocation is calculated from the information in this matrix; the conceptual collocation with the highest score is selected as the disambiguated result for that word collocation. The bilingual disambiguation method uses both the Chinese and the English predicate collocation frequency matrices when scoring conceptual collocations, which effectively increases the amount of predicate collocation information available and improves disambiguation accuracy.

Once all word collocations are disambiguated, the result is a disambiguated word collocation set in which each word collocation corresponds to a single conceptual collocation. From this set a predicate compatibility matrix is constructed, in which each cell gives the compatibility of one predicate collocation. Using this matrix, a conceptual collocation can be judged as normal or not; then, based on the compatibility of all its conceptual collocations, a word collocation can be judged as either a normal or an anomalous collocation. Experimental results show that both the word collocation disambiguation algorithm based on the predicate collocation frequency matrix and the word collocation recognition algorithm based on the predicate compatibility matrix meet expectations, demonstrating the feasibility and effectiveness of the proposed methods.
Keywords/Search Tags: Word Collocation Disambiguation, Word Collocation Recognition, Predicate Compatibility
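The disambiguation procedure described in the abstract (word collocation, then conceptual collocations, then predicate collocations, scored against a frequency matrix) can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not give the scoring formula, so the code assumes that each word collocation's frequency is added once to every distinct predicate collocation it can generate, and that a conceptual collocation's score is the mean matrix value over its predicate collocations, summed over the Chinese and English matrices. `SENSES` and `PREDICATES` are hypothetical toy dictionaries standing in for HowNet lookups (they are not the HowNet API), and concepts and predicates are assumed to be shared by the two languages.

```python
from collections import defaultdict
from itertools import product

# Hypothetical stand-ins for HowNet (NOT the real HowNet API):
# word -> candidate concepts, concept -> predicates (sememe-like features).
# Concepts and predicates are assumed to be shared by Chinese and English.
SENSES = {
    "打": ["hit", "play_sport"], "踢": ["kick", "play_sport"],
    "篮球": ["ball"], "足球": ["ball"],
    "play": ["play_sport", "perform"], "hit": ["hit"],
    "basketball": ["ball"], "ball": ["ball"],
}
PREDICATES = {
    "hit": ["AlterPhysical"], "kick": ["AlterPhysical"],
    "play_sport": ["exercise"], "perform": ["perform"], "ball": ["SportTool"],
}


def conceptual_collocations(w1, w2):
    """Every concept pair that a word collocation (w1, w2) can generate."""
    return list(product(SENSES[w1], SENSES[w2]))


def predicate_collocations(c1, c2):
    """Every predicate pair that a conceptual collocation (c1, c2) can generate."""
    return list(product(PREDICATES[c1], PREDICATES[c2]))


def build_frequency_matrix(word_collocations):
    """Superpose each word collocation's frequency onto the predicate pairs it
    can generate. Assumption: the full frequency is added once per distinct cell."""
    matrix = defaultdict(int)
    for (w1, w2), freq in word_collocations.items():
        cells = set()
        for c1, c2 in conceptual_collocations(w1, w2):
            cells.update(predicate_collocations(c1, c2))
        for cell in cells:
            matrix[cell] += freq
    return matrix


def score(concept_pair, matrices):
    """Assumed score: mean cell value over the concept pair's predicate pairs,
    summed over all matrices (one matrix = monolingual, two = bilingual)."""
    cells = predicate_collocations(*concept_pair)
    return sum(sum(m.get(c, 0) for c in cells) / len(cells) for m in matrices)


def disambiguate(w1, w2, matrices):
    """Return the highest-scoring conceptual collocation of (w1, w2)."""
    return max(conceptual_collocations(w1, w2), key=lambda cp: score(cp, matrices))


# Toy word collocations with made-up frequencies (illustration only).
zh = build_frequency_matrix({("打", "篮球"): 10, ("踢", "足球"): 8})
en = build_frequency_matrix({("play", "basketball"): 12, ("hit", "ball"): 4})

print(disambiguate("打", "篮球", [zh]))      # ('hit', 'ball'): a tie, first candidate wins
print(disambiguate("打", "篮球", [zh, en]))  # ('play_sport', 'ball')
```

With the Chinese matrix alone, both readings of 打篮球 score 18.0 and `max()` simply returns the first candidate; adding the English matrix (22.0 versus 30.0) breaks the tie in favour of the sports reading, which mirrors the abstract's point that bilingual information enlarges the predicate collocation evidence.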
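The recognition step that follows disambiguation can likewise be sketched under explicit assumptions. The abstract states only that each cell of the predicate compatibility matrix gives the compatibility of one predicate collocation. Here, compatibility is assumed to be the number of disambiguated collocations that support a predicate pair; a conceptual collocation is assumed normal only if every predicate pair it generates reaches a `threshold`; and a word collocation is assumed normal if at least one of its conceptual collocations is normal. `SENSES`, `PREDICATES`, `threshold`, and the toy data are all hypothetical, not taken from the thesis or from HowNet.

```python
from collections import Counter
from itertools import product

# Hypothetical HowNet-like lookups (illustrative only, NOT the real HowNet API).
SENSES = {"drink": ["Drink"], "eat": ["Eat"], "water": ["Water"],
          "tea": ["Tea"], "bread": ["Bread"], "stone": ["Stone"]}
PREDICATES = {"Drink": ["drink"], "Eat": ["eat"], "Water": ["drinks"],
              "Tea": ["drinks"], "Bread": ["food"], "Stone": ["material"]}


def predicate_pairs(concept_pair):
    """Predicate collocations generated by one conceptual collocation."""
    c1, c2 = concept_pair
    return product(PREDICATES[c1], PREDICATES[c2])


def build_compatibility_matrix(disambiguated):
    """disambiguated maps each word collocation to its chosen conceptual collocation.
    Assumed compatibility: how many disambiguated collocations support a predicate pair."""
    matrix = Counter()
    for concept_pair in disambiguated.values():
        matrix.update(predicate_pairs(concept_pair))
    return matrix


def is_normal_concept_pair(concept_pair, matrix, threshold=1):
    """Assumption: normal only if every generated predicate pair reaches the threshold."""
    return all(matrix[p] >= threshold for p in predicate_pairs(concept_pair))


def is_normal_word_collocation(w1, w2, matrix, threshold=1):
    """Assumption: normal if at least one conceptual collocation is normal, else anomalous."""
    return any(is_normal_concept_pair(cp, matrix, threshold)
               for cp in product(SENSES[w1], SENSES[w2]))


# Toy disambiguated set (made up for illustration).
matrix = build_compatibility_matrix({
    ("drink", "water"): ("Drink", "Water"),
    ("eat", "bread"): ("Eat", "Bread"),
})
print(is_normal_word_collocation("drink", "tea", matrix))   # True: shares ("drink", "drinks")
print(is_normal_word_collocation("eat", "stone", matrix))   # False: ("eat", "material") unseen
```

Because tea and water share the predicate `drinks`, the unseen pair drink/tea is accepted, while eat/stone is flagged as anomalous since no disambiguated collocation supports (`eat`, `material`). Working at the predicate level is what lets the matrix generalize beyond the word collocations actually observed.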