
Research on the Affinity Propagation Clustering Algorithm for Chinese Word Clustering

Posted on: 2012-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2218330368980885
Subject: Computer software and theory
Abstract/Summary:
Word clustering is important in linguistics and natural language processing, with applications in fields such as machine translation, text classification, and intelligent retrieval. Word clustering groups the words of a given domain into classes so that words in the same class are as similar as possible and words in different classes are as different as possible. Many clustering methods exist, each with its own strengths and weaknesses, but they share one property: the better suited the input data, the better the clustering result.

A central problem in context-based word clustering is how to calculate the similarity between words. Much current research computes word similarity from simple mutual information because it is easy to implement. This approach, however, considers only word frequencies. It neither abstracts over the words themselves nor takes their semantics into account, which reduces the accuracy of the estimated relations between words.

This thesis studies word similarity and clustering separately.

First, it proposes a context-based method for calculating word similarity that mines the meaning of the context with a semantic dictionary. The method uses the semantic dictionary to convert the traditional context word vector into a context sememe vector, which abstracts the words themselves further. Word similarity is then calculated by combining this representation with traditional mutual information. The method overcomes the data sparseness problem, and experiments show that it is effective.

Second, to improve the results of word clustering, the thesis introduces affinity propagation into the clustering of Chinese words. Experiments show that this method gives better word clustering results.
Keywords/Search Tags: word clustering, similarity, sememe, AP algorithm
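
The pipeline described in the abstract (context vector, then sememe vector, then similarity, then affinity propagation) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the thesis: the toy `SEMEMES` dictionary, the `CONTEXTS` counts, the English placeholder words, and the function names are all hypothetical. The similarity here is plain cosine over sememe vectors, so the combination with mutual information (reciprocal information) mentioned in the abstract is omitted. The clustering step uses the standard affinity propagation message updates, with the preference defaulting to the mean off-diagonal similarity, which is a choice made for this sketch.

```python
import math
from collections import Counter

# Hypothetical toy sememe dictionary (context word -> sememes); not from the thesis.
SEMEMES = {
    "eat": ["consume"], "bite": ["consume"], "chew": ["consume"],
    "sweet": ["taste"], "juicy": ["taste"], "ripe": ["taste"],
    "drive": ["transport"], "ride": ["transport"], "board": ["transport"],
    "road": ["place"], "path": ["place"], "station": ["place"],
}

# Hypothetical context counts (target word -> co-occurring word -> count).
CONTEXTS = {
    "apple": {"eat": 2, "sweet": 2},
    "pear": {"bite": 3, "juicy": 1},
    "peach": {"chew": 1, "ripe": 3},
    "car": {"drive": 2, "road": 2},
    "bike": {"ride": 3, "path": 1},
    "bus": {"board": 1, "station": 3},
}


def sememe_vector(context_counts):
    """Fold a context-word count vector into a sememe count vector."""
    vec = Counter()
    for word, count in context_counts.items():
        for sememe in SEMEMES.get(word, []):
            vec[sememe] += count
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(words):
    """Pairwise cosine similarity of the words' sememe vectors."""
    vectors = [sememe_vector(CONTEXTS[w]) for w in words]
    return [[cosine(a, b) for b in vectors] for a in vectors]


def affinity_propagation(S, preference=None, damping=0.5, iterations=200):
    """Return, for each point, the index of its exemplar (needs >= 2 points)."""
    n = len(S)
    S = [row[:] for row in S]
    if preference is None:
        # Sketch-specific default: mean of the off-diagonal similarities.
        off_diag = [S[i][j] for i in range(n) for j in range(n) if i != j]
        preference = sum(off_diag) / len(off_diag)
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # a(i,k) = min(0, r(k,k) + sum of max(0, r(i',k)) for i' not in {i,k})
        # a(k,k) = sum of max(0, r(i',k)) for i' != k
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


if __name__ == "__main__":
    words = list(CONTEXTS)
    labels = affinity_propagation(similarity_matrix(words))
    for word, exemplar in zip(words, labels):
        print(word, "->", words[exemplar])
```

In this toy data no two target words share a context word, so the similarity computed directly on context words is zero for every pair. Mapping the context to sememes gives related words overlapping vectors, which is the sparseness effect the abstract refers to.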