
The Research On Chinese Automatic Classification

Posted on: 2014-10-22  Degree: Master  Type: Thesis
Country: China  Candidate: B Wu  Full Text: PDF
GTID: 2268330428463873  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, the ways of obtaining and representing information have diversified, and the volume of information, especially e-books and electronic messages, has grown enormously. To some extent this makes life easier, because the information we need is easier to reach. However, it also causes problems: with so much junk information, it is harder to find what we actually want. Text classification, as an effective way to improve the speed and accuracy of finding relevant information, has attracted widespread attention and has become one of the hot research topics. Text categorization is the task of assigning one or more appropriate categories from a predefined set to a text document. Many categorization schemes have been proposed for this task in the literature, including the Naive Bayes probabilistic classifier, Support Vector Machines, neural networks and k-NN classifiers. The k-nearest neighbor (k-NN) classifier is well known for its simplicity and effectiveness and has been widely used. In the first part of this thesis, we summarize the process of Chinese text categorization and the related theory, briefly introduce three text representation models (the Boolean model, the probabilistic model and the vector space model), and then summarize five text categorization algorithms, including Rocchio, Naive Bayes, SVM and k-NN, comparing their merits and drawbacks in terms of classifier performance and algorithmic complexity.
Finally, we give an overview of common methods for evaluating classifier performance. In the second part, we analyze the k-nearest neighbor classification algorithm in depth and, to address its disadvantages, introduce an adaptive scheme. Building on traditional k-NN, the new method introduces the concept of a push-drag strategy: we add a weight vector for each category to improve the similarity calculation formula, and thereby put forward a new way of improving the k-nearest neighbor classification algorithm. In the last part, we design four groups of comparison experiments, select appropriate values for the feature dimension and K, and the experiments show that the adaptive k-NN performs better than the traditional one.
Keywords/Search Tags: Text classification, KNN, push and drag strategy, weight vector
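The abstract describes k-NN in which a weight vector per category modifies the similarity calculation, but it gives neither the formula nor the push-drag update rule. The following is therefore only a minimal sketch under stated assumptions, not the thesis's method: documents are pre-segmented token lists, similarity is a term-weighted cosine, and the per-category weights are supplied by hand rather than learned. The names `tf_vector`, `weighted_cosine`, `knn_classify` and `category_weights` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(tokens):
    """Map a pre-segmented token list to a sparse term-frequency vector."""
    return Counter(tokens)


def weighted_cosine(a, b, w=None):
    """Cosine similarity of sparse vectors a and b.

    w is an optional dict of non-negative per-term weights (default 1.0).
    ASSUMPTION: this weighted form is illustrative; the abstract does not
    state the improved similarity formula.
    """
    def wt(term):
        return 1.0 if w is None else w.get(term, 1.0)

    dot = sum(wt(t) * a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(wt(t) * v * v for t, v in a.items()))
    nb = math.sqrt(sum(wt(t) * v * v for t, v in b.items()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, training, k=3, category_weights=None):
    """Return the category with the largest summed similarity among the
    k training documents most similar to the query.

    training: list of (vector, label) pairs.
    category_weights: optional {label: {term: weight}}; each training
    document is compared using the weight vector of its own category.
    """
    category_weights = category_weights or {}
    scored = [
        (weighted_cosine(query, vec, category_weights.get(label)), label)
        for vec, label in training
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy, hand-made data (not from the thesis).
training = [
    (tf_vector(["足球", "比赛", "进球"]), "sports"),
    (tf_vector(["篮球", "比赛", "得分"]), "sports"),
    (tf_vector(["股票", "市场", "上涨"]), "finance"),
    (tf_vector(["银行", "利率", "市场"]), "finance"),
]
query = tf_vector(["比赛", "比赛", "市场"])

plain = knn_classify(query, training, k=3)
adapted = knn_classify(
    query, training, k=3,
    category_weights={"finance": {"市场": 9.0}},  # hand-picked weight
)
print(plain, adapted)
```

On this toy data, up-weighting a single term for one category is enough to change the predicted label, which is the kind of effect a per-category weight vector is meant to have. Because each category uses its own weights, similarities are not strictly comparable across categories; an adaptive scheme would need a rule for adjusting the weights, which this sketch leaves out.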