The Research On Chinese Automatic Classification

Posted on:2014-10-22

Degree:Master

Type:Thesis

Country:China

Candidate:B Wu

Full Text:PDF

GTID:2268330428463873

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of information technology, there are increasingly diverse ways to obtain and represent information. The massive growth of information, especially of e-books and electronic messages, has made our lives easier to some extent, because the information we need is easier to get. However, this growth also brings problems: with so much junk information, it is harder to find what we actually want. Text classification, as an effective way to organize and retrieve text quickly and accurately, has attracted widespread attention and become one of the hot research topics.

Text categorization is the task of assigning one or more appropriate categories to a text document. Many schemes have been proposed for automatic text categorization in the literature, including the Naive Bayes probabilistic classifier, Support Vector Machines (SVM), neural networks, and k-nearest neighbor (k-NN) classifiers. k-NN is well known for its simplicity and effectiveness and has been widely used.

In the first part of this thesis, we summarize the process of Chinese text categorization and the related theory, briefly introduce three text representation models (the Boolean model, the probabilistic model, and the vector space model), and then summarize five text categorization algorithms, including Rocchio, Naive Bayes, SVM, and k-NN, comparing their merits and drawbacks in terms of classifier performance and algorithmic complexity. Finally, we give an overview of common methods for evaluating classifier performance.

In the second part, we analyze the k-NN classification algorithm in depth and, to address its disadvantages, introduce an adaptive scheme. Building on traditional k-NN, the new method incorporates a push-drag strategy: a weight vector is added for each category to improve the similarity calculation formula, yielding an improved k-NN classification algorithm.

In the last part, we design four groups of comparison experiments and select an appropriate feature dimension and K value. The experiments show that the adaptive k-NN performs better than the traditional one.
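As an illustration of the baseline that the adaptive scheme builds on, the following is a minimal sketch of traditional k-NN text classification over term-frequency vectors with cosine similarity. It is not the thesis's algorithm: the abstract does not give the push-drag update or the modified similarity formula, so the `category_weights` parameter here is only a hypothetical placeholder showing where a per-category weight could enter the vote. All function names, the toy training set, and the assumption that documents are already segmented into words are illustrative.

```python
import math
from collections import Counter, defaultdict


def to_vector(tokens):
    """Build a term-frequency vector from a pre-segmented document."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query_tokens, training_set, k=3, category_weights=None):
    """Classify a document by similarity-weighted voting among its k nearest neighbors.

    training_set is a list of (tokens, label) pairs.
    category_weights is a hypothetical per-category multiplier (default 1.0);
    it is an assumption for illustration, not the formula from the thesis.
    """
    category_weights = category_weights or {}
    query = to_vector(query_tokens)
    scored = [(cosine(query, to_vector(tokens)), label)
              for tokens, label in training_set]
    # Keep the k most similar training documents.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim * category_weights.get(label, 1.0)
    return max(votes, key=votes.get)


# Toy training set of already-segmented Chinese documents (illustrative only).
training = [
    (["股票", "市场", "上涨"], "finance"),
    (["银行", "利率", "市场"], "finance"),
    (["球队", "比赛", "进球"], "sports"),
    (["比赛", "冠军", "球员"], "sports"),
]

print(knn_classify(["市场", "利率"], training, k=3))
```

In a sketch like this, the choice of K and of the feature dimension (here, the raw vocabulary) directly affects the vote, which is why the experiments select both.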

Keywords/Search Tags:

Text classification, KNN, push and drag strategy, weight vector

Related items

1	Research On KNN Text Classification
2	Designed And Implementation Of Chinese Text Categorization System Based On Support Vector Machine
3	Term Weight-Based Chinese Text Classification Algorithm
4	Based On Data Distribution Characteristics Of Text Classification
5	Text Sentiment Analysis Based On Text Classification
6	The Design And Implementation Of Text Classification System Based On SVM-KNN
7	Research Of Chinese Text Classification Based On Mixed Feature
8	Research On Text Similarity Algorithm Based On Vector Space Model
9	Research On Semi-structured Text Push Technology And Application
10	Research On Feature Vector Optimization Techniques In Web Text Classification