Study On Text Categorization Based On Decision Tree And K Nearest Neighbors

Posted on: 2007-08-29    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Wang    Full Text: PDF
GTID: 1119360212470835    Subject: Management Science and Engineering
Abstract/Summary:
Text categorization, regarded as a basic form of cognition, is one of the most important problems in text mining. Existing methods for feature dimension reduction, text categorization, and categorization rule extraction still fall short of what practical applications require. This dissertation investigates text feature dimension reduction and categorization rule extraction based on decision trees, and presents several new KNN algorithms for text categorization.

Three methods for text feature dimension reduction are presented. The first reduces the dimensionality using pattern aggregation theory together with an improved χ² statistic, and achieves better categorization accuracy. The second reduces the dimensionality using the CHI value and rough set theory, after which text categorization rules are extracted with a decision tree; the resulting rules are easy to understand and also give better categorization accuracy. The third is based on neural network theory: features are ranked with a sensitivity method and selected with a dichotomy (binary search) method, so that the number of dimensions is reduced and the heavy computation of the neural network is avoided.

Two methods for extracting fuzzy text categorization rules from a fuzzy decision tree are presented. The first builds a fuzzy decision tree in which some branches are merged, which greatly reduces the number of categorization rules. The second introduces a new method for constructing membership functions, which greatly reduces the time spent on data fuzzification, reduces the number of rules, and consequently increases categorization accuracy.

Three improvements to the KNN algorithm are presented. The first concerns the weights in the Euclidean distance formula, for which two approaches are given. In the first approach, the weight of each feature is obtained with a sensitivity method, so that the distance formula reflects the different roles the same feature plays for different classes. The other approach is based on chi-square distance theory: k0 approximate nearest neighbors are first retrieved with an SS-tree, and the weights are then computed from these k0 neighbors using chi-square distance. Both approaches improve the accuracy of the KNN algorithm.
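To make the feature-reduction part concrete, below is a minimal Python sketch of the classic chi-square (CHI) statistic commonly used to score term-class association in text feature selection. The dissertation uses an improved χ² variant and combines it with pattern aggregation and rough set theory; those refinements are not reproduced here, and the function and variable names are illustrative only.

```python
# Minimal sketch of the standard chi-square (CHI) score for a term/class pair.
# This is the textbook statistic, not the dissertation's improved variant.

def chi_square(term_docs, class_docs, n_docs):
    """Chi-square association between a term and a class.

    term_docs  -- set of document ids containing the term
    class_docs -- set of document ids labelled with the class
    n_docs     -- total number of documents
    """
    a = len(term_docs & class_docs)   # term present, class present
    b = len(term_docs - class_docs)   # term present, class absent
    c = len(class_docs - term_docs)   # term absent, class present
    d = n_docs - a - b - c            # term absent, class absent
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n_docs * (a * d - c * b) ** 2 / denom
```

Features would then typically be ranked by their maximum or class-averaged χ² score over all classes, and only the top-scoring terms kept.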
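For the fuzzy decision tree part, the following is a minimal sketch of fuzzifying a numeric feature with standard triangular membership functions. The dissertation proposes its own membership-function construction method, which is not reproduced here; this only illustrates the kind of fuzzification a fuzzy decision tree operates on, and the fuzzy-set boundaries chosen below are arbitrary.

```python
# Minimal sketch of triangular membership functions for data fuzzification.

def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with support [a, c] and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Example: fuzzify a normalized term-frequency value into "low", "medium", "high".
tf = 0.42
memberships = {
    "low": triangular(tf, -0.5, 0.0, 0.5),
    "medium": triangular(tf, 0.0, 0.5, 1.0),
    "high": triangular(tf, 0.5, 1.0, 1.5),
}
```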
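The improved KNN described above weights each feature in the Euclidean distance formula. A hedged sketch of KNN classification with such a weighted distance follows; the weight vector is assumed to come from an external step (for example the sensitivity analysis or the chi-square-distance estimation mentioned in the abstract), which is not shown, and all names are illustrative.

```python
import heapq
import math

def weighted_euclid(x, y, w):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)))

def knn_classify(query, train_vectors, train_labels, weights, k=5):
    """Return the majority label among the k nearest training vectors
    under the per-feature weighted Euclidean distance."""
    neighbors = heapq.nsmallest(
        k,
        zip(train_vectors, train_labels),
        key=lambda item: weighted_euclid(query, item[0], weights),
    )
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

With uniform weights this reduces to ordinary KNN; the gain described in the abstract comes entirely from how the weights are estimated.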
Keywords/Search Tags: text categorization, decision tree, KNN algorithm, fuzzy logic, rough set theory, neural network