Research On The Automatic Classification Algorithm Of Archive Text Based On Decision Tree

Posted on:2016-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:S F Huang

Full Text:PDF

GTID:2298330470454086

Subject:Systems analysis and integration

Abstract/Summary:

In the era of data explosion, extracting the data we need from a mass of data is a major problem. Data mining addresses this problem and has become an active research focus for experts and scholars. Text classification, which further narrows the scope of data mining, has become an important research field within it. A good classification model and a sound modeling method can both reduce the time cost of text classification and improve its accuracy. This thesis therefore focuses on how to build a classification model quickly, how to reduce the time cost of text classification, and how to improve its accuracy.

Building on the C4.5 algorithm, the thesis introduces the concept of equivalent infinitesimals from higher mathematics to improve the logarithm-based calculation formula of C4.5. The improved formula replaces the complex logarithmic operations with the four basic arithmetic operations, so the computer no longer needs to call a library function to evaluate logarithms. This reduces the time C4.5 takes to generate a decision tree and, in turn, the time cost of the text classification process.

When requirements change, the original decision tree no longer meets them and the decision attributes have to change. At the same time, no training data set is available for the new requirements. To address this problem, the thesis presents a method for generating a decision tree from classification rules, in three steps. First, classification rules are written by hand according to the requirements and human experience. Second, a decision tree is generated from these production rules. Finally, the decision tree is adjusted with machine learning methods so that it meets the current requirements.

In short, the thesis aims to reduce the time cost of generating decision trees and to build a decision tree quickly when no training data set is available. It does so by optimizing the calculation formula of the C4.5 algorithm and by presenting a method that converts classification rules directly into a decision tree, and it uses example analysis and experimental results to verify the effectiveness of the improved method. Finally, the improved method is applied to classifying text data from the records of the Yunnan cigarette factory, with good results.
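The abstract does not state the improved formula itself. As a minimal sketch of the general idea only, assume the first-order equivalent-infinitesimal relation ln(1 + x) ~ x as x -> 0, which gives ln p ≈ -(1 - p) for a class proportion p close to 1. Under that assumption the entropy term -p log2 p becomes p(1 - p) / ln 2, which needs only addition, subtraction, multiplication and division. The function names `entropy_exact` and `entropy_approx` are hypothetical, and the thesis's actual formula may differ.

```python
import math


def entropy_exact(counts):
    """Shannon entropy in bits, using the library logarithm (standard C4.5 style)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Constant factor 1 / ln 2, computed once so that the per-node
# calculation below involves no logarithm call at all.
INV_LN2 = 1.0 / math.log(2)


def entropy_approx(counts):
    """Assumed approximation: ln p ~ -(1 - p), so -p*log2(p) ~ p*(1 - p)/ln 2.

    Only the four basic arithmetic operations are used per class.
    This is an illustration, not the formula from the thesis.
    """
    total = sum(counts)
    return INV_LN2 * sum((c / total) * (1 - c / total) for c in counts)


if __name__ == "__main__":
    # Compare both measures on a few class-count distributions.
    for counts in ([10, 0], [9, 1], [7, 3], [5, 5]):
        print(counts, entropy_exact(counts), entropy_approx(counts))
```

The approximation is tight only when p is near 1, so the two functions return different values on mixed nodes. The sketch shows how the logarithm can be removed from the inner calculation, not that the results are numerically equal.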

Keywords/Search Tags:

Text classification, C4.5 algorithm, Production rules, Classification rule, Algorithm optimization

Related items

1	Contributions To Several Key Issues Of Associative Text Classification
2	Building Of Classification Method And Classifier About Text Complaints Information Based On Association Rules
3	Associated With Technology-based Chinese Text Classification
4	Research And Application Of Text Classification Algorithm For Chinese Information
5	Researching Of Association Rules In Text Classification
6	Ant Colony Algorithm And Its Application In Text Classification Problems
7	The Study Of Chinese Text Classification Based On FOA-SVM
8	Short Text Classification Based On Apriori Algorithm
9	Classification Association Rule Induction Algorithm And Applied Research
10	Classification And Clustering Gene Expression Programming