
Study Of Chinese Text Classification Method

Posted on:2010-01-30
Degree:Master
Type:Thesis
Country:China
Candidate:G Rong
GTID:2178360275962611
Subject:Computer software and theory
Abstract/Summary:
With the rapid spread and development of the Internet, online information resources grow day by day. People have moved from a period when information was scarce to a digital age in which it is extremely abundant. Faced with this abundance, people find it difficult to locate the information they actually need quickly and effectively. How to organize and manage online information reasonably and effectively has therefore become a very important issue in the field of information processing.

Traditionally, web page classification has relied on manual methods. As the volume of web pages grows rapidly, manual classification consumes large amounts of human and material resources, so processing this information by hand is no longer realistic. Online information is, to a large extent, disordered and unsystematic. Text classification is a powerful means of organizing and managing such information. It can reduce this disorder and make it easier for users to find what they need. Web page classification is therefore essential. It has become an important means of handling online information and is increasingly combined with techniques such as search engines and information filtering.

First, this thesis introduces the key technologies of Chinese text classification. Text pre-processing is one of the key factors affecting classification accuracy. We therefore begin by analyzing Chinese text pre-processing techniques, including Chinese word segmentation and stop-word handling. We then introduce the Vector Space Model (VSM) as the text representation model. Finally, we compare several feature selection algorithms.

Second, this thesis studies the classification algorithm, which is the core of text classification.
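The following Python sketch roughly illustrates the pre-processing and representation pipeline described above by chaining four steps. The abstract does not specify which segmentation method, term weighting, or feature selection criteria the thesis uses or compares, so all of the following are assumptions:

- Forward maximum matching, a classic dictionary-based segmenter, stands in for word segmentation.
- A small stop-word set stands in for stop-word handling.
- TF-IDF supplies the VSM term weights.
- The chi-square (CHI) statistic serves as one common feature selection criterion.

The dictionary, stop words, and function names are hypothetical.

```python
import math
from collections import Counter


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters), else a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def preprocess(text, dictionary, stop_words):
    """Segment the text, then drop stop words."""
    return [w for w in fmm_segment(text, dictionary) if w not in stop_words]


def tfidf_vectors(docs):
    """Vector space model: one sparse {term: tf * idf} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def chi_square(term, category, docs, labels):
    """CHI statistic of one term against one category (2x2 contingency table)."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has, inside = term in doc, label == category
        if has and inside:
            a += 1  # term present, document in category
        elif has:
            b += 1  # term present, document outside category
        elif inside:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document outside category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0
```

As an example, take the hypothetical dictionary `{"文本", "分类", "方法"}` and stop-word set `{"的"}`. With these, `preprocess("文本分类的方法", ...)` yields `["文本", "分类", "方法"]`. Greedy matching is fast but can mis-split ambiguous character sequences, which is one reason segmentation quality affects classification accuracy.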
The support vector machine (SVM), a general-purpose machine learning method developed in recent years, is adopted as the classifier. This part first presents the basic principles of the SVM: the linearly separable case, the nonlinearly separable case, the support vector machine itself, and commonly used kernel functions. The SVM training algorithm and the multi-class classification problem for SVMs are also discussed.

The main contribution of this thesis is an improved classification algorithm based on Discrete Particle Swarm Optimization (DPSO) and the decision tree method. The traditional DAG-SVM and DT-SVM methods make decisions significantly faster than the "one-against-one" and "one-against-rest" approaches. Their common drawback is that the decision tree structure is fixed once the number of categories is fixed, so it cannot adapt to the specific classification problem. Classification performance often varies depending on where each SVM is placed in the decision tree. Misclassifications that occur close to the root node cause severe error accumulation. Yet the traditional DAG-SVM and DT-SVM methods do not consider where to place each SVM; that is, they ignore the problem of optimizing the decision structure. To address this weakness, the proposed algorithm applies particle swarm optimization at each node of the decision tree to generate an optimal decision tree. Experiments show that the improved algorithm achieves higher classification accuracy than the existing algorithms.
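The idea of choosing each node's split by discrete PSO can be made concrete with the following minimal sketch. It builds a DT-SVM-style binary tree in which binary (discrete) PSO picks the split at each node. It is not the thesis's algorithm, because the abstract gives neither its particle encoding nor its fitness function. The sketch rests on these assumptions:

- **Encoding:** each particle is a bit vector that assigns the node's classes to a left (0) or right (1) group, updated with the sigmoid rule of Kennedy and Eberhart's binary PSO.
- **Fitness:** `separability` is a hypothetical score, the smallest Euclidean distance between class centroids on opposite sides. The most separable groups are therefore split near the root, where errors would otherwise accumulate.
- **Node classifiers:** the binary SVM that would be trained at each node is omitted.

All function names and parameter values (`w`, `c1`, `c2`, `vmax`, swarm size) are illustrative.

```python
import math
import random


def separability(centroids, left, right):
    """Hypothetical fitness: smallest centroid distance across the split."""
    if not left or not right:
        return float("-inf")  # both groups must be non-empty
    return min(math.dist(centroids[i], centroids[j]) for i in left for j in right)


def bpso_split(classes, centroids, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Binary PSO (sigmoid rule) searching for a 2-way split of `classes`."""
    rng = random.Random(seed)
    n = len(classes)

    def groups(bits):
        left = [c for c, b in zip(classes, bits) if b == 0]
        right = [c for c, b in zip(classes, bits) if b == 1]
        return left, right

    def fitness(bits):
        return separability(centroids, *groups(bits))

    xs = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vs = [[rng.uniform(-vmax, vmax) for _ in range(n)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][d]
                     + c1 * r1 * (pbest[i][d] - xs[i][d])
                     + c2 * r2 * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-vmax, min(vmax, v))
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vs[i][d]))
                xs[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(xs[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = xs[i][:], f

    if gbest_f == float("-inf"):  # no valid split found: peel off one class
        return [classes[0]], list(classes[1:])
    return groups(gbest)


def build_tree(classes, centroids, seed=0):
    """Leaf = class id; internal node = (left_subtree, right_subtree).
    In a full DT-SVM, a binary SVM separating the two groups sits at each node."""
    if len(classes) == 1:
        return classes[0]
    if len(classes) == 2:
        left, right = [classes[0]], [classes[1]]
    else:
        left, right = bpso_split(classes, centroids, seed=seed)
    return (build_tree(left, centroids, seed + 1),
            build_tree(right, centroids, seed + 1))
```

Consider four hypothetical class centroids `{0: (0, 0), 1: (0, 1), 2: (10, 0), 3: (10, 1)}`. The root split the sketch searches for separates `{0, 1}` from `{2, 3}`, because every other split leaves two centroids only 1 apart on opposite sides. Unlike a fixed DAG-SVM or DT-SVM layout, the tree shape here depends on the data.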
Keywords/Search Tags:Text Classification, Support Vector Machine, Particle Swarm Optimization