Research Of Text Categorization Based On The Theme Mining And Covering Algorithm

Posted on:2012-07-26

Degree:Master

Type:Thesis

Country:China

Candidate:C Liu

Full Text:PDF

GTID:2218330338470875

Subject:Computer application technology

Abstract/Summary:

Text contains a great deal of information that is useful for classification. The key difficulty is how to use this information to choose a reasonable text representation. Because the vector space model (VSM) is purely statistical, text represented with it lacks semantic information. Building on research into word co-occurrence, we extract word co-occurrence information from the text and use it to group words into phrases. These phrases are given high weights and merged with the raw corpus, yielding WCBVSM, a vector space model improved with word co-occurrence that better brings out the topic information of the text.

The constructive learning method was proposed by Professor Zhang Ling and Academician Zhang Bo. It uses a spherical mapping to turn neurons into classifiers over bounded regions of space. This recasts the long-unsolved learning problem as a covering problem, which greatly reduces the complexity of describing the problem. This thesis proposes the simulated annealing cross-covering algorithm (SACCA), which combines the cross-covering algorithm with simulated annealing. The algorithm addresses the randomness with which the traditional cross-covering algorithm selects the initial covering point, and it improves the generalization of the covers by increasing the number of sample points in each cover, which reduces the number of covers. Experimental results show that the proposed text classification model improves classification accuracy while also speeding up recognition.

The main work of the thesis is as follows:

1. Introduce the research background and significance of text classification and the covering algorithm, and briefly review the relevant text classification techniques and some classical classification algorithms.
2. Propose a text Theme Mining method based on word co-occurrence, developed by studying the word co-occurrence model.
3. Study the covering algorithm, one of the better current classification algorithms: explain its basic idea and construction, and analyze its characteristics theoretically. To address the tension between recognition rate and generalization ability when classifying and recognizing large volumes of uncertain information, propose the cross-covering algorithm based on simulated annealing (SACCA), which achieved good experimental results.
4. Propose a new text classification model that combines Theme Mining with the simulated annealing cross-covering algorithm, give the structure diagram of the system design together with the key technologies and methods of its implementation, and test the system.
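The co-occurrence step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual WCBVSM weighting formula, so everything specific here is an assumption made for illustration: the function names `cooccurrence_phrases` and `wcb_vector`, the document-level definition of co-occurrence, the `min_count` threshold, and the `phrase_weight` factor.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_phrases(docs, min_count=2):
    """Return word pairs that co-occur in at least `min_count` documents.

    Assumption: "co-occurrence" means appearing in the same document.
    """
    pair_counts = Counter()
    for tokens in docs:
        # Count each unordered pair at most once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            pair_counts[pair] += 1
    return {pair for pair, n in pair_counts.items() if n >= min_count}


def wcb_vector(tokens, phrases, phrase_weight=2.0):
    """Term-frequency vector with extra, up-weighted phrase features.

    Assumption: a phrase feature is weighted by `phrase_weight` times the
    smaller of the two term frequencies; the thesis may weight differently.
    """
    vec = Counter(tokens)  # raw term frequencies
    present = set(tokens)
    for a, b in sorted(phrases):
        if a in present and b in present:
            vec[f"{a}_{b}"] = phrase_weight * min(vec[a], vec[b])
    return dict(vec)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["stock", "market", "falls"],
    ["stock", "market", "rises"],
    ["rain", "falls"],
]
phrases = cooccurrence_phrases(docs)
vector = wcb_vector(["stock", "market", "stock"], phrases)
```

The phrase features sit alongside the original terms rather than replacing them, which mirrors the abstract's description of merging the phrases with the raw corpus.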
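The covering idea can also be sketched in a few lines. This is not the SACCA procedure from the thesis, whose details the abstract does not give. It is a simplified illustration under stated assumptions: samples are lifted onto a sphere, each cover is a spherical cap around a training sample with its threshold halfway between the nearest other-class sample and the farthest same-class sample inside that boundary, and a simulated annealing loop (an assumed scheme, with hypothetical parameters `t0`, `cooling`, and `steps`) searches for a center whose cover holds many samples instead of picking one purely at random.

```python
import math
import random


def lift(x, radius):
    """Map x in R^n onto the sphere of the given radius in R^(n+1)."""
    extra = math.sqrt(max(radius * radius - sum(v * v for v in x), 0.0))
    return tuple(x) + (extra,)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def build_cover(center, points, labels, uncovered):
    """Return (theta, members) for a spherical cover centered on points[center]."""
    c, label = points[center], labels[center]
    sims = [dot(c, p) for p in points]
    # d1: inner product with the closest sample of any other class.
    d1 = max((s for s, y in zip(sims, labels) if y != label), default=-math.inf)
    # Same-class, still-uncovered samples that are closer than that sample.
    members = {j for j in uncovered if labels[j] == label and sims[j] > d1}
    if not members:  # degenerate case: duplicate point with a different label
        return sims[center], {center}
    d2 = min(sims[j] for j in members)
    theta = d2 if d1 == -math.inf else (d1 + d2) / 2
    return theta, members


def anneal_center(points, labels, uncovered, rng, t0=1.0, cooling=0.8, steps=20):
    """Search for a center whose cover holds many samples (assumed SA scheme)."""
    pool = sorted(uncovered)
    current = rng.choice(pool)
    cur = build_cover(current, points, labels, uncovered)
    best, best_cover = current, cur
    temp = t0
    for _ in range(steps):
        cand = rng.choice(pool)
        cov = build_cover(cand, points, labels, uncovered)
        gain = len(cov[1]) - len(cur[1])
        # Accept improvements always; accept worse centers with prob. e^(gain/T).
        if gain >= 0 or rng.random() < math.exp(gain / temp):
            current, cur = cand, cov
            if len(cur[1]) > len(best_cover[1]):
                best, best_cover = current, cur
        temp *= cooling
    return best, best_cover


def train(samples, labels, seed=0):
    """Build covers until every training sample belongs to one."""
    rng = random.Random(seed)
    radius = max(math.sqrt(sum(v * v for v in x)) for x in samples)
    points = [lift(x, radius) for x in samples]
    uncovered = set(range(len(points)))
    covers = []
    while uncovered:
        center, (theta, members) = anneal_center(points, labels, uncovered, rng)
        covers.append((points[center], theta, labels[center]))
        uncovered -= members
    return radius, covers


def predict(x, radius, covers):
    """Label of the cover the point lies deepest inside (or nearest to)."""
    p = lift(x, radius)
    return max(covers, key=lambda c: dot(p, c[0]) - c[1])[2]


# Hypothetical toy data: two well-separated classes in the plane.
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
radius, covers = train(samples, labels)
```

In this sketch no cover contains a training sample of another class, so training samples are classified correctly as long as no two identical points carry conflicting labels. Preferring centers whose covers hold more samples tends to reduce the number of covers, which is the effect the abstract attributes to SACCA.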

Keywords/Search Tags:

text categorization, term co-occurrence model, Simulated Annealing Algorithm, Cover Algorithm, feature selection

Related items

1	Research And Application On Feature Selection Algorithms Based On Term Distributions In Text Categorization
2	The Research Of Text Representation And Feature Selection In Text Categorization
3	The Research And Implementation Of Text Categorization Technology In Integrated Risk Meta-Search Engine
4	The Method Of Text Categorization Scheme Selection And Development Of A Prototype System
5	Study For Text Categorization Based On Feature Weighting
6	Research On High-Performance Feature Selection And Text Categorization
7	Research On Feature Selection Algorithm Based On Segmented Term Frequency In Text Classification
8	Theoretical Analysis And Algorithm Study On Feature Selection For Text Categorization
9	Research On Text Categorization Algorithm Based On Category Homogenization
10	Text Representation Model And Feature Selection Algorithm