Research Of Text Categorization Based On The Theme Mining And Covering Algorithm

Posted on: 2012-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
GTID: 2218330338470875
Subject: Computer application technology
Abstract/Summary:
Text carries a great deal of information that is useful for classification. The key difficulty is how to use this information to choose a reasonable text representation. Because the vector space model (VSM) is purely statistical, a VSM representation of text lacks semantic information. Building on research into the word co-occurrence principle, we extract the word co-occurrence information of a text and use it to group words into phrases. These phrases are given high weights and merged with the raw corpus, yielding WCBVSM, a vector space model improved with word co-occurrence that better expresses the topic information of a text.

The constructive learning method was proposed by Professor Zhang Ling and Academician Zhang Bo. The method uses a spherical mapping to turn neurons into classifiers over bounded regions of space. It converts a long-unsolved learning problem into a covering problem, which greatly reduces the complexity of the problem description. This thesis proposes the simulated annealing cross-covering algorithm (SACCA), which combines the cross-covering algorithm with simulated annealing. The algorithm addresses the randomness with which the traditional cross-covering algorithm selects its initial covering point, and it improves the generalization of the covers by increasing the number of sample points contained in each cover, which reduces the number of covers. Experimental results show that the proposed text classification model improves classification accuracy while also speeding up recognition.

The main work of the thesis is as follows:

1. Introduce the research background and significance of text classification and the covering algorithm, and briefly review relevant text classification techniques and some classical classification algorithms.
2. Propose a text theme mining method based on word co-occurrence, developed by studying the word co-occurrence model.
3. Study the covering algorithm, a classification algorithm that currently performs well: explain its basic idea and its constructive procedure, and analyze its characteristics theoretically. To address the conflict between recognition rate and generalization ability when classifying and recognizing massive, uncertain information, propose the cross-covering algorithm based on simulated annealing (SACCA), which achieved good experimental results.
4. Propose a new text classification model that combines theme mining with the simulated annealing cross-covering algorithm, present the structure diagram of the system design together with the key technologies and methods used to implement it, and test the system.
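
The word co-occurrence step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's WCBVSM implementation: the window size, the minimum co-occurrence count, the phrase weight, and every function name below are illustrative assumptions, since the abstract does not specify them. The sketch counts word pairs that appear close together, keeps the frequent pairs as phrases, and merges those phrases into a document's term counts with a higher weight.

```python
from collections import Counter


def cooccurrence_counts(docs, window=1):
    """Count unordered word pairs that occur within `window` tokens of each other.

    `docs` is a list of token lists. The window size is an assumption.
    """
    pairs = Counter()
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    pairs[tuple(sorted((w, v)))] += 1
    return pairs


def select_phrases(pairs, min_count=2):
    """Keep pairs seen at least `min_count` times (threshold is an assumption)."""
    return {pair for pair, count in pairs.items() if count >= min_count}


def weighted_features(tokens, phrases, window=1, phrase_weight=2.0):
    """Merge raw term counts with phrase features that carry a higher weight."""
    feats = {w: float(c) for w, c in Counter(tokens).items()}
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            key = tuple(sorted((w, v)))
            if key in phrases:
                name = "_".join(key)
                feats[name] = feats.get(name, 0.0) + phrase_weight
    return feats


if __name__ == "__main__":
    # Toy corpus, made up for illustration only.
    corpus = [
        ["simulated", "annealing", "improves", "covering"],
        ["simulated", "annealing", "selects", "points"],
        ["covering", "algorithm", "classifies", "text"],
    ]
    phrases = select_phrases(cooccurrence_counts(corpus))
    print(weighted_features(corpus[0], phrases))
```

The resulting dictionary can be fed to any vector-space classifier. Choosing a phrase weight above 1.0 reflects the abstract's statement that phrases are given a high weight, but the specific value here is arbitrary.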
Keywords/Search Tags: text categorization, term co-occurrence model, simulated annealing algorithm, cover algorithm, feature selection