The Research And Implementation Of Text Classification Based On Meta-Information And Optimization

Posted on: 2011-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Jiang
GTID: 2178330338989854
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, the Internet has become an increasingly popular platform for people to express their ideas. Because information is exchanged and spread so quickly online, network media play a growing role in reflecting and shaping public opinion. Tracking public opinion trends and actively guiding public opinion have therefore become urgent problems. Promptly grasping how public opinion changes from massive volumes of network text requires accurate and efficient analysis and organization of texts that carry different kinds of information. Automatic text categorization is one of the key technologies for solving such problems.

To address the different kinds of web data, this thesis proposes and optimizes text classification methods based on meta-information in order to improve classification performance on web data. These methods are intended to provide a set of research methods and theories for public opinion monitoring. The main contents are as follows.

(1) The thesis first summarizes current research problems and challenges in text classification.

(2) For the different kinds of web text data, such as news, blogs, BBS posts, and microblogs, the thesis proposes a meta-information text classification method. The method has two steps. First, it applies a meta-information algorithm to classify the text. If the first step fails, it then applies an LDA-based method that incorporates meta-information into the LDA topic model and adjusts the topic-word weights for classification. The weights are calculated from the importance of terms in the meta-information glossary.

(3) To handle large-scale web data classification, the thesis optimizes the meta-information-based text classification algorithm. For the second step of the classification, it applies information gain to filter words, which reduces the number of words and speeds up both modeling and classification. The thesis also extends the method to distributed text classification based on messaging middleware.

(4) Using the above research results, the thesis designs and implements a prototype network text categorization system for YHPODS, based on UIMA AS, to support follow-up development.
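The information gain filtering mentioned in item (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`entropy`, `information_gain`, `select_terms`), the representation of each document as a set of tokens, and the top-k cutoff are all assumptions made for illustration. The score used is the standard definition IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t), where C is the class label and t is the presence of a term.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of `term` with respect to the class labels.

    docs   -- list of token sets, one per document (hypothetical format)
    labels -- list of class labels, aligned with docs
    """
    with_term = Counter()     # class counts among documents containing term
    without_term = Counter()  # class counts among documents lacking term
    for tokens, label in zip(docs, labels):
        if term in tokens:
            with_term[label] += 1
        else:
            without_term[label] += 1

    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with

    h_class = entropy(list(Counter(labels).values()))
    h_conditional = (
        (n_with / n) * entropy(list(with_term.values()))
        + (n_without / n) * entropy(list(without_term.values()))
    )
    return h_class - h_conditional


def select_terms(docs, labels, k):
    """Keep the k terms with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda t: (-information_gain(docs, labels, t), t),
    )
    return ranked[:k]
```

In a pipeline like the one described, a filter of this kind would run before topic modeling, so that only the retained terms enter the vocabulary. The cutoff `k` (or an equivalent score threshold) is a tuning choice that the abstract does not specify.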
Keywords/Search Tags: Public Opinion Detection, Text Classification, Meta-Information, LDA (Latent Dirichlet Allocation), Information Gain, Massive Text, Distributed Text Classification