The Research Of Text Classification Technology Based On The Part Of Speech And LDA Topic Model

Posted on:2017-01-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2308330485964006

Subject:Software engineering

Abstract/Summary:

With the advent of the Internet Plus era, more and more data need to be processed, and text mining has become increasingly important. People want to extract information accurately from vast amounts of text, so they place ever higher demands on existing text mining technology. Text classification is one of the important techniques in text mining; it has been widely used in information filtering, search engines, digital libraries, personalized recommendation, and other fields, so its study has strong practical significance.

Firstly, this paper introduces the value and background of text classification technology. It then reviews the current state of domestic and international research on text representation and feature selection. It analyzes the feature selection methods of traditional text classification and finds that they suffer from problems such as a high-dimensional feature space, low classification efficiency, and low accuracy. Considering the importance of parts of speech in text, this paper proposes a feature selection method based on the part of speech and combines it with the LDA topic model. It analyzes in depth the significance and value of the method, the advantages that parts of speech bring to the LDA topic model, and their influence on the performance evaluation of the final classification results.

Secondly, this paper selects some classical algorithms that are useful for, or are used in, its experiments. These methods are important links in the text classification process: preprocessing, text segmentation, feature selection, feature weighting, classification algorithms, performance evaluation, and so on. The paper introduces these methods and the overall process of text classification.

Thirdly, in view of the proposed feature selection method based on the part of speech and the LDA topic model, this paper focuses on the parts of speech of words and on the LDA topic model. To verify the usefulness of parts of speech, the paper studies the distribution of parts of speech using typical feature selection algorithms, as well as the effects on feature dimension reduction and classification results of selectively screening features by part of speech. Various experiments are used to analyze the importance and value of most kinds of parts of speech. Finally, combining parts of speech with the LDA topic model, the paper studies the significance of parts of speech in the LDA topic model. Experiments on real data show that nouns, verbs, and adjectives alone can represent an article; they determine the main content of the text, nouns especially. These experiments verify the importance of parts of speech. These words directly affect the classification results, yet keeping only them reduces the original data set. In other words, the method reduces time and space requirements while maintaining the original performance. Building on the original experiments, the paper also verifies the LDA topic model's reliance on parts of speech and the applicability of parts of speech, and shows that the combination of part of speech and the LDA topic model achieves a very good classification effect.

Finally, the paper outlines directions for future research in light of the problems found in the experiments, and looks ahead to future trends in text classification technology.
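
The part-of-speech screening described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented and tagged by some tagger. The tag names (`NOUN`, `VERB`, `ADJ`), the helper names `filter_by_pos` and `bag_of_words`, and the sample sentence are all illustrative assumptions, not the thesis's actual implementation or data.

```python
from collections import Counter

# Assumption: coarse tags in the style of a universal POS tag set.
# The abstract only says that nouns, verbs, and adjectives are kept.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}


def filter_by_pos(tagged_tokens, keep_tags=CONTENT_TAGS):
    """Keep only the (lowercased) words whose POS tag is in keep_tags."""
    return [word.lower() for word, tag in tagged_tokens if tag in keep_tags]


def bag_of_words(tokens):
    """Count term frequencies for one document."""
    return Counter(tokens)


# Hypothetical pre-tagged document; a real pipeline would obtain
# these (word, tag) pairs from a segmenter and POS tagger.
tagged_doc = [
    ("The", "DET"), ("classifier", "NOUN"), ("quickly", "ADV"),
    ("labels", "VERB"), ("long", "ADJ"), ("documents", "NOUN"),
    ("and", "CCONJ"), ("the", "DET"), ("classifier", "NOUN"),
    ("is", "AUX"), ("accurate", "ADJ"), (".", "PUNCT"),
]

kept = filter_by_pos(tagged_doc)
features = bag_of_words(kept)

# Compare vocabulary size before and after the POS screening.
vocab_before = len({word.lower() for word, _ in tagged_doc})
vocab_after = len(features)

print(kept)
print(f"vocabulary size: {vocab_before} -> {vocab_after}")
```

In a full pipeline along the lines the abstract describes, the filtered term counts would then be passed to an LDA topic model and a classifier; the sketch covers only the screening step, which is where the reduction in feature dimension comes from.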

Keywords/Search Tags:

feature selection, part of speech, LDA topic model, text classification

Related items

1	Research And Application Of Topic Model For Short Texts Based On Part-of-Speech Feature And Semantic Enhancement
2	Research On Feature Expansion And Classification Of Short Text Based On Topic Model And Deep Learning
3	Research On Text Classification Method Based On Part Of Speech Tagging LDA Model
4	The Research Of Text Classification Based On Feature Selection And Topic Model
5	Research And Application Of Text Classification Model Based On Topic Model
6	Study On Key Techniques Of Text Content Classification And Topic Tracking
7	Study On Feature Extraction And Text Representation Technology In Topic Tracking
8	Research On Text Classification Of Web Text Mining
9	Research On Hot Topic Classification And Heat Prediction Model Of Weibo
10	Short Text Classification Research Based On Sina Weibo