| With the advent of the Internet Plus era, ever more data needs to be processed, and text mining has become increasingly important. People want to extract information accurately from vast amounts of text, which places higher demands on existing text data mining technology. Text classification is one of the key techniques in text data mining and is widely used in information filtering, search engines, digital libraries, personalized recommendation, and other fields, so research on it has strong practical significance. First, this paper introduces the background and value of text classification technology and reviews the current state of domestic and international research on text representation and feature selection. Analyzing the feature selection methods of traditional text classification, it finds that they suffer from a high-dimensional feature space, low classification efficiency, and low accuracy. Considering the importance of parts of speech in text, this paper proposes a feature selection method based on part of speech, combines it with the LDA topic model, and analyzes in depth the significance and value of the method, the advantages that part of speech brings to the LDA topic model, and its influence on the evaluated performance of the final classification results. Second, this paper selects classical algorithms that are useful for, or used in, its experiments. These methods cover the key steps of text classification, such as preprocessing, text segmentation, feature selection, feature weighting, classification algorithms, and performance evaluation. 
This paper introduces these methods and the overall text classification workflow. Third, for the proposed feature selection method based on part of speech and the LDA topic model, this paper focuses on the parts of speech of words and on the LDA topic model. To verify the usefulness of parts of speech, it studies how parts of speech are distributed among the features chosen by typical feature selection algorithms, and how selectively screening words by part of speech affects feature dimension reduction and classification results. Various experiments analyze the importance and value of most part-of-speech categories. It then combines parts of speech with the LDA topic model and studies the significance of parts of speech within that model. Experiments on real data show that nouns, verbs, and adjectives alone can represent an article; these words, especially nouns, determine the main content of a text. These experiments verify the importance of parts of speech. Although these words directly affect the classification results, keeping only them shrinks the original data set; in other words, the method reduces time and space requirements while maintaining the original classification performance. Building on these experiments, the study also verifies that the LDA topic model depends on parts of speech and that part-of-speech selection applies to it, and finds that combining part of speech with the LDA topic model gives very good classification results. Finally, based on the problems found in the experiments, the paper outlines directions for future research and looks ahead to trends in the development of text classification technology. |
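The part-of-speech screening and feature weighting steps described in the abstract can be sketched in a few lines of Python. This is a minimal sketch, not the thesis's implementation. It assumes the documents have already been segmented and tagged as (word, tag) pairs (for Chinese text, a tagger such as jieba produces comparable output). It also assumes a hypothetical tag scheme where the prefixes "n", "v", and "a" mark nouns, verbs, and adjectives. The surviving words are then weighted with a plain TF-IDF formula. The helper names `pos_filter` and `tfidf` and the toy corpus are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical tag scheme (jieba/ICTCLAS style): nouns start with "n",
# verbs with "v", adjectives with "a"; other tags ("d", "p", "r", ...) are dropped.
KEEP_PREFIXES = ("n", "v", "a")


def pos_filter(tagged_doc, keep=KEEP_PREFIXES):
    """Keep only words whose POS tag starts with one of the kept prefixes."""
    return [word for word, tag in tagged_doc if tag.startswith(keep)]


def tfidf(docs):
    """Weight each term as tf * log(N / df); returns one dict per document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        weighted.append({t: (c / length) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weighted


# Toy corpus, already segmented and POS-tagged (made-up data for illustration).
corpus = [
    [("striker", "n"), ("quickly", "d"), ("scored", "v"), ("late", "a"), ("goal", "n"), ("in", "p")],
    [("bank", "n"), ("raised", "v"), ("interest", "n"), ("rates", "n"), ("again", "d"), ("it", "r")],
    [("team", "n"), ("won", "v"), ("tense", "a"), ("final", "n"), ("in", "p"), ("overtime", "n")],
]

filtered = [pos_filter(doc) for doc in corpus]
full_vocab = {word for doc in corpus for word, _ in doc}
kept_vocab = {word for doc in filtered for word in doc}
print(len(full_vocab), "->", len(kept_vocab))  # prints: 17 -> 13
weights = tfidf(filtered)
```

Dropping adverbs, prepositions, and pronouns shrinks the toy vocabulary from 17 to 13 terms. On a real corpus, the size of the reduction and its effect on accuracy have to be measured experimentally.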
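Combining part-of-speech screening with LDA amounts to running the topic model only on the retained words. The sketch that follows shows one assumed way to do that: a plain collapsed Gibbs sampler for LDA written with the standard library. Its input is token lists that are assumed to be already filtered down to nouns, verbs, and adjectives. The function name `lda_gibbs`, the priors `alpha` and `beta`, and the iteration count are illustrative choices, not values taken from the thesis. The function returns each document's smoothed topic proportions, which could serve as low-dimensional classifier features.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic token totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_vocab for _ in range(n_topics)]
    nk = [0] * n_topics
    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(z_doc)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the current token from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + n_vocab * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    # Smoothed document-topic distributions (theta); each row sums to 1.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]


# Toy documents assumed to be already reduced to nouns, verbs, and adjectives.
docs = [
    ["goal", "match", "team", "goal", "striker"],
    ["bank", "rates", "interest", "bank", "loan"],
    ["team", "match", "striker", "final"],
    ["loan", "interest", "rates", "bank"],
]
theta = lda_gibbs(docs, n_topics=2)
```

Which topic index ends up looking like "sports" or "finance" depends on the random seed. Only structural properties are stable, such as each row of `theta` summing to 1.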
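The abstract does not name the classifier used in the classification and performance evaluation steps. As an assumed example, this sketch trains a multinomial Naive Bayes classifier with add-one smoothing on filtered token lists. It then scores the predictions with macro-averaged F1, the unweighted mean of per-class F1 = 2PR / (P + R). The function names `train_nb`, `predict_nb`, and `macro_f1` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    model = {}
    for y, n_y in class_counts.items():
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_y / len(docs))
        log_lik = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen words are ignored."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[w] for w in doc if w in log_lik)
    return max(model, key=score)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


# Made-up training and test data, assumed already POS-filtered.
train_docs = [
    ["goal", "team", "match"],
    ["striker", "goal", "win"],
    ["bank", "loan", "rates"],
    ["interest", "rates", "bank"],
]
train_labels = ["sports", "sports", "finance", "finance"]
test_docs = [["team", "goal"], ["loan", "interest"]]
test_labels = ["sports", "finance"]

model = train_nb(train_docs, train_labels)
preds = [predict_nb(model, doc) for doc in test_docs]
print(preds, macro_f1(test_labels, preds))  # prints: ['sports', 'finance'] 1.0
```

Skipping words never seen in training is a common simplification. A real evaluation would compare feature sets (all words, part-of-speech filtered, LDA topics) on a held-out test set of realistic size.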