
Design And Implementation Of Finance News Classification System Based On Labeled-LDA

Posted on: 2017-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Xiong
GTID: 2308330485460489
Subject: Software engineering
Abstract/Summary:
Text classification has long been a research hotspot in natural language processing. With the rapid growth of the Internet industry, it plays a key role in practical applications, in information retrieval, and in natural language processing, and it is a key machine learning technology. The main stages of text classification are preprocessing, feature extraction, feature dimensionality reduction, and classification.

This thesis focuses on feature extraction and classification methods. After studying traditional feature extraction methods and several topic modeling approaches, and comparing the two against the system requirements, the Labeled Latent Dirichlet Allocation (Labeled-LDA) model was chosen as the feature extraction method. With the model fixed, the thesis then studies the parameter estimation algorithms used in topic modeling and chooses an appropriate one to estimate the model's parameters.

On the classification side, the thesis studies and compares several representative classification methods and selects the Support Vector Machine (SVM) as the classifier. It also compares the multi-class strategies available for SVMs and, given the practical situation of the project, adopts the One Versus Rest strategy.

Feature extraction and classification are the key points in text classification research. Traditional feature extraction methods are based on the vector space model, which tends to produce a high-dimensional feature space containing many useless features, so dimensionality reduction that preserves precision is needed. Labeled Latent Dirichlet Allocation, a supervised topic model, can be used for feature extraction instead.
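As a rough illustration of how Labeled-LDA can serve as a feature extractor, the sketch below trains the model with collapsed Gibbs sampling, where each token may only be assigned to a topic in its document's label set, and then infers a topic mixture for an unlabeled document. The abstract does not say which parameter estimation algorithm the thesis settled on, so Gibbs sampling here is an assumption. The function names (`train_labeled_lda`, `infer_theta`), the hyperparameter values, the simple EM-style fold-in used at inference time, and the toy corpus are all hypothetical.

```python
import random


def train_labeled_lda(docs, doc_labels, num_labels, vocab_size,
                      alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for Labeled-LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    doc_labels: list of label-id lists; one topic per label, and a token
    can only be assigned to a topic listed for its document.
    Returns phi, where phi[k][w] estimates P(word w | label k).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_labels for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_labels)]  # topic-word counts
    n_k = [0] * num_labels                                # tokens per topic
    z = []                                                # topic of each token

    # Random initialisation, restricted to each document's labels.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.choice(doc_labels[d])
            assignments.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(assignments)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            allowed = doc_labels[d]
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Resample its topic among the allowed labels only.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in allowed
                ]
                k = rng.choices(allowed, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
             for w in range(vocab_size)]
            for k in range(num_labels)]


def infer_theta(doc, phi, alpha=0.1, iters=50):
    """Estimate a topic mixture for an unlabeled document with phi fixed.

    Uses a simple EM-style fold-in (an assumption, not the thesis's method);
    the returned vector can be used as the document's feature vector.
    """
    num_topics = len(phi)
    theta = [1.0 / num_topics] * num_topics
    for _ in range(iters):
        counts = [alpha] * num_topics
        for w in doc:
            post = [theta[k] * phi[k][w] for k in range(num_topics)]
            total = sum(post)
            for k in range(num_topics):
                counts[k] += post[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


# Hypothetical toy corpus: word ids 0-2 lean towards label 0, ids 3-5 towards label 1.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 1, 3, 4]]
doc_labels = [[0], [1], [0, 1]]
phi = train_labeled_lda(docs, doc_labels, num_labels=2, vocab_size=6)
theta = infer_theta([0, 1, 2], phi)  # feature vector for a new document
```

The resulting `theta` has one entry per label rather than one per vocabulary word, which is what makes the representation so much lower-dimensional than a vector space model.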
In this thesis, the Labeled-LDA model is used to extract features and a support vector machine performs the text classification. A system is built on this basis, and its performance is verified by experiments.

The topic of this thesis comes from a real project undertaken during my internship. The difficulties of the project are the comparison and selection of feature extraction and classification techniques, the application of topic modeling to feature extraction, and the design and implementation of the system. My work is divided into three parts. The first is the study of topic modeling methods and classification methods. The second is the design and implementation of the system based on Labeled-LDA. The third is a set of tests on the prototype system, whose results are reported to verify the feasibility of the system.
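The One Versus Rest strategy named in the abstract can be sketched as follows: one binary SVM is trained per class, with that class as positive and all others as negative, and a document is assigned to the class whose model gives the highest score. The abstract does not specify the kernel or solver used in the thesis, so this sketch assumes a linear SVM trained by subgradient descent on the regularised hinge loss. The function names, the hyperparameters, and the toy feature vectors (imagined as topic proportions for three hypothetical news categories) are all illustrative.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=300):
    """Binary linear SVM via per-sample subgradient descent (sketch).

    X: list of feature vectors; y: labels in {-1, +1}.
    Minimises (lam / 2) * ||w||^2 + hinge loss; returns (w, b).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient step on the regulariser shrinks the weights.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge loss contributes only when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary SVM per class: that class versus all the others."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_linear_svm(X, y)
    return models


def predict_one_vs_rest(models, x):
    """Return the class whose binary model scores x highest."""
    def score(c):
        w, b = models[c]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Hypothetical topic-proportion features for three made-up news categories.
X = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8], [0.1, 0.2, 0.7],
]
labels = ["stocks", "stocks", "bonds", "bonds", "forex", "forex"]
models = train_one_vs_rest(X, labels)
predicted = predict_one_vs_rest(models, [0.9, 0.05, 0.05])
```

With K classes, this strategy trains K binary models, whereas a one-versus-one scheme would train K(K-1)/2.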
Keywords/Search Tags:Text Classification, Topic Model, Labeled Latent Dirichlet Allocation, Support Vector Machine