Sparse Model And Its Application For Text Classification

Posted on: 2015-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q Fu
GTID: 2308330485990700
Subject: Computer application technology
Abstract/Summary:
With the development of Internet technology, the speed and breadth of information transmission are increasing quickly, and the volume of data is growing with them. Because collecting samples is costly, however, the sample size is often far smaller than the dimension of the data. Text is an important carrier of information, and more and more of it needs to be processed. As a result, processing text data with high dimension and small sample size has become a very active research field.

As an important class of methods in machine learning, sparsity methods are well suited to high-dimensional data with small sample sizes, and they also make the learned results more interpretable. They have therefore been applied successfully in various fields.

This thesis focuses on the application of sparse models to text data. Its two contributions are summarized as follows:

1. The application of sparse models to Chinese text classification. Because preprocessing based on Chinese word segmentation may lose information that is valuable for classification, we propose a text classification framework that combines character-based N-gram preprocessing with L1-regularized logistic regression. Character-based N-grams keep the preprocessing step simple and retain as much valuable information as possible, but they also retain a large amount of redundant information. To solve this problem, we use L1-regularized logistic regression. Because N-gram data are sparse, L1-regularized logistic regression can effectively select the features that are truly valuable for classification, and it also eases the difficulty of optimization in a large feature space. Experiments on a text categorization corpus verify the effectiveness of the proposed text classification method.

2. The application of sparse models to text sentiment classification. Because sentiment classification based on the vector space model loses much latent semantic information, we propose a sentiment classification method based on text graph representation and a graph sparsity model. First, we use two text graph structures to represent different kinds of semantic information. Then, because the resulting structural representation is large, we use a graph sparsity model to select the valuable information from the text graph representation. Experiments on a text sentiment classification corpus verify the effectiveness of the proposed text sentiment classification method.
Keywords/Search Tags: Natural Language Processing, Machine Learning, Sparse Models, Text Classification, Sentiment Classification
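
The first contribution in the abstract pairs character-based N-gram features with an L1 penalty so that most redundant features receive a weight of exactly zero. The following is a minimal sketch of that idea, not the thesis's implementation. The toy documents, labels, function names, and hyperparameters (`lam`, `lr`, `epochs`) are all illustrative assumptions, and proximal gradient descent (ISTA) is swapped in as a simple solver because the abstract does not say which optimizer was used.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=2):
    """Count character n-grams of length n_min..n_max (no word segmentation)."""
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def score(feats, weights, bias):
    """Linear score of a bag of n-grams; unseen n-grams contribute nothing."""
    return bias + sum(weights.get(g, 0.0) * c for g, c in feats.items())


def train_l1_logreg(docs, labels, lam=0.05, lr=0.5, epochs=200):
    """L1-regularized logistic regression via proximal gradient descent.

    labels are 0/1. The bias is left unpenalized. Returns (weights, bias).
    """
    feats = [char_ngrams(d) for d in docs]
    weights = {g: 0.0 for f in feats for g in f}
    bias = 0.0
    m = len(docs)
    for _ in range(epochs):
        grad = dict.fromkeys(weights, 0.0)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            z = max(-30.0, min(30.0, score(f, weights, bias)))  # avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += err / m
            for g, c in f.items():
                grad[g] += err * c / m
        bias -= lr * grad_b
        # Gradient step on the logistic loss, then shrinkage for the L1 term.
        for g in weights:
            weights[g] = soft_threshold(weights[g] - lr * grad[g], lr * lam)
    return weights, bias


def predict(text, weights, bias):
    return 1 if score(char_ngrams(text), weights, bias) > 0 else 0


# Hypothetical toy corpus: 1 = sports, 0 = finance.
docs = ["足球比赛很精彩", "篮球比赛开始了", "股票市场上涨", "股票价格下跌"]
labels = [1, 1, 0, 0]

weights, bias = train_l1_logreg(docs, labels)
selected = {g: w for g, w in weights.items() if w != 0.0}
print(len(weights), "n-gram features,", len(selected), "kept by the L1 penalty")
print(sorted(selected))
```

The shrinkage step is what produces exact zeros: an N-gram whose gradient never outweighs the penalty stays at zero and drops out of the model, which is the feature-selection behaviour the abstract attributes to L1-regularized logistic regression.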