Research On Text Classification Algorithms Based On Machine Learning

Posted on:2018-11-29

Degree:Master

Type:Thesis

Country:China

Candidate:P X Deng

Full Text:PDF

GTID:2348330518496527

Subject:Information and Communication Engineering

Abstract/Summary:

With the arrival of the big data era, data on the Internet has become increasingly valuable. Unstructured data, typified by text, not only provides plentiful content and lets people express feelings and share experiences, but can also serve as datasets for many kinds of data mining tasks, such as user profiling and public opinion detection. Text classification, a foundational task in Natural Language Processing (NLP), can automatically select information and accelerate information processing, and it also supports complex tasks such as sentiment analysis, automatic summarization, and human-computer dialogue, which provide users with intelligent and personalized services. Depending on the amount of labeled data in the training set, text classification can be divided into supervised and semi-supervised text classification, and research on semi-supervised text classification remains limited. Therefore, further improving the accuracy of text categorization and solving text categorization problems in complex scenarios are hot topics in NLP.

In supervised text classification, sufficient labeled samples are available to train complex models and achieve better performance. Compared with shallow learning models, neural network models have a strong ability to extract features and to model complex problems. However, the feature dimensionality produced by traditional text representation models is not high enough to fully train deep neural networks. In contrast, word-embedding models transform text into two-dimensional grid data that carries semantic and syntactic regularities and is well suited to convolution. Moreover, convolution specializes in capturing spatial relations, which makes it possible to extract contextual and structural patterns automatically. As a result, convolutional neural networks (CNNs) greatly improve performance in supervised text classification.
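
The idea of treating a sentence as a two-dimensional grid of word vectors and convolving over it can be illustrated with a minimal forward pass. This is a sketch under stated assumptions, not the thesis's architecture: the abstract does not give filter sizes, pooling, or layer counts, so the window sizes (2, 3, 4), the single max-over-time pooling step, the zero vector for unknown words, and all function names here are hypothetical choices. Training (backpropagation) is omitted and the weights are random.

```python
import math
import random

def sentence_matrix(tokens, table, dim):
    """Stack word vectors row by row: an n x d grid, one row per token.
    Unknown tokens get a zero vector (an assumption of this sketch)."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]

def conv_max_pool(matrix, filt, bias=0.0):
    """Slide one filter of height h (h consecutive words, full embedding width)
    down the sentence grid, apply ReLU, then take the max over all positions."""
    h, d = len(filt), len(filt[0])
    best = 0.0  # ReLU outputs are >= 0; also the result when the text is shorter than h
    for i in range(len(matrix) - h + 1):
        s = bias + sum(filt[r][c] * matrix[i + r][c] for r in range(h) for c in range(d))
        best = max(best, max(0.0, s))
    return best

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def cnn_forward(tokens, table, filters, out_w, out_b):
    """Embedding grid -> one pooled feature per filter -> dense layer -> softmax."""
    dim = len(next(iter(table.values())))
    grid = sentence_matrix(tokens, table, dim)
    feats = [conv_max_pool(grid, f) for f in filters]
    logits = [sum(w * x for w, x in zip(row, feats)) + b for row, b in zip(out_w, out_b)]
    return softmax(logits)

# Usage with random (untrained) weights: two filters for each hypothetical window size.
rng = random.Random(0)
dim = 4
vocab = ["the", "movie", "was", "great", "boring"]
table = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(h)]
           for h in (2, 3, 4) for _ in range(2)]
n_classes = 2
out_w = [[rng.uniform(-1, 1) for _ in filters] for _ in range(n_classes)]
out_b = [0.0] * n_classes
probs = cnn_forward("the movie was great".split(), table, filters, out_w, out_b)
print(probs)
```

Filters of different heights see n-grams of different lengths, which is one common way a CNN extracts local context from the word-embedding grid; a real classifier would learn the filter and output weights from the labeled training set.
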
In addition, for supervised classification, we propose employing neural networks with different structures to capture useful features from texts of different lengths, which further improves classification accuracy.

For semi-supervised text classification, the lack of labeled data tends to make supervised classification models underfit or overfit. Co-training, which relies on differentiated feature spaces, has achieved good results with supervised classifiers. However, the main difficulty in text co-training is constructing two views from the text content that satisfy the conditions of sufficient redundancy and conditional independence. In this thesis, two different feature spaces are constructed from text representation models that rest on different perspectives and methods. Used as the global and detail views for co-training, they overcome the scenario-specific limitations of existing models. On this basis, an improved co-training algorithm that applies multiple under-sampling to imbalanced datasets is also presented. The experimental results show that the proposed co-training model achieves superior performance in semi-supervised text classification.
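
The co-training loop described above (two classifiers, each trained on its own view, labeling confident unlabeled examples for the shared training set) can be sketched as follows. This follows the generic co-training scheme, not the thesis's specific global/detail views or classifiers: the nearest-centroid classifier, the distance-margin confidence score, the `per_round` parameter, and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```python
import math

def train_centroids(X, y):
    """Nearest-centroid 'classifier' (a stand-in): the mean vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def predict_with_margin(centroids, x):
    """Return (label, margin); the margin is the gap between the two nearest
    centroids and serves as the confidence score in this sketch."""
    ranked = sorted((math.dist(x, c), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: (view1, view2, label) triples; unlabeled: (view1, view2) pairs.
    Each round, the classifier for each view labels its most confident unlabeled
    examples and adds them to the shared labeled set, so the classifier for the
    other view is retrained on them."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            cents = train_centroids([ex[view] for ex in labeled], [ex[2] for ex in labeled])
            scored = [(predict_with_margin(cents, u[view]), i) for i, u in enumerate(pool)]
            scored.sort(key=lambda t: t[0][1], reverse=True)
            chosen = scored[:per_round]
            for (label, _), i in chosen:
                labeled.append((pool[i][0], pool[i][1], label))
            picked = {i for _, i in chosen}
            pool = [u for i, u in enumerate(pool) if i not in picked]
    c1 = train_centroids([ex[0] for ex in labeled], [ex[2] for ex in labeled])
    c2 = train_centroids([ex[1] for ex in labeled], [ex[2] for ex in labeled])
    return c1, c2

def co_predict(c1, c2, v1, v2):
    """Trust whichever view is more confident on this example."""
    l1, m1 = predict_with_margin(c1, v1)
    l2, m2 = predict_with_margin(c2, v2)
    return l1 if m1 >= m2 else l2

# Usage: two labeled seeds, four unlabeled examples, each with two 2-D views.
labeled = [([0, 0], [0, 0], "neg"), ([5, 5], [5, 5], "pos")]
unlabeled = [([0.5, 0], [0, 0.4]), ([4.5, 5], [5, 4.8]),
             ([1, 0.5], [0.2, 0.1]), ([5, 4], [4.9, 5.1])]
c1, c2 = co_train(labeled, unlabeled)
print(co_predict(c1, c2, [0.2, 0.1], [0.1, 0.3]))  # → neg
```

The scheme only helps when each view is sufficient on its own and the views are roughly independent given the label, which is why the abstract stresses how the two feature spaces are constructed.
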

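The abstract does not specify how multiple under-sampling is performed. One common reading, sketched here as an assumption, is to draw several balanced training subsets in which every minority-class sample is kept and each larger class is randomly under-sampled to the minority size, train one classifier per subset, and combine their predictions by majority vote. The function names `multiple_undersample` and `majority_vote` are hypothetical, and the per-subset training and its integration into co-training are left out.

```python
import random
from collections import Counter

def multiple_undersample(X, y, n_subsets, seed=0):
    """Build n_subsets balanced training sets. Every minority-class sample is kept
    in each subset; every other class is randomly under-sampled (without
    replacement) down to the minority-class size, with a fresh draw per subset."""
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    k = min(len(items) for items in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        xs, ys = [], []
        for label, items in by_class.items():
            for x in rng.sample(items, k):
                xs.append(x)
                ys.append(label)
        subsets.append((xs, ys))
    return subsets

def majority_vote(predictions):
    """Combine the labels predicted by the per-subset classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Usage: 10 majority-class samples ("a") and 2 minority-class samples ("b").
X = list(range(12))
y = ["a"] * 10 + ["b"] * 2
subsets = multiple_undersample(X, y, n_subsets=3)
for xs, ys in subsets:
    print(Counter(ys))
```

Drawing several subsets instead of one keeps more of the majority-class data in play overall, while each individual classifier still trains on a balanced set.

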
Keywords/Search Tags:

text classification, Convolutional Neural Network, word embeddings, co-training, Semi-Supervised Learning

Related items

1	Research On Short Text Classification Of Semi-supervised Pre-training Based On Autoencoders And Word Order Dependencies
2	Text Classification Based On Semi-supervised Learning
3	PolSAR Image Classification Based On Fully Convolutional Network And Semi-supervised Learning
4	Research On Semi-supervised Classification Method Of Hyperspectral Images Based On Convolutional Neural Network
5	Research On Semi-supervised Short Text Classification Based On Co-operative Training
6	Research On Text Classification Algorithm Based On Graph Neural Network
7	The Research On Target Detection Of SAR Image Based On Semi-Supervised Convolutional Neural Network
8	Research On Word Sense Disambiguation Based On Semi-supervised Model
9	Research And Application Of Convolutional Neural Network In Collaborative Semi-Supervised Classification
10	Research On Text Clustering Based On Semi-supervised Learning