
Word Embeddings Towards Text Classification Of Emotion And Topic

Posted on: 2020-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: X H Hu  Full Text: PDF
GTID: 2518306305497684  Subject: Software engineering
Abstract/Summary:
Feature engineering has long been the foundation of machine learning, and the performance of a model or system is directly affected by the quality of its features. Natural language processing tasks usually take the word as the basic unit, so the feature representation of words underpins most tasks and research in the field. Traditional one-hot encoding cannot be widely applied in natural language processing because of its feature sparsity, so word embeddings based on the distributional hypothesis came into being. Common word embedding methods follow this hypothesis: they learn the semantics of words from a large-scale corpus, and the resulting word vectors express semantics in a low-dimensional space, which is why they have been widely used in text mining. Word vectors trained in an unsupervised way are semantically general and versatile, but they cannot fully fit a specific corpus, leaving room for improvement on specific problems such as keyword extraction, sentiment classification, and topic classification.

In this thesis, word embeddings are combined with the tasks of topic discovery and text classification, and the representation of keywords, emotional words, and topical words is studied. The main contents are as follows:

(1) A method for extracting and representing keywords combined with a topic model is proposed. The method converts the words of a text into vectors based on a distributed representation. At the same time, a topic model clusters the texts and represents each topic as a probability distribution over words, yielding the relevance of each word to the topic. Within each topic, a keyword network is then constructed from the similarities between word vectors, and the keywords of the topic are obtained by identifying the core nodes of that network.

(2) A method for training word embeddings for text sentiment classification is proposed. The method first uses pre-trained word vectors to represent document features by two different linear weighting schemes. The document vector is then used as the input of a neural-network sentiment classifier, and the emotional polarity of the text is propagated into the word vectors through gradient descent and backpropagation. The resulting emotional word embeddings achieve better performance on text sentiment prediction, text similarity calculation, and word-level emotional expression.

(3) A method for training word embeddings for text topic classification is proposed as an extension of the emotional word embeddings. The document features are again modeled by the two linear weighting schemes over pre-trained word vectors. Instead of a single multi-class model, One-vs-Rest classification is used: multiple binary neural-network text classifiers are trained, and the topic of each text is transferred into the word vectors through gradient descent and backpropagation. The final topical word embedding is a weighted linear combination of the word vectors learned by the individual classifiers. Experimental results show that these models outperform typical word embedding models on text topic prediction and text similarity calculation.
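The keyword-extraction idea in (1) can be sketched as follows: within one topic, build a similarity graph over the topic's words using their embeddings, then rank words by a simple centrality score. This is only an illustrative sketch; the toy vectors, the similarity threshold, and the `top_keywords` helper are assumptions, not the thesis's actual data, centrality measure, or code.

```python
# Sketch: keyword selection inside one topic via a word-similarity graph.
# Words whose vectors are strongly similar are linked; the most "central"
# words of the resulting network are taken as topic keywords.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_keywords(word_vecs, k=2, threshold=0.5):
    """Rank words by weighted degree centrality in the similarity graph."""
    words = list(word_vecs)
    score = {w: 0.0 for w in words}
    for i, wi in enumerate(words):
        for wj in words[i + 1:]:
            sim = cosine(word_vecs[wi], word_vecs[wj])
            if sim >= threshold:          # keep only strong edges
                score[wi] += sim
                score[wj] += sim
    return sorted(words, key=lambda w: score[w], reverse=True)[:k]

# Toy 2-d "embeddings" for one topic: three related words and an outlier.
vecs = {
    "market": [0.9, 0.1],
    "stock":  [0.8, 0.2],
    "trade":  [0.85, 0.15],
    "banana": [0.0, 1.0],
}
print(top_keywords(vecs, k=2))
```

In this toy example the three finance-related words form a densely connected cluster while the outlier stays isolated, so the top-ranked keywords come from the cluster. A stronger centrality measure such as PageRank could replace weighted degree without changing the overall scheme.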
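The core mechanism of (2), propagating sentiment into the word vectors by backpropagation, can be illustrated with a minimal sketch under simplifying assumptions: documents are represented as averaged word vectors (one of many possible linear weightings), the classifier is a single logistic layer rather than a full neural network, and all words, vectors, and hyperparameters are invented for the example.

```python
# Sketch: fine-tuning word vectors so that emotional polarity flows back
# into the embeddings. Gradients from a logistic sentiment classifier are
# applied both to the classifier weights and to the word vectors themselves.
import math
import random

random.seed(0)
dim = 4
vocab = ["good", "great", "bad", "awful", "movie"]
emb = {w: [random.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}
w_out = [0.0] * dim          # classifier weights
b_out = 0.0

docs = [(["good", "movie"], 1), (["great", "movie"], 1),
        (["bad", "movie"], 0), (["awful", "movie"], 0)]

def doc_vec(words):
    # Document feature = average of its word vectors.
    return [sum(emb[w][j] for w in words) / len(words) for j in range(dim)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

lr = 0.5
for _ in range(200):
    for words, label in docs:
        x = doc_vec(words)
        p = sigmoid(sum(wi * xi for wi, xi in zip(w_out, x)) + b_out)
        g = p - label                      # log-loss gradient w.r.t. the logit
        for j in range(dim):
            grad_x = g * w_out[j]          # gradient reaching the doc vector
            w_out[j] -= lr * g * x[j]
            for w in words:                # backpropagate into each word vector
                emb[w][j] -= lr * grad_x / len(words)
        b_out -= lr * g

def predict(words):
    x = doc_vec(words)
    return sigmoid(sum(wi * xi for wi, xi in zip(w_out, x)) + b_out)
```

After training, positive words like "good" and "great" have been pushed in the same direction and negative words in the opposite one, so the fine-tuned embeddings carry emotional polarity in addition to whatever semantics the pre-trained vectors started with.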
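The final step of (3), combining the per-classifier word vectors into one topical embedding, reduces to a weighted linear combination. The sketch below shows only that combination step and assumes each One-vs-Rest classifier has already produced its own fine-tuned copy of a word's vector; the weights and vectors are illustrative placeholders.

```python
# Sketch: merge the K copies of one word's vector (one per One-vs-Rest
# topic classifier) into a single topical embedding by weighted average.
def combine(per_classifier_vecs, weights):
    """per_classifier_vecs: K vectors for one word; weights: K floats."""
    total = sum(weights)
    dim = len(per_classifier_vecs[0])
    return [sum(w * v[j] for w, v in zip(weights, per_classifier_vecs)) / total
            for j in range(dim)]

# One word's vector as learned by three hypothetical topic classifiers.
copies = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(combine(copies, [0.5, 0.25, 0.25]))
```

How the weights are chosen (e.g., by classifier confidence or topic relevance) is a design decision of the thesis that this sketch does not reproduce.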
Keywords/Search Tags:Natural language processing, Word embeddings, Topic model, Text classification, Emotional calculation