
Research On Hybrid Embedding In Text Classification

Posted on:2021-03-28Degree:MasterType:Thesis
Country:ChinaCandidate:J J LaiFull Text:PDF
GTID:2428330626961131Subject:Applied statistics
Abstract/Summary:
With the advent of the word2vec algorithm, Embedding technology has been widely applied to text classification, but Embedding also has shortcomings. In a traditional Embedding algorithm, each word is represented by a single embedded word vector, yet this vector cannot cover all of the word's meanings; this limitation stems from the nature of word2vec training. Therefore, to overcome this limitation to some extent and enrich the information carried by the Embedding word vectors, this paper proposes a hybrid Embedding method. The study uses Embeddings pre-trained on three different corpora to construct the hybrid Embedding. Because the corpora differ, the combined vectors partially overcome the single-representation problem and can, to a certain extent, capture multiple meanings of a word. There are two main ways to construct a hybrid Embedding. The first is to stack different pre-trained word vectors, increasing the effective information of the Embedding by expanding the dimensionality of the word vector. The second is linear fusion of the pre-trained word vectors, which increases the effective information of the Embedding to some extent by combining the values of the vectors. The experiments in this thesis focus on word-vector stacking: by stacking pre-trained Embeddings in pairs, under the same neural network architecture, hybrid Embedding effectively improves the model's evaluation metrics, F1-score and AUC.
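The two hybrid-Embedding operations described above can be sketched as follows. This is a minimal illustration with made-up 3-dimensional vectors; in practice the vectors would come from word2vec models pre-trained on different corpora, and the fusion weight `alpha` is an assumed hyperparameter, not a value from the thesis.

```python
import numpy as np

# Hypothetical pre-trained embeddings of the same word from two corpora
# (real word2vec vectors typically have 100-300 dimensions).
vec_a = np.array([0.1, 0.5, -0.3])   # embedding from corpus A
vec_b = np.array([0.7, -0.2, 0.4])   # embedding from corpus B

# Method 1: stacking -- concatenate the vectors, expanding dimension 3 -> 6
# so the hybrid Embedding carries information from both corpora.
stacked = np.concatenate([vec_a, vec_b])

# Method 2: linear fusion -- a weighted sum that keeps the original dimension
# while blending information from both corpora.
alpha = 0.5  # assumed fusion weight
fused = alpha * vec_a + (1 - alpha) * vec_b

print(stacked.shape)  # (6,)
print(fused.shape)    # (3,)
```

Stacking preserves all information from both embeddings at the cost of a larger input layer, while linear fusion keeps the model's input dimension unchanged but may blur corpus-specific distinctions.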
Keywords/Search Tags:text classification, word2vec, hybrid Embedding, neural network