
Reinforcing The Topic Of Embeddings With Theta Pure Dependence For Text Classification

Posted on: 2017-12-27 | Degree: Master | Type: Thesis
Country: China | Candidate: N Xing | Full Text: PDF
GTID: 2348330515467337 | Subject: Computer Science and Technology
Abstract/Summary:
Recently, deep learning in natural language processing has attracted increasing attention. A series of neural language models and word/sentence embedding models have been developed from deep learning, and these effective models are widely used in academia and industry. However, carrying the distributional hypothesis of the original language models over to word/sentence embedding models for text classification is inapposite: text classification depends heavily on strongly polar topical features, yet such embedding models still capture only linguistic regularities and ignore topical information. To apply word/sentence embedding models to text classification effectively, we propose to reinforce the topic of embeddings with Theta Pure Dependence (TPD). For sentiment classification in particular, embeddings based on the distributional hypothesis are known to be weak at capturing sentiment contrast, since contrasting words may share similar local contexts. Based on a broader context, we propose to incorporate TPD into the word/sentence embedding model to reinforce topical and sentiment information. TPD provides a theoretical guarantee that the word dependency is pure, i.e., the dependence pattern carries an integral meaning whose underlying distribution cannot be conditionally factorized (and certainly cannot be unconditionally factorized); the high-order dependence cannot be reduced to the random coincidence of lower-order dependencies. Moreover, a unique and explicit topical meaning of each pattern is modeled to guarantee no ambiguity in the global context. Our model is applied to sentiment prediction and topic discovery tasks on standard datasets, and the results outperform the state of the art on these text classification tasks. In addition, the bag-of-words model and the LDA topic model are compared with our model in a Chinese news data mining project. We believe that word/sentence embedding models can become a principal method for text feature representation.
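For illustration only (the notation below is ours, not taken from the abstract or the thesis full text), the pure-dependence requirement can be sketched as a non-factorizability condition on the joint distribution of a word pattern:

\[
  p(x_1,\dots,x_n \mid C) \;\neq\; p(A \mid C)\,p(B \mid C)
  \quad \text{for every partition } A \cup B = \{x_1,\dots,x_n\},\; A \cap B = \emptyset,\; A, B \neq \emptyset ,
\]

so in particular the unconditional factorization \(p(A \cup B) = p(A)\,p(B)\) also fails, and the high-order dependence of the full pattern cannot be reconstructed from its lower-order dependencies alone.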
Keywords/Search Tags:Text Classification, Topic Reinforcing, Word/Sentence Embedding