
A Research On Deep Learning Based Topic Modeling

Posted on: 2018-02-03  Degree: Master  Type: Thesis
Country: China  Candidate: J H Zhu  Full Text: PDF
GTID: 2348330515989692  Subject: Computer software and theory
Abstract/Summary:
Topic modeling is an effective approach to semantic information extraction and semantic feature representation. With topic modeling, we can not only reveal the latent topics hidden in ubiquitous texts, but also represent those texts in the feature space spanned by such topics. This kind of representation is therefore conducive to text classification and clustering, emergency detection, topic evolution analysis, recommender systems, and so on. However, owing to their shallow architectures and probabilistic generative processes, traditional probabilistic topic models still face the challenges of limited model scalability, poor topic coherence, inconsistent model inference, and inexpressive feature representation.

The continuous development of deep learning brings new opportunities for natural language processing and, at the same time, broadens the ways in which topic models can be built. Deep learning models such as word embeddings, knowledge-base embeddings, and deep neural networks have achieved breakthroughs in text semantic representation, which makes it possible to construct a semantically coherent topic model with a deep architecture. However, topic modeling with deep learning is still at an early stage, and how to seamlessly integrate the two into a unified model remains a pressing issue.

In this thesis, we incorporate deep learning into traditional topic modeling, aiming to construct a topic model with deep semantic feature representation. The work is divided into three parts.

Firstly, we propose SG_TransE (Skip-Gram with TransE), a word embedding model constrained by a knowledge base. Since SG_TransE combines Skip-Gram with TransE, its embeddings can better convey knowledge semantics.

Secondly, building on the deep semantic reinforcement provided by deep learning, we propose a probabilistic topic model DGPU-LDA (Double Generalized Polya Urn with LDA). In DGPU-LDA, we design a document-wise semantic encoder, DS-Bi-LSTM (Document Semantic Bi-directional LSTM), to embed the semantics of each document. Document-topic GPU semantic reinforcement and word-word GPU semantic reinforcement, together with LSTM modeling of the iterative dependencies, are then exploited to capture the Gibbs sampling process during model inference.

Finally, we refactor DGPU-LDA with deep neural networks and propose a neural topic model NS-LDA (Neural Semantic LDA). NS-LDA also employs the DS-Bi-LSTM document-wise semantic encoder to represent documents. NS-LDA regards topics as hidden units and uses two hidden layers to encode the document-topic and topic-word information respectively. Its output is the product of these two encodings, which corresponds to the score of a word conditioned on a specific document.

Experimental results on the SogouCA and 20 Newsgroups datasets demonstrate that the proposed DGPU-LDA and NS-LDA outperform several state-of-the-art topic models in both topic semantic coherence and text classification. These improvements also indicate the effectiveness of our deep topic models in text semantic feature representation.
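The product-of-encodings scoring idea behind NS-LDA can be illustrated with a minimal sketch: a document representation is projected onto K latent topics by one layer, a second layer associates each topic with weights over the vocabulary, and the score of a word in a document is the product of the two encodings. All names, dimensions, random weights, and the softmax normalization below are illustrative assumptions, not the thesis's actual implementation; the random vector `doc` merely stands in for a DS-Bi-LSTM document encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, D = 1000, 20, 50   # vocabulary size, number of topics, doc-embedding dim

W_doc_topic = rng.normal(scale=0.1, size=(D, K))   # document -> topic hidden layer
W_topic_word = rng.normal(scale=0.1, size=(K, V))  # topic -> word hidden layer

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_scores(doc_embedding):
    """Score every vocabulary word conditioned on one document:
    the document-topic encoding multiplied by the topic-word encoding."""
    topic_mix = softmax(doc_embedding @ W_doc_topic)   # document-topic encoding, sums to 1
    return topic_mix @ softmax(W_topic_word)           # mixture over per-topic word weights

doc = rng.normal(size=D)      # stand-in for a DS-Bi-LSTM document encoding
scores = word_scores(doc)     # one non-negative score per vocabulary word
```

Because both factors are row-normalized, the scores form a proper distribution over the vocabulary; in the actual model the two layers would be trained rather than random.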
Keywords/Search Tags: neural topic model, deep learning, semantic reinforcement, word embedding, bi-directional LSTM