Research Of Topic Detection For Social Media Based On Word Embedding Model

Posted on:2017-05-31

Degree:Master

Type:Thesis

Country:China

Candidate:J Li

Full Text:PDF

GTID:2348330503981839

Subject:Computer Science and Technology

Abstract/Summary:

The twenty-first century has seen rapid development of network and information technology. In recent years, the spread of the mobile Internet and Web 2.0 applications has given rise to many social media platforms, such as microblogs, blogs, and forums, which make it increasingly convenient for ordinary people to express their views online. Large volumes of online comments reflect the attitudes, opinions, and needs of the public over a period of time, so it is extremely important to grasp, mine, and analyze what Internet users are discussing in a timely and accurate way. However, most current work on topic recognition for social media is based on surface attributes of the data: it treats the word as the basic feature and computes word probabilities from word frequency, usually ignoring semantic information. In this thesis, we conduct our research on social media datasets and focus on detecting and analyzing the topics of their content using topic models. The main work consists of the following two parts:

(1) Given the characteristics of social media datasets, existing word vector models do not consider the internal order of words, and they use only the local context to predict the target word at each training step, which is insufficient to capture semantic knowledge. To overcome this problem, we propose a novel hybrid model called mixed word embedding (MWE), which considers both word order and mixed context information. The model is based on the well-known word2vec toolbox. It combines the two variants of word2vec, Skip-gram and CBOW, seamlessly by sharing a common encoding structure, which allows it to capture the syntactic and semantic information of words more accurately. Furthermore, it incorporates both the local and the global context of the target word within a sliding window while preserving word order in each document. After training, we obtain word embeddings that carry rich syntactic and semantic information at the same time.

(2) Existing probabilistic topic models treat the word as the basic unit and compute the probabilities between words and topics from co-occurrence frequency, paying little attention to semantic information. Social media, meanwhile, usually contains large numbers of short text messages with few useful word features and much noisy data, which makes it difficult to recognize and analyze topics directly in social media topic detection tasks. In this thesis, we introduce an external expansion corpus as auxiliary information for the LDA model to better capture words and their semantic expressions. We use the model proposed in (1) to obtain good word embeddings and then feed them into the topic model for topic detection and analysis by redefining the conditional probability distribution over topic vectors and word embeddings. We minimize the KL divergence between the new topic-word distribution and the original one from LDA in order to learn both the word embeddings and the topic model. The experimental results show that this method performs better on word representation and topic detection than word2vec and the LDA model.
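The KL-divergence objective mentioned in part (2) can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the redefined topic-word distribution or the direction of the divergence, so the softmax over topic-word dot products, the KL(embedding || LDA) direction, and all names and toy values below are assumptions made for illustration only, not the thesis's actual formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topic_word_distribution(topic_vec, word_vecs):
    """Assumed form of p(w | t): softmax over topic-word dot products."""
    scores = [sum(t * w for t, w in zip(topic_vec, wv)) for wv in word_vecs]
    return softmax(scores)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Hypothetical toy data: three words with 2-dimensional embeddings, one topic.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
topic_vec = [0.5, -0.5]
# Stand-in for one row of an LDA topic-word matrix (not real results).
lda_topic_word = [0.5, 0.2, 0.3]

embedding_dist = topic_word_distribution(topic_vec, word_vecs)
loss = kl_divergence(embedding_dist, lda_topic_word)
print("embedding-based p(w|t):", [round(p, 4) for p in embedding_dist])
print("KL(embedding || LDA):", loss)
```

In a training loop, a quantity like `loss` would be reduced by adjusting the topic vector and the word embeddings; the sketch only evaluates it once for fixed values.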

Keywords/Search Tags:

Social media, topic detection, feature expression, word embeddings, topic model

Related items

1	Research Of Joint Topic Sentiment Analysis Based On Word Embeddings Probability Model
2	Research On Topic Detection Method Of Complex Short Text Based On Topic Model
3	Word Embeddings Towards Text Classification Of Emotion And Topic
4	Research On Key Issues On Topic Detection And Topic Diffusion In Social Media
5	Topic Modeling Research Based On Word Embedding And Generative Neural Networks
6	Topic Modeling For Short Texts With Auxiliary Word Embeddings
7	Hot Topic Detection And Topic Evolution Analysis In Social Media
8	Research On Microblog Topic Detection And Tracking Based On BTM Model
9	Research On BBS Topic Detection And Tracking
10	Research On Text Topic Modeling Based On Word Embedding