
Research on Discovery of Derivative Topics and the Trend of Derived Topics

Posted on: 2022-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Zhou
GTID: 2518306551982359
Subject: Software engineering
Abstract:
With the rapid development of online social media such as Twitter and Microblog, short text has become a common form of information on the Internet. Because social media are convenient, flexible, and public, public opinion spreads rapidly online and the public receives information quickly. When a public-opinion topic appears, the rapid spread of information through social networks causes it to derive one or more new topics during dissemination. To describe this phenomenon, we propose the concept of derivative topics, which characterizes how topics change as information spreads, and we use topic evolution to discover public opinion and its derivative trends. To this end, this paper combines paragraph vector representation with topic models to model short texts and discover the derivative relationships between topics. We study topic derivation from two aspects, text content and user interaction, and we improve the non-negative matrix factorization (NMF) algorithm to analyze topic derivation trends.

The main work of this paper is as follows:

(1) Proposing the concept of derivative topics. Derivative topics describe the topic shifts that occur as a topic propagates; an original topic may derive one or more topics. We define three types of derivative relationships: the same topic, the similar derivative topic, and the completely derivative topic.

(2) Combining paragraph vector representation and a topic model to model short texts. Derivative topics emerge over time, so this paper analyzes the relationships between topics by time period. Short texts are grouped into time periods, and a flexible time-slice method keeps the number of posts in each period roughly equal. Combining paragraph vectors with a topic model also mitigates the sparsity of short-text information. By definition, derivative topics usually share topic words. First, paragraph2vec learns a paragraph vector representation of each short text, and the short texts are clustered by topic. Then the LDA model is applied to the generated pseudo-documents to construct the topic-word probability table.

(3) Studying topic derivation trends in combination with user interaction behavior. Reposting and mentioning on social media usually concern the same topic, yet posts on the same topic may differ in text content, so considering only text content may miss implicit topic relationships. Therefore, on top of the topics discovered from text content, this paper considers interactions between users, namely mentions ("@"), forwarding, and replying, to analyze topic trends. This paper improves the NMF algorithm: the relationship between posts is built from short-text topics and user interaction factors (mentions, replies, and forwarding), and the resulting post-post relationship matrix is decomposed into topic-post and topic-word matrices to discover topic relationships between short texts.

(4) By improving the original methods, this paper addresses the sparsity of short texts and analyzes the evolution trend of derivative topics. The proposed method is evaluated on a Microblog dataset with time information and a Twitter dataset with user interaction information. In experiments comparing it with topic model methods such as LDA, BTM, and NMF, it achieved good results and alleviated the sparsity of short-text data.
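The flexible time-slice step in item (2) can be illustrated with equal-frequency slicing: sort posts by time, then cut the sequence so every slice holds about the same number of posts. This is a minimal sketch under that assumption; the abstract does not give the exact slicing rule, and the `Post` record layout and `flexible_time_slices` name are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical post record: (timestamp, text). The thesis does not specify its schema.
Post = Tuple[datetime, str]


def flexible_time_slices(posts: List[Post], num_slices: int) -> List[List[Post]]:
    """Split posts into consecutive time slices with near-equal post counts.

    Boundaries follow the posting rate, so bursty periods yield short slices
    and quiet periods yield long ones. Slice sizes differ by at most one post.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    ordered = sorted(posts, key=lambda p: p[0])  # chronological order
    base, extra = divmod(len(ordered), num_slices)
    slices: List[List[Post]] = []
    start = 0
    for i in range(num_slices):
        # The first `extra` slices take one additional post to absorb the remainder.
        size = base + (1 if i < extra else 0)
        slices.append(ordered[start:start + size])
        start += size
    return slices


if __name__ == "__main__":
    days = (5, 1, 3, 2, 4, 9, 7, 8, 6, 10)
    posts = [(datetime(2021, 1, d), f"p{d}") for d in days]
    for s in flexible_time_slices(posts, 3):
        print([text for _, text in s])
    # prints ['p1', 'p2', 'p3', 'p4'], then ['p5', 'p6', 'p7'], then ['p8', 'p9', 'p10']
```

Cutting on post counts rather than fixed calendar intervals is what keeps slice sizes consistent when posting activity is bursty.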
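Item (2) clusters short texts by their paragraph vectors and then models pseudo-documents with LDA. The sketch below assumes the vectors are already learned (for example with gensim's `Doc2Vec`), uses k-means with cosine similarity as the clustering step, and forms each pseudo-document by concatenating the tokens of the posts in one cluster. The clustering algorithm and the aggregation rule are assumptions, not details stated in the abstract, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_vectors(vectors: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """k-means with cosine similarity (an assumed choice of clustering algorithm).

    Initial centroids are the first k vectors, which keeps the sketch
    deterministic; k-means++ seeding is a common alternative.
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assign each post to the centroid it is most similar to.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_pseudo_documents(tokens: List[List[str]], labels: List[int]) -> Dict[int, List[str]]:
    """Concatenate the tokens of all posts in a cluster into one pseudo-document."""
    docs: Dict[int, List[str]] = {}
    for toks, lab in zip(tokens, labels):
        docs.setdefault(lab, []).extend(toks)
    return docs
```

For example, with vectors `[[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]` and `k=2`, posts 0 and 2 form one cluster and posts 1 and 3 the other. The pseudo-documents would then go to an LDA implementation (gensim's `LdaModel` is one option) to produce the topic-word probability table. Pooling short posts this way gives LDA longer documents with richer word co-occurrence.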
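Item (1) defines three derivative relationships, and item (2) notes that derivative topics usually share topic words. One way to operationalize this is to compare two topics from adjacent time slices by the cosine similarity of their topic-word probability rows and bucket the score. This is a hypothetical sketch: the abstract gives neither the similarity measure nor any thresholds, so `same`, `similar`, `minimum`, and the extra "unrelated" label are illustrative placeholders.

```python
import math
from typing import Dict

# One row of a topic-word probability table: word -> P(word | topic).
TopicWords = Dict[str, float]


def topic_similarity(a: TopicWords, b: TopicWords) -> float:
    """Cosine similarity between two sparse topic-word distributions."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(p * p for p in a.values()))
    nb = math.sqrt(sum(p * p for p in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def derivative_relation(parent: TopicWords, child: TopicWords,
                        same: float = 0.8, similar: float = 0.4,
                        minimum: float = 0.1) -> str:
    """Label a (parent, child) topic pair from adjacent time slices.

    The three thresholds are hypothetical placeholders, not values from the thesis.
    """
    s = topic_similarity(parent, child)
    if s >= same:
        return "same topic"
    if s >= similar:
        return "similar derivative topic"
    if s >= minimum:
        return "completely derivative topic"
    return "unrelated"  # below `minimum`: no derivation assumed
```

With a parent topic `{"flood": 0.5, "rescue": 0.3, "rain": 0.2}`, a child `{"flood": 0.4, "donation": 0.4, "charity": 0.2}` scores about 0.54 and is labeled a similar derivative topic, while a child that keeps "flood" only as a minor word falls into the completely derivative bucket.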
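Item (3) builds the post-post relationship from short-text topics and user interactions (mentions, replies, forwarding). The abstract does not say how these factors are combined. The sketch below assumes a weighted sum: a text term `alpha * topic_sim[i][j]` plus `(1 - alpha)` times a per-interaction weight for each interaction linking posts i and j. The matrix layout, `alpha`, the weight values, and the treatment of interactions as undirected are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
# (source post index, target post index, interaction kind)
Interaction = Tuple[int, int, str]

# Hypothetical per-interaction weights; the thesis abstract gives no values.
DEFAULT_WEIGHTS: Dict[str, float] = {"mention": 1.0, "reply": 1.0, "forward": 1.0}


def post_relation_matrix(topic_sim: Matrix, interactions: List[Interaction],
                         alpha: float = 0.5,
                         weights: Optional[Dict[str, float]] = None) -> Matrix:
    """Blend text-topic similarity with user interactions into one post-post matrix.

    rel[i][j] = alpha * topic_sim[i][j] + (1 - alpha) * (sum of interaction
    weights between posts i and j). Interactions are treated as undirected
    (an assumption), so the result stays symmetric if topic_sim is.
    """
    w = DEFAULT_WEIGHTS if weights is None else weights
    n = len(topic_sim)
    # Text-content term: a scaled copy of the topic similarity matrix.
    rel = [[alpha * topic_sim[i][j] for j in range(n)] for i in range(n)]
    for src, dst, kind in interactions:
        bonus = (1 - alpha) * w[kind]
        rel[src][dst] += bonus
        if src != dst:  # avoid double-counting self-interactions
            rel[dst][src] += bonus
    return rel
```

The interaction term lets two posts with different wording still be linked when one user mentions, replies to, or forwards the other, which is the implicit relationship the abstract says text content alone can miss. All entries stay nonnegative as long as the inputs are, so the matrix remains a valid input for NMF.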
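Items (3) and (4) refer to improving NMF. As a reference point, here is the standard NMF of Lee and Seung with multiplicative updates for the squared Frobenius loss, written in plain Python. It is the baseline algorithm, not the thesis's improved variant, whose modifications the abstract does not describe. When applied to a square nonnegative post-post relationship matrix, the rows of `W` can be read as post-topic weights and `H` as topic-post weights; `nmf` and `squared_error` are hypothetical helper names.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def nmf(v: Matrix, k: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor nonnegative V (m x n) as W (m x k) times H (k x n).

    Lee-Seung multiplicative updates: entries stay nonnegative because each
    update multiplies by a ratio of nonnegative terms. `eps` avoids division by zero.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W^T V) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


def squared_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))
```

On a block-structured 4x4 relationship matrix, where posts 0 and 1 are related and posts 2 and 3 are related, `nmf(v, 2)` drives the reconstruction error down from its random starting point. Taking the argmax of each row of `W` then gives a dominant topic per post. An improved variant would change the objective or the inputs (for example by adding interaction terms) while keeping this nonnegative factorization structure.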
Keywords: Derivative topics, Topic discovery, Social media, Time series, User interaction