
Research And Application Of Topic Clustering And Trend Analysis Based On Social Data

Posted on: 2021-09-18 | Degree: Master | Type: Thesis
Country: China | Candidate: D Z Mu
GTID: 2518306104994639 | Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet and the mobile Internet, social networks have quickly become an important platform for online socializing, because information spreads on them quickly and widely. Studying and analyzing how topics in social data trend is therefore of great significance. This thesis builds a crawler system that collects a large amount of social data from the Weibo platform and performs a preliminary analysis of it. It proposes an improved text representation model and clusters the texts into topics with a hierarchical clustering algorithm that combines NMF with time-window clustering. Finally, it analyzes the features that affect topic trends and compares and improves trend prediction models.

First, a crawler system was built to collect social data from Weibo, and basic analyses were performed on the crawled data: post analysis, poster (blog owner) analysis, user network analysis, and topic network analysis.

Second, existing text representation models often consider only word-frequency features. They ignore the correlation between words and documents, and they ignore that the same word can matter differently in documents of different lengths. This thesis therefore proposes an improved text representation model that combines TF-IDF and PMI. Because short texts produce sparse features, an L2 regularization term is introduced to improve the non-negative matrix factorization (NMF) algorithm.

Then, hierarchical clustering is combined with NMF, and an aggregation evaluation criterion based on normalized impairment loss is used. Together these form a stepwise NMF algorithm within a hierarchical clustering framework, which clusters the text set. The clustering results are partitioned by time window. Clusters in adjacent time windows are then merged using a similarity measure that combines cosine similarity and Jaccard similarity, so that topics are detected and merged dynamically.

Finally, topic trends are analyzed and predicted. The forwarding (reposting) value of Weibo posts is analyzed at different time granularities, and a trend prediction index based on forwarding volume is established. The features that affect topic trends are analyzed, and an improved method for computing opinion-leader influence is proposed on the basis of the traditional KED algorithm. These features are then compared. Several machine-learning prediction models are applied to topic trend prediction, and improved decision tree and random forest algorithms are proposed. Comparative experiments verify the advantages of the improved algorithms.
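The abstract does not give the exact formula for the improved TF-IDF and PMI representation. The following is a minimal Python sketch under assumed choices. Term frequency is divided by document length to account for documents of different lengths. IDF is smoothed as log(N/df) + 1. The result is multiplied by the positive part of the pointwise mutual information (PMI) between a term and a document, to reflect word-document correlation. The function name tfidf_pmi and this particular combination are hypothetical illustrations, not the thesis's method.

```python
import math
from collections import Counter


def tfidf_pmi(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical combination (not the thesis's formula):
        weight = (tf / doc_length) * idf * max(PMI(term, doc), 0)
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    # Corpus-wide term totals and token count, used to estimate PMI.
    term_total = Counter()
    for c in counts:
        term_total.update(c)
    n_tokens = sum(term_total.values())

    vectors = []
    for doc, c in zip(docs, counts):
        length = len(doc)
        vec = {}
        for term, tf in c.items():
            norm_tf = tf / length                     # length-normalized TF
            idf = math.log(n_docs / df[term]) + 1.0   # smoothed IDF
            # PMI(t, d) = log p(t, d) / (p(t) * p(d)), probabilities over tokens.
            pmi = math.log(tf * n_tokens / (term_total[term] * length))
            vec[term] = norm_tf * idf * max(pmi, 0.0)
        vectors.append(vec)
    return vectors
```

Clipping PMI at zero drops terms that occur in a document less often than chance would predict. This is one simple way to let word-document association, and not raw frequency alone, shape the weights.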
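The L2-regularized NMF step can be sketched with the standard multiplicative updates for the objective 0.5·||V − WH||² + 0.5·λ(||W||² + ||H||²). Here V is assumed to be a non-negative term-document matrix, with rows for terms and columns for documents. This is a generic formulation assumed for illustration. The names nmf_l2, lam and iters are hypothetical, and plain lists are used instead of a numerical library so the sketch has no dependencies.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf_l2(V, k, lam=0.1, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (m x n) by W (m x k) times H (k x n).

    Minimizes 0.5*||V - WH||_F^2 + 0.5*lam*(||W||_F^2 + ||H||_F^2)
    with multiplicative updates, which keep every entry non-negative.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num = matmul(Wt, V)             # W^T V
        den = matmul(matmul(Wt, W), H)  # W^T W H
        # The lam * H term in the denominator is the L2 penalty's gradient.
        H = [[H[i][j] * num[i][j] / (den[i][j] + lam * H[i][j] + eps)
              for j in range(n)] for i in range(k)]
        Ht = transpose(H)
        num = matmul(V, Ht)             # V H^T
        den = matmul(W, matmul(H, Ht))  # W H H^T
        W = [[W[i][j] * num[i][j] / (den[i][j] + lam * W[i][j] + eps)
              for j in range(k)] for i in range(m)]
    return W, H
```

With V laid out as terms by documents, each column of H holds one document's weights over the k latent topics. A document can then be assigned to the topic with the largest weight.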
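The step that merges clusters across adjacent time windows can be illustrated as follows. The sketch assumes that each cluster is summarized by a centroid stored as a {term: weight} dict. It also assumes that Jaccard similarity is taken over each centroid's top-k terms and that the two scores are mixed linearly with a weight alpha. alpha, top_k, threshold and all function names are hypothetical parameters, not values from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    """Jaccard similarity of two collections of keywords."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def combined_similarity(c1, c2, alpha=0.5, top_k=10):
    """Hypothetical mix: alpha * centroid cosine + (1 - alpha) * keyword Jaccard."""
    top1 = sorted(c1, key=c1.get, reverse=True)[:top_k]
    top2 = sorted(c2, key=c2.get, reverse=True)[:top_k]
    return alpha * cosine(c1, c2) + (1 - alpha) * jaccard(top1, top2)


def merge_windows(prev_clusters, curr_clusters, threshold=0.5, alpha=0.5):
    """Map each current-window cluster to its most similar previous-window
    cluster, or to None (a new topic) if the best score is below threshold."""
    links = {}
    for i, c in enumerate(curr_clusters):
        best, best_sim = None, -1.0
        for j, p in enumerate(prev_clusters):
            s = combined_similarity(c, p, alpha)
            if s > best_sim:
                best, best_sim = j, s
        links[i] = best if best_sim >= threshold else None
    return links
```

Cosine similarity compares the full weighted centroids, while Jaccard similarity checks whether the leading keywords overlap. Mixing the two lets a cluster link to an earlier one only when both its weight profile and its headline terms agree.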
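For the forwarding-based trend index, one simple reading is to count reposts per time bucket at a chosen granularity and label each step as rising, falling or flat. The sketch below assumes Unix timestamps in seconds. repost_series and trend_labels are hypothetical names, and the +1/0/-1 labeling is an illustrative stand-in for the thesis's actual index.

```python
from collections import Counter


def repost_series(timestamps, granularity):
    """Count reposts per time bucket; granularity is in seconds
    (for example 3600 for hourly or 86400 for daily buckets)."""
    if not timestamps:
        return []
    start = min(timestamps)
    buckets = Counter((t - start) // granularity for t in timestamps)
    # Include empty buckets as zeros so the series has no gaps.
    return [buckets.get(i, 0) for i in range(max(buckets) + 1)]


def trend_labels(series):
    """Hypothetical trend index: +1 rising, -1 falling, 0 flat vs. the previous bucket."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]
```

Calling repost_series on the same timestamps with different granularities shows how one repost history can look bursty hourly yet flat daily. That contrast is why the choice of time granularity matters for the index.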
Keywords/Search Tags: Machine learning, topic clustering, trend prediction, NMF