Research On Forwarding Prediction And Hotspot Discovery Algorithms Of Weibo Based On Big Data

Posted on: 2020-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Xie
GTID: 2428330602951427
Subject: Computer software and theory
Abstract/Summary:
In China, where the Internet is growing rapidly, most network users now communicate through social network platforms such as Weibo, QQ and WeChat Moments. Social networks have changed the way people interact. They generate data continuously, and Weibo alone produces more than 150 million pieces of data every day. At this scale, a single machine cannot process the information efficiently enough, which has given rise to big data technology and requires traditional data mining algorithms to be improved and parallelized. This thesis studies social network big data on Hadoop. First, we collect Weibo data with crawlers. Then, based on an analysis of the forwarding behavior of Weibo users, we propose an algorithm to predict that behavior. We also cluster the data with an improved K-means algorithm so that hot topics can be discovered promptly. Experiments demonstrate the effectiveness of the proposed algorithms. The main contents of this thesis are as follows:

(1) A high-performance Weibo data crawling platform is designed, with a focus on analyzing Weibo's login verification and anti-crawler system. The crawler uses multi-threading and a priority queue to collect user and Weibo text data. The crawled content includes user information, user relationship information, Weibo text information and Weibo forwarding information. The data is stored in structured form in a MySQL database.

(2) A forwarding behavior prediction algorithm for Weibo users is designed and implemented. To avoid the low efficiency and high time cost of the existing K-nearest neighbor algorithm on big data, it is combined with the condensed nearest neighbor algorithm to give a decision boundary-based condensed K-nearest neighbor algorithm (CKNN). The design and implementation of the improved algorithm on the big data platform are analyzed. The K-nearest neighbor and condensed nearest neighbor algorithms are trained on the same dataset as comparison algorithms to verify the performance of CKNN.

(3) A new Weibo hot topic discovery algorithm is designed and implemented, with a focus on the shortcomings of the existing K-means algorithm. Particle swarm optimization is used to improve the traditional K-means algorithm, yielding the Kmeans-PSO algorithm, which avoids the influence of the initial cluster centers and reduces the chance of converging to a local optimum. The design and implementation of the improved algorithm on the big data platform are then analyzed. Finally, the proposed algorithm is compared with DBSCAN and K-means on the dataset to verify its performance.

(4) The experimental results of the algorithms on the Hadoop platform are analyzed, and the data show that the big data platform effectively improves execution speed.

The big data-based Weibo forwarding prediction and hot topic discovery algorithms proposed in this thesis have theoretical reference value for research on user behavior and online public opinion. In practice, they are of exploratory value for user behavior prediction algorithms and social network data mining.
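
The abstract does not give the details of CKNN, so the following is only a minimal single-machine sketch of the classic condensed nearest neighbor idea that it builds on: keep only the training samples needed to classify the rest correctly, then run K-nearest neighbor against that smaller set. It is not the thesis's decision boundary-based variant and not its Hadoop implementation. The function names, the two-dimensional features and the labels (1 = forwarded, 0 = not forwarded) are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def nearest_label(point, prototypes):
    """Return the label of the prototype closest to `point` (1-NN, Euclidean)."""
    return min(prototypes, key=lambda p: math.dist(point, p[0]))[1]


def condense(samples, seed=0):
    """Classic condensed nearest neighbor (illustrative, not the thesis's CKNN).

    `samples` is a list of (feature_tuple, label) pairs. Returns a subset that
    classifies every training sample correctly under 1-NN, assuming no two
    identical points carry different labels.
    """
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    kept = {order[0]}                 # indices already in the condensed set
    store = [samples[order[0]]]
    changed = True
    while changed:                    # repeat until a full pass adds nothing
        changed = False
        for i in order:
            features, label = samples[i]
            if i not in kept and nearest_label(features, store) != label:
                kept.add(i)           # misclassified: keep it as a prototype
                store.append(samples[i])
                changed = True
    return store


def knn_predict(point, prototypes, k=3):
    """Majority vote among the k prototypes nearest to `point`."""
    nearest = sorted(prototypes, key=lambda p: math.dist(point, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 25 points each.
samples = [((x / 10, y / 10), 0) for x in range(5) for y in range(5)]
samples += [((3 + x / 10, 3 + y / 10), 1) for x in range(5) for y in range(5)]

store = condense(samples)
print(len(samples), "training samples condensed to", len(store))
print(knn_predict((0.2, 0.3), store, k=1), knn_predict((3.2, 3.1), store, k=1))
```

Because prediction cost in K-nearest neighbor grows with the number of stored samples, shrinking the stored set in this way is one route to the lower time cost that the abstract describes.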
Keywords/Search Tags: social network, big data, data mining, Hadoop, user forwarding behavior, hot topic discovery, K-nearest neighbor, K-means