
Research On Hot Topics Of Social Network Based On Data Mining

Posted on: 2017-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Wang
Full Text: PDF
GTID: 2308330482480637
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology and the spread of the Internet, social networks have become an indispensable part of daily life and, as a new kind of social platform, are changing the way people communicate. As the number of social network users and the volume of information grow, quickly and accurately finding the hot topics that users care about in this mass of information has become an active research direction. Hot topic discovery applies statistical analysis and data mining to users' comments in order to mine topics, and ultimately presents users with a hot topic list, as Sina Weibo does. Compared with traditional Internet media, social network messages are high-dimensional and sparse, their topics are unevenly distributed, their language is non-standard, and their volume is growing explosively, so traditional topic discovery techniques run into many accuracy and efficiency problems when applied directly to social networks. After comparing and analyzing the advantages and disadvantages of various topic mining algorithms, this thesis selects the Naïve Bayesian classification algorithm and the Single-pass clustering algorithm as its topic discovery algorithms, studies them in depth, and improves them. The main work is as follows:
(1) The characteristics of current social network information are studied in depth; the basic flow of topic discovery is described; the algorithms used in the data mining process are analyzed and compared; and an automatic method for collecting experimental data is designed and implemented. Based on these characteristics, the problems of current topic detection technology are analyzed, and a classify-before-clustering approach to mining hot topics is proposed.
(2) Because of the characteristics of social network data, the Naïve Bayesian classification algorithm has problems with both accuracy and speed. This thesis introduces a variance filter into the Naïve Bayesian classification algorithm and combines the improved algorithm with the Hadoop platform to perform classification in parallel, with the aim of improving both the accuracy and the speed of classification. The improved algorithm is validated by comparative experiments on public data sets obtained through the Sina API.
(3) A block-based Single-pass clustering algorithm is proposed. It uses blocking to reduce the time complexity of the traditional Single-pass clustering algorithm and thereby improves clustering efficiency. Comparative experiments on public data sets obtained through the Sina API verify the advantages of the improved algorithm and show that classifying before clustering is superior to clustering directly for obtaining hot topics.
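For readers unfamiliar with Single-pass clustering, the following is a minimal sketch of the traditional baseline only. The abstract does not specify the block-based variant, so it is not reproduced here. The whitespace tokenization, the cosine similarity measure, the threshold of 0.5, and all function and variable names are illustrative assumptions rather than details taken from the thesis.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.5):
    """Traditional Single-pass clustering (hypothetical helper, not the thesis code).

    Each document is read once and compared with every existing cluster.
    It joins the most similar cluster if the similarity reaches the
    threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())  # assumed: whitespace tokenization
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # centroid = summed term counts
            best["members"].append(i)
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


# Made-up example posts, for illustration only.
posts = [
    "earthquake hits city center",
    "city center earthquake damage",
    "new phone launch today",
    "phone launch event today",
]
clusters = single_pass(posts, threshold=0.5)
# Rank clusters by size as a crude stand-in for topic "hotness".
hot = sorted(clusters, key=lambda c: len(c["members"]), reverse=True)
```

In this baseline every incoming post is compared with every existing cluster, so the cost grows with the number of clusters. That comparison step is presumably what a block-based variant aims to make cheaper.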
Keywords/Search Tags: social network, hot topic, Naïve Bayesian classification, Single-pass clustering