
Research Of Opinion Leaders Identifying In BBS Based On Hadoop Platform

Posted on: 2016-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Wu
Full Text: PDF
GTID: 2348330476455774
Subject: Software engineering
Abstract/Summary:
Opinion leaders are authoritative users who exert significant influence in a BBS (bulletin board system). What they say strongly affects the behavior and thinking of other netizens. In recent years, research on opinion leaders has attracted growing attention, but current work still has the following problems. First, it does not take into account that an opinion leader's views are constrained by professional knowledge: one person's knowledge is limited and cannot cover every field. Second, when building the network graph of users' reply relations, the weights between users cannot be determined simply by the number of their interactions, because that count does not reflect the real communication among users in the forum. Third, current studies mainly target small network structures with small amounts of data; when faced with massive data in large networks, their computing capacity is limited, with low efficiency, poor interoperability, and high resource consumption.

To address these problems, this paper introduces an algorithm for identifying topic-field opinion leaders in the TianYa forum. The main contents are as follows:

(1) Considering the limits of opinion leaders' knowledge, this paper presents a system architecture for identifying opinion leaders based on topic-field. Posts are first clustered according to their content. Because the existing Single-Pass incremental clustering algorithm identifies topics inefficiently in a massive document stream, a time-based two-stage Single-Pass algorithm named TDSingle-Pass is proposed. The algorithm optimizes the similarity comparison and improves the effectiveness and accuracy of the clustering system.

(2) To address the problem of weights among users, this paper presents a topic-field-based opinion leader identification algorithm named UserRank. The reply relationship graph is constructed from users' replies to one another. The algorithm considers sentiment tendency, distance in the graph, and the similarity of the reply content between users in a reply relation. In addition, the algorithm borrows the idea of PageRank, redefines the transition probability matrix, and overcomes the inefficiency of a single-machine environment.

(3) The last part of this paper verifies the TDSingle-Pass and UserRank algorithms experimentally in a Hadoop environment. First, the experiment determines the similarity threshold used when comparing two posts in the TDSingle-Pass algorithm. Second, compared with Single-Pass running in a standalone environment on three indicators, the false positive rate, the miss rate (non-response rate), and the error cost, the three indicators decrease by 9.04%, 12.86%, and 9.04% respectively. Compared with existing PageRank algorithms that do not run on the Hadoop platform, the UserRank algorithm finds the top 12% of opinion leaders quickly and accurately.
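
As an illustration of the baseline that item (1) refers to, the following is a minimal Python sketch of plain Single-Pass incremental clustering. It is not the TDSingle-Pass algorithm, whose two-stage, time-based details the abstract does not give. The whitespace tokenizer, the cosine similarity over raw term counts, the 0.3 threshold, the sample posts, and the names `cosine` and `single_pass` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(posts, threshold=0.3):
    """Assign each post, in arrival order, to its most similar cluster,
    or open a new cluster when no similarity reaches the threshold."""
    centroids = []  # one summed term-count vector per cluster (assumed representation)
    labels = []
    for text in posts:
        vec = Counter(text.lower().split())  # naive tokenizer, an assumption
        best, best_sim = -1, 0.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the post into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new topic cluster
            labels.append(len(centroids) - 1)
    return labels


# Hypothetical sample posts, not taken from the TianYa data set.
posts = [
    "hadoop cluster job failed",
    "hadoop job failed again on cluster",
    "best noodle shop in town",
    "noodle shop review",
]
labels = single_pass(posts)
print(labels)
```

Every incoming post is compared against every existing cluster, so the cost grows with the number of topics. That is the kind of inefficiency under a large document stream that motivates optimizing the similarity comparison step.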
Keywords/Search Tags: Opinion Leader, Cluster Algorithm, Hadoop, BBS