Research On The Evaluation Method Of Blog’s Subject Influence And User’s Subject Influence

Posted on: 2016-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: P F Zhou
Full Text: PDF
GTID: 2308330479493921
Subject: Computer software and theory
Abstract/Summary:
With the continuous development of microblogging, the huge user base and the massive volume of blog posts have caused a serious information overload problem, and a lot of valuable information is buried in the flood. To manage and use microblog information effectively, a sensible first step is to pick out the highly influential blogs and users within a particular subject. How should the subject influence of a blog or a user be understood? Which indicators should be used to measure it? Intuitive experience can no longer meet real-world demand, so building an effective evaluation model has become an urgent problem.

Firstly, this paper clusters the blogs to generate clusters on different subjects, and for this purpose introduces the text clustering algorithms commonly used in data processing. Focusing on the principle of the Clustering Using Representatives (CURE) algorithm and its shortcomings, we analyze CURE's representative point selection method and propose an improved selection algorithm that considers both the density and the scatter of the representative points. In addition, because blog text is brief and noisy, we propose an improved text distance calculation that combines the Vector Space Model (VSM) based on Term Frequency-Inverse Document Frequency (TF-IDF) with the Latent Dirichlet Allocation (LDA) model based on Jensen-Shannon (JS) distance. Building on these two improvements, we propose the CURE based on Density and Scatter (DSCURE) algorithm and verify its effectiveness through comparative experiments.

Secondly, this paper analyzes the characteristics of information dissemination on microblogs and puts forward a model for evaluating a blog's subject influence. The model takes into account the relevance to the subject, the quality of the content, and the timeliness of the post. For content quality, following the quality hypothesis of PageRank, we assume that a blog is of high quality when it receives good feedback, so we measure quality by the forwarding grade and by the quality of the users who commented on or forwarded the blog. For timeliness, we propose an active degree model based on the Rayleigh distribution; it can dynamically adjust its parameters to describe a blog's activity trend, which first increases and then decreases. Experiments confirm the rationality and effectiveness of this influence model.

Thirdly, on the basis of these studies and LeaderRank, we propose QualityRank, a model for evaluating a user's subject influence that considers the user's personal attributes, the features of the user's blogs, and the network structure. We implement the QualityRank algorithm in MATLAB, compare its ranking results with those of other influence ranking algorithms, and analyze the QualityRank results at different times. We conclude that the algorithm can evaluate a user's influence on a subject effectively and that its ranking results are reasonable.
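As a rough illustration of the rise-then-fall trend that the Rayleigh-based active degree model describes, the sketch below evaluates the standard Rayleigh density over the time elapsed since posting. The abstract does not give the thesis's actual formula or its parameter adjustment rule, so the function names, the fixed scale parameter `sigma`, and the normalization to a peak of 1 are assumptions made only for this example.

```python
import math


def rayleigh_activity(t, sigma):
    """Standard Rayleigh density at elapsed time t (same unit as sigma).

    Hypothetical stand-in for a blog's "active degree"; returns 0 for t < 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if t < 0:
        return 0.0
    return (t / sigma**2) * math.exp(-t**2 / (2 * sigma**2))


def normalized_activity(t, sigma):
    """Activity rescaled so that the peak, reached at t == sigma, equals 1."""
    peak = rayleigh_activity(sigma, sigma)
    return rayleigh_activity(t, sigma) / peak


# Assumed example: activity peaks 4 hours after posting.
hours = [0, 1, 2, 4, 8, 16]
scores = [normalized_activity(h, sigma=4.0) for h in hours]
for h, s in zip(hours, scores):
    print(f"{h:>2} h after posting: activity {s:.3f}")
```

In this sketch `sigma` is simply the time at which activity peaks; a model that adjusts its parameters dynamically, as the abstract describes, would presumably re-estimate it from observed feedback rather than fix it in advance.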
Keywords/Search Tags: Microblog, Text Clustering, Blog’s Subject Influence Evaluation Model, User’s Subject Influence Evaluation Model, QualityRank