Key Technologies Research On Influence Analysis For Public Opinion In Microblogs

Posted on: 2014-03-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Ding
Full Text: PDF
GTID: 1228330422474304
Subject: Computer Science and Technology
Abstract/Summary:
The past few years have witnessed the rapid development and popularization of microblogs. Owing to their openness, broad terminal support, simple content, low entry threshold and so on, microblogs deeply affect daily life by providing an important platform for people to publish comments, spread information and acquire knowledge, to name just a few uses. Despite these advantages, microblogs may seriously harm national security and social development if they get out of control. For example, the direction of public opinion could be manipulated by opinion leaders who rely on their own charm and their position in the network, posing a serious threat to social stability. Research on microblogs is therefore valuable from both a theoretical and a practical perspective, especially in the age of the Internet.

Microblogs can be treated as a generalization and extension of human life in the virtual network world. However, unlike traditional information networks, microblogs have their own characteristics, including noisy data diversity, social media, multi-relations, rapid spread and evolution, nonlinearity, large scale, etc. These differences pose great challenges for analyzing and mining microblogs. The work in this dissertation focuses on noisy data diversity, social media, multi-relations and rapid spread, which are four essential characteristics of microblogs. The main contributions of this thesis are as follows.

1. For the problem of noisy data diversity in microblogs, we studied spammer detection with a bidirectional propagation algorithm based on statistical features. Existing work mainly detected spammers in microblogs using explicit features, such as the interval between tweets, the ratio of mentions in tweets, the ratio of URLs in tweets, and so on. In this work, we developed the DirTriangleC algorithm, which counts local triangles, to detect implicit spammers in the directed following network. Furthermore, we proposed the AttriBiVote algorithm, which classifies users by the bidirectional propagation of trust and the statistical features of neighboring users. Comprehensive experiments were conducted on a real dataset from Twitter. The experimental results indicated that the proposed algorithm was more effective than the other algorithms based on statistical features. In addition, about 83.7% of dead accounts in implicit spammers were discovered by the proposed DirTriangleC algorithm.

2. For the problem of social media in microblogs, we studied the measurement of influence strength with a time-aware probabilistic generative model that takes the time interval, the following relationship and the post content into consideration. In our model, the mixture distribution over topics is influenced by both word co-occurrences and the document's time stamp, and the relationship is controlled by a Bernoulli distribution. Gibbs sampling was employed to perform approximate inference, and the time interval and multi-path influence propagation were incorporated to estimate the indirect influence strength at a finer granularity, according to the propagation of words. Comprehensive experiments were conducted on a real dataset from Twitter to evaluate the performance of the proposed approach, and the results validated its effectiveness. We also observed that the influence strength ranking produced by our model was only weakly correlated with the ranking produced by the method that ranks influence strength by the number of common friends.

3. For the problem of multi-relations in microblogs, we studied a method to mine topical influencers. The influence of users was measured by random walks over multi-relational data in microblogs: repost, reply, copy and read. Because the copy and read relations are uncertain, a new method was proposed to determine the transition probabilities of uncertain relational networks. Moreover, a combined random walk was proposed for the multi-relational influence network, considering the transition probabilities both within and between the relational networks. Experiments were conducted on a real dataset from Twitter, and the results showed that the proposed method was more effective than TwitterRank and the other methods for discovering influencers.

4. For the problem of rapid spread in microblogs, we studied a method to measure the spreadability of users. Specifically, a novel method called SpreadRank was proposed, which considers the time interval of retweets and the location of users in information cascades. Our method integrates the following four factors: 1) the location of a user in information cascades, an important feature that stands for the ability to drive the propagation of information; 2) the number of retweets; 3) the transitivity of spreadability; 4) the time interval of retweets, an important feature that stands for the diffusion rate of each user. Finally, we conducted experiments on a real dataset from Twitter. The results showed that our method was consistently better than PageRank applied to the retweet network and than the retweetNum method, which measures spreadability by the number of retweets.

In summary, this dissertation targets four key characteristics of microblogs. Several key techniques, including spammer detection, influence strength measurement, topical influencer mining and spreadability analysis, were studied intensively. These techniques are interesting and useful, and have promising prospects for influence analysis and public opinion mining.
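
To make the idea of counting local triangles in a following network (the first contribution above) more concrete, here is a minimal Python sketch. It is not the DirTriangleC algorithm, whose details the abstract does not give. The function name `local_triangle_counts`, the edge-list input format and the decision to ignore edge direction when closing a triangle are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def local_triangle_counts(follow_edges):
    """Count, for each user, the triangles that user takes part in.

    follow_edges: iterable of (follower, followee) pairs.
    Assumption of this sketch: edge direction is ignored when forming
    triangles, so any follow link between two users connects them.
    """
    neighbors = defaultdict(set)
    for follower, followee in follow_edges:
        if follower == followee:
            continue  # skip self-follows, which cannot form a triangle
        neighbors[follower].add(followee)
        neighbors[followee].add(follower)

    counts = {}
    for user, adjacent in neighbors.items():
        # A triangle at `user` is a pair of its neighbors that are
        # themselves linked to each other.
        counts[user] = sum(
            1 for v, w in combinations(adjacent, 2) if w in neighbors[v]
        )
    return counts


# Hypothetical toy network: a, b, c follow each other in a cycle,
# while d follows a but has no other connections.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
counts = local_triangle_counts(edges)
```

In this toy network, user d has many fewer closed triangles than the others, which is the kind of structural signal a triangle-based detector could use. Any thresholds or direction-aware variants would have to come from the dissertation itself.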
Keywords/Search Tags: Microblogs, Influence, Influence Strength, Spammers, Spreadability