A Predicting Model Of Microblog Retweeting Based On User And Message

Posted on: 2017-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2348330491454809
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, social media technology changes daily, and microblogs have become a main online platform for spreading information. As a new media tool, the microblog has become part of our work, study, and daily lives. It has not only changed the old style of social communication but also overturned the traditional model of information dissemination.

Sina microblog was founded in 2009. It is a platform for publishing, sharing, spreading, and obtaining information, built on a fans-friends structure. Users can post microblogs of up to 140 characters and receive other people's microblogs by following them. If a user finds a microblog interesting and worth sharing, he or she forwards it to his or her fans; this is called retweeting. Users usually retweet messages that are interesting and relevant to their fans. By convention, a retweet is marked with a special keyword such as "(retweeting)". The purpose of retweeting is to spread information to fans. Analyzing retweeting behavior and identifying the factors that influence it can play an important role in mining hot topics, marketing products, monitoring public opinion, and controlling rumors. Compared with traditional social networks, user relationships and the message transmission mechanism in Sina microblog are more complex.

We use data from Sina microblog to analyze features that influence the probability that a microblog is retweeted, e.g., the user's influence, fans' activeness, and several content features. We then choose the important features as parameters to build a predicting model based on users and messages. Our main work is as follows.

First, we analyze methods of obtaining data. The study begins by crawling a large amount of user data and message data, so this work introduces two crawling methods: a web crawler and the API. We compare the two and choose the Sina microblog API. The next step is pre-processing these data, which are the basic data for our work and for others' studies. Because Sina limits the API call frequency, we use multiple accounts to increase the call rate, and we delay requests to avoid interruptions in data collection. In pre-processing, we use ICTCLAS and a Chinese stop-word list to clean the data and remove noise.

Second, we introduce the user features and message features that influence whether a tweet is retweeted. We then choose 15 features that contribute strongly to retweeting and add them to the model we built beforehand. Typical features are the user's influence, fans' activeness, and content features. We convert these indices into binary attributes, with 1 meaning yes and 0 meaning no. For user features, we analyze the relationship between each feature and retweeting and determine a threshold value. For message features, we propose content features, emotion features, and a time feature based on previous research. In addition, this paper mines latent topics using the LDA model.

Third, we give a new method to predict the probability that a microblog is retweeted, based on user and message features. After analyzing all the factors together, we establish a predicting model for the retweeting probability. It has two processes: training and predicting. In training, we learn the weights of all features on a large training dataset; the topic features are trained separately because of the large quantity involved. In predicting, we build a feature vector for each new message and feed it into the model to obtain a prediction. We show that it is feasible to predict which messages will be retweeted. By analyzing the parameters of the model, we identify the influential factors, which lets us study which content is interesting in Sina microblog.
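The abstract does not name the model family, so the following is only a minimal sketch of the train-then-predict idea, assuming a logistic regression over 0/1 features. The feature names, threshold values, and toy training rows are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical feature names and thresholds (assumptions, not from the thesis).
FEATURES = ["author_followers", "follower_activity", "has_url"]
THRESHOLDS = {"author_followers": 1000, "follower_activity": 0.5, "has_url": 1}


def binarize(raw, thresholds=THRESHOLDS):
    """Map raw feature values to 0/1 attributes: 1 if value >= threshold."""
    return [1 if raw[name] >= thresholds[name] else 0 for name in FEATURES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated retweet probability for one binary feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(samples, labels, epochs=500, lr=0.5):
    """Learn one weight per feature with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # gradient of the log-loss
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


# Toy, made-up training rows: label 1 = retweeted, 0 = not retweeted.
TRAIN_X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train(TRAIN_X, TRAIN_Y)
    new_message = binarize(
        {"author_followers": 5000, "follower_activity": 0.2, "has_url": 1}
    )
    print("weights:", w, "bias:", b)
    print("retweet probability:", predict(w, b, new_message))
```

In a model of this kind, the sign and size of each learned weight indicate how strongly the corresponding feature is associated with retweeting, which is one way the parameters could be read to find influential factors.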
Keywords/Search Tags: microblog, retweet, content feature, predicting model, interestingness