
Data Acquisition And Propagation Path Analysis Based On Social Network

Posted on: 2014-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: K Xu
GTID: 2268330401471535
Subject: Computer application technology
Abstract/Summary:
The Internet is developing at great speed, and the rapid growth in the number of network users has brought an equally rapid growth in network public opinion. Since the emergence of social network platforms in particular, the spread of public opinion online has become very hard to control, so monitoring network public opinion has become a research hotspot in recent years. This thesis focuses on Facebook and Twitter, two representative social network platforms. The research has two parts.

In the first part, we design a data gathering system for Facebook and Twitter. During the design process, we analyze the OAuth login authorization protocol, the official gathering APIs, and third-party SDKs. Because the Facebook API cannot retrieve a friend's friend list, we combine HTTP stream crawling with the official API to gather Facebook data. Twitter, in contrast, does provide a friend's friend list, so the Twitter API alone is sufficient to gather Twitter data. We then compare the gathering performance for Facebook and Twitter.

In the second part, we analyze the social network data gathered in the first part. First, we filter the 2.4 million gathered tweets into Chinese tweets and English tweets. We then analyze the reply format of the tweets, preprocess them, and extract the reply relations. Next, we apply simhash to the preprocessed tweets to find repeated tweets. We then use the time sorting algorithm and the tree sorting algorithm introduced in Chapter 4 to sort the tweets that are similar and contain the official reply format. After sorting, we write the node information into a file in the GEXF format used by the complex network tool Gephi, which gives the propagation path graph of each tweet. With Gephi, we can identify the source node of a tweet, that is, the user who posted it first. Finally, we analyze all of the propagation path graphs obtained and summarize several typical propagation path graphs.
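The abstract mentions analyzing the OAuth login authorization protocol. As a minimal sketch, assuming the HMAC-SHA1 request signing of OAuth 1.0a (RFC 5849), which Twitter's REST API used for user-context requests, the signature of one request can be computed as below. The function names are hypothetical; Facebook's Graph API instead uses OAuth 2.0 access tokens, which need no per-request signature.

```python
import base64
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def _pct(value: str) -> str:
    # RFC 3986 percent-encoding: only unreserved characters
    # (letters, digits, '-', '.', '_', '~') stay unencoded; '/' is encoded too.
    return quote(value, safe="~")


def oauth1_hmac_sha1_signature(
    method: str,
    base_url: str,
    params: Dict[str, str],
    consumer_secret: str,
    token_secret: str = "",
) -> str:
    """Hypothetical helper: OAuth 1.0a HMAC-SHA1 signature (RFC 5849, 3.4).

    params must hold every query/body parameter plus every oauth_*
    protocol parameter except oauth_signature itself.
    """
    # 1. Percent-encode each name and value, then sort by name, then value.
    encoded = sorted((_pct(k), _pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    # 2. Signature base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join([method.upper(), _pct(base_url), _pct(param_string)])
    # 3. Signing key: encoded consumer secret & encoded token secret.
    key = f"{_pct(consumer_secret)}&{_pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

The resulting value is sent, percent-encoded, as `oauth_signature` in the `Authorization: OAuth ...` header together with the other `oauth_*` parameters.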
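Access to a friend's friend list matters because the crawl must expand beyond the seed user's direct contacts. A breadth-first crawl limited to two hops can be sketched as follows; `fetch_friends` is a hypothetical callable standing in for whatever API request or page fetch returns a user's friend list, and paging, rate limits, and error handling are omitted.

```python
from collections import deque
from typing import Callable, List, Set, Tuple


def crawl_friend_graph(
    seed: str,
    fetch_friends: Callable[[str], List[str]],
    max_depth: int = 2,
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Breadth-first crawl of a friend graph out to max_depth hops.

    fetch_friends is hypothetical: it returns the ids of a user's friends.
    """
    visited: Set[str] = {seed}
    edges: List[Tuple[str, str]] = []
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # users at the depth limit are recorded but not expanded
        for friend in fetch_friends(user):
            edges.append((user, friend))
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, depth + 1))
    return visited, edges
```

With `max_depth=2`, the seed's friends are expanded, so their friends (the friend's friend list) are collected but not expanded further.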
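The abstract does not say how the tweets were split into Chinese and English. One simple heuristic, offered as an assumption rather than the author's method, counts CJK Unified Ideographs (U+4E00 to U+9FFF) against ASCII letters after stripping URLs, @mentions, and hashtags; the 0.3 and 0.9 thresholds are illustrative.

```python
import re
from typing import Dict, List

CJK = re.compile(r"[\u4e00-\u9fff]")          # CJK Unified Ideographs block
LATIN = re.compile(r"[A-Za-z]")
NOISE = re.compile(r"https?://\S+|@\w+|#\w+")  # URLs, @mentions, #hashtags


def classify_language(text: str, min_cjk_ratio: float = 0.3) -> str:
    """Hypothetical heuristic: return 'zh', 'en', or 'other'."""
    cleaned = NOISE.sub(" ", text)
    cjk = len(CJK.findall(cleaned))
    latin = len(LATIN.findall(cleaned))
    total = cjk + latin
    if total == 0:
        return "other"
    if cjk / total >= min_cjk_ratio:
        return "zh"
    if latin / total >= 0.9:
        return "en"
    return "other"


def split_by_language(tweets: List[str]) -> Dict[str, List[str]]:
    buckets: Dict[str, List[str]] = {"zh": [], "en": [], "other": []}
    for tweet in tweets:
        buckets[classify_language(tweet)].append(tweet)
    return buckets
```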
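Extracting reply relations requires parsing the reply format. Assuming that format is the common manual-retweet convention `RT @user: text` (the abstract does not spell it out), nested markers can be turned into (parent, child) propagation edges like this; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: the reply format is "RT @user: text", which may nest, e.g.
# "comment RT @b: more RT @c: original".
RT_PATTERN = re.compile(r"RT @(\w+):?")


def extract_reply_chain(author: str, text: str) -> List[Tuple[str, str]]:
    """Return (parent, child) edges implied by nested RT markers.

    In "RT @b: ... RT @c: ...", the author copied b's post, which b in
    turn copied from c, so the content flowed c -> b -> author.
    """
    mentioned = RT_PATTERN.findall(text)  # users in left-to-right order
    chain = [author] + mentioned          # newest to oldest
    # Each user received the content from the next (older) user in the chain.
    return [(chain[i + 1], chain[i]) for i in range(len(chain) - 1)]
```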
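SimHash maps each text to a short fingerprint so that similar texts get fingerprints differing in only a few bits, which lets repeated tweets be found by Hamming distance. The sketch below is a plain Charikar-style SimHash; the character 3-gram features, MD5 feature hash, 64-bit width, and distance threshold of 3 are assumptions, not values taken from the thesis.

```python
import hashlib
from typing import List


def _shingles(text: str, k: int = 3) -> List[str]:
    # Character k-grams work for both Chinese (no spaces) and English text.
    s = "".join(text.lower().split())
    if len(s) <= k:
        return [s]
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def simhash(text: str, bits: int = 64) -> int:
    """SimHash with equal feature weights (bits must be <= 128)."""
    votes = [0] * bits
    for feature in _shingles(text):
        # Keep the top `bits` bits of the 128-bit MD5 digest as the feature hash.
        h = int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest(), "big") >> (128 - bits)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if votes[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, threshold: int = 3) -> bool:
    return hamming(simhash(a), simhash(b)) <= threshold
```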
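The time sorting and tree sorting algorithms are defined in the thesis's fourth chapter and are not reproduced here. As a generic illustration only, the sketch below orders the posts of one cluster of similar tweets by timestamp, attaches each post to its parent when the parent has already appeared, and reports posts without an earlier parent as sources. The `Post` record and its fields are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    user: str
    parent: Optional[str]  # user this post was copied from; None if original
    timestamp: int         # e.g. Unix seconds


def build_propagation_tree(posts: List[Post]):
    """Generic sketch, not the thesis's chapter-4 algorithms."""
    ordered = sorted(posts, key=lambda p: p.timestamp)
    seen = set()
    children: Dict[str, List[str]] = defaultdict(list)
    sources: List[str] = []
    for post in ordered:
        if post.parent is not None and post.parent in seen:
            children[post.parent].append(post.user)
        else:
            # No parent, or the parent does not appear earlier in the
            # cluster: treat this post as a source of the propagation path.
            sources.append(post.user)
        seen.add(post.user)
    return ordered, dict(children), sources
```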
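Gephi reads graphs in the GEXF XML format. A minimal writer for a directed propagation graph, using only the standard library, might look like the following; it emits the GEXF 1.2 draft namespace with node ids, labels, and edges only, leaving out the attributes, timestamps, and visualization data the format also supports. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(nodes: List[str], edges: List[Tuple[str, str]]) -> str:
    """Serialize a directed graph as a minimal GEXF 1.2 document string."""
    # The default namespace is set as a plain xmlns attribute on the root.
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"mode": "static", "defaultedgetype": "directed"})
    nodes_el = ET.SubElement(graph, "nodes")
    ids: Dict[str, str] = {}
    for i, name in enumerate(nodes):
        ids[name] = str(i)
        ET.SubElement(nodes_el, "node", {"id": str(i), "label": name})
    edges_el = ET.SubElement(graph, "edges")
    for i, (src, dst) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {"id": str(i), "source": ids[src], "target": ids[dst]})
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file with the `.gexf` extension should let Gephi open it; in the resulting graph, a node with no incoming edge is a candidate source node.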
Keywords/Search Tags: social network, simhash, propagation path, duplicate removal