Study And Implementation Of Data Acquisition Technology In Social Network

Posted on: 2017-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Xu
GTID: 2308330482995746
Subject: Software engineering
Abstract/Summary:
We live in the era of big data. Hundreds of millions of people spend a great deal of time on social networks, sharing, exchanging, contacting, and interacting at unprecedented speed, and in doing so they generate a huge amount of user data. This abundant data provides a great opportunity for academic research and product development. When a social networking platform provides an API, data can be obtained easily for study; when a platform provides no API, a crawler must be written to collect the data, which is the subject of this paper. This paper takes Zhihu, a leading online social Q&A community, as the research object and studies the related web crawler technologies. The main topics include analysis of the site's use of Ajax, web crawler design, crawling strategies, simulated website login, multithreaded design, and URL deduplication. The contents of this paper are as follows.

First, I analyze the main problems the crawler faces and design its main modules. This part introduces the background knowledge needed for crawler design, such as Ajax, URLs, multithreading, and page parsing, as well as background on social networks, such as social network representation, centrality, and factions. It then covers the design and implementation of the main modules: the simulated login module, the proxy server control module, the user analysis module, the question analysis module, the topic analysis module, the data storage module, the control module, and the module that generates the user-network adjacency matrix. Together these modules realize the basic functions of the crawler.

Second, I designed a crawler for all Zhihu users and a crawler for all Zhihu questions. In this part I studied crawling strategies. The crawler for all users uses breadth-first search, depth-first search, and a strategy based on the structure of Zhihu. The crawler for all questions acquires its data from Zhihu's page listing all questions. I then used the collected data to analyze the structure of the user population, the distribution of the number of answers, and so on.

Finally, I designed a focused crawler for Zhihu, using a strategy based on breadth-first search. I used the data acquired by the focused crawler to analyze the structural characteristics of Zhihu communities from the aspects of centrality and subgroups.
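
The abstract does not give the implementation, so the following is only a minimal sketch of how a breadth-first crawl with deduplication, an adjacency matrix, and a simple centrality measure could fit together. Everything named here is an assumption for illustration: `TOY_FOLLOWS` is a made-up in-memory follow graph standing in for pages fetched from the site, and `bfs_crawl`, `adjacency_matrix`, and `in_degree_centrality` are hypothetical helper names, not modules from the thesis. No network requests are made.

```python
from collections import deque

# Hypothetical toy data: user -> users they follow (stands in for fetched pages).
TOY_FOLLOWS = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice", "dave"],
    "dave": [],
}


def bfs_crawl(seed, fetch_neighbors, max_users=1000):
    """Visit users breadth-first; the visited set prevents re-crawling a user."""
    visited = {seed}
    order = [seed]          # users in the order they were discovered
    edges = []              # (follower, followee) pairs seen during the crawl
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for neighbor in fetch_neighbors(user):
            edges.append((user, neighbor))
            if neighbor not in visited and len(visited) < max_users:
                visited.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order, edges


def adjacency_matrix(order, edges):
    """Build a 0/1 matrix; row i, column j is 1 if order[i] follows order[j]."""
    index = {user: i for i, user in enumerate(order)}
    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        # Skip edges to users that were never admitted (e.g. past max_users).
        if src in index and dst in index:
            matrix[index[src]][index[dst]] = 1
    return matrix


def in_degree_centrality(matrix):
    """Fraction of the other users who follow each user (column sums / (n - 1))."""
    n = len(matrix)
    if n <= 1:
        return [0.0] * n
    return [sum(matrix[r][c] for r in range(n)) / (n - 1) for c in range(n)]


order, edges = bfs_crawl("alice", lambda u: TOY_FOLLOWS.get(u, []))
matrix = adjacency_matrix(order, edges)
centrality = in_degree_centrality(matrix)
for user, score in zip(order, centrality):
    print(user, round(score, 3))
```

In this toy graph, `carol` has the highest in-degree centrality because two of the three other users follow her. A real crawler would replace the `fetch_neighbors` callable with authenticated, rate-limited page requests, and could use depth-first order by popping from the other end of the queue.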
Keywords/Search Tags: Social Network, Crawler, Multithread