Study And Implementation Of Data Acquisition Technology In Social Network

Posted on:2017-05-16

Degree:Master

Type:Thesis

Country:China

Candidate:H Xu

Full Text:PDF

GTID:2308330482995746

Subject:Software engineering

Abstract/Summary:

We live in the era of big data. Hundreds of millions of people spend a great deal of time on social networks, sharing, exchanging, connecting, and interacting at an unprecedented rate and generating a huge amount of user data. This abundant data provides a great opportunity for academic research and product development. When a social networking platform provides an API, data for study is easy to obtain; when a platform does not, a crawler must be written to collect the data, which is the subject of this thesis.

This thesis takes Zhihu, a leading online social Q&A community, as its research object and studies the related web crawler technologies. The main research topics include analysis of the use of Ajax technology, web crawling, crawling strategies, simulated website login, multithreaded design, and URL deduplication. The contents of this thesis are as follows.

First, I analyze the main problems a crawler faces and design its main modules. This part introduces the background knowledge for crawler design, such as Ajax, URLs, multithreading, and page parsing, as well as background on social networks, such as social network representation, centrality, and factions. It then covers the design and implementation of the main modules: the simulated login module, proxy server control module, user analysis module, question analysis module, topic analysis module, data storage module, control module, and user network adjacency matrix generation module. Together these modules realize the basic functions of the crawler.

Second, I designed a crawler for all Zhihu users and a crawler for all Zhihu questions. In this part I studied crawling strategies. The crawler for all users was designed using breadth-first search, depth-first search, and a strategy based on the structure of Zhihu. The crawler for all questions acquires its data from Zhihu's page listing all questions. I then used the data to analyze the structure of the users, the distribution of the number of answers, and so on.

Finally, I designed a focused crawler for Zhihu, with a strategy based on breadth-first search. I used the data acquired by the focused crawler to analyze the structural characteristics of Zhihu communities from the aspects of centrality and subgroups.
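To make the breadth-first strategy, URL deduplication, and adjacency matrix generation mentioned in the abstract more concrete, the following is a minimal Python sketch. It is not the thesis's code. The names `bfs_crawl`, `adjacency_matrix`, `in_degree_centrality`, and `get_followees` are hypothetical, and the in-memory `FOLLOWS` dictionary is an assumed stand-in for the pages a real crawler would fetch over HTTP after logging in. In-degree is used here only as one simple example of a centrality measure.

```python
from collections import deque

def bfs_crawl(seed, get_followees, max_users=1000):
    """Breadth-first traversal of a follow graph.

    A visited set ensures each user is queued only once, which plays the
    role of URL deduplication. Returns users in discovery order and the
    directed follow edges between discovered users.
    """
    visited = {seed}
    order = [seed]
    edges = []
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for followee in get_followees(user):
            if followee not in visited:
                if len(visited) >= max_users:
                    continue  # budget exhausted: skip new users and their edges
                visited.add(followee)
                order.append(followee)
                queue.append(followee)
            edges.append((user, followee))
    return order, edges

def adjacency_matrix(users, edges):
    """Build a 0/1 matrix where matrix[i][j] == 1 means users[i] follows users[j]."""
    index = {user: i for i, user in enumerate(users)}
    n = len(users)
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix

def in_degree_centrality(matrix):
    """Fraction of the other users who follow each user (column sums / (n - 1))."""
    n = len(matrix)
    if n < 2:
        return [0.0] * n
    return [sum(row[j] for row in matrix) / (n - 1) for j in range(n)]

# Hypothetical toy follow graph standing in for crawled pages.
FOLLOWS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}

users, edges = bfs_crawl("a", lambda user: FOLLOWS.get(user, []))
matrix = adjacency_matrix(users, edges)
centrality = in_degree_centrality(matrix)
```

In this toy graph, user `d` is never discovered because no crawled user follows `d`, which illustrates one reason why the choice of seed and strategy matters when crawling a directed follow network.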

Keywords/Search Tags:

Social Network, Crawler, Multithread

Related items

1	Design And Implementation Of Social Network Information Crawler
2	Social Network Data Acquisition Technology And Implementation
3	The Research And Implement Of Topic-focused Web Crawler Based On SVM Classification Algorithm
4	Research On Social Network Sampling Algorithm Based On Random Jump Strategy
5	Research On Topic Focused Web Crawler And Related Technologies
6	The Design Of Specific Topic Web Crawler And Its Transmission Group
7	Design And Implementation Of Keywords-based Microblog Crawler System
8	The Extraction And Analysis Of The Social Network Data
9	Design And Implementation Of A Social Network User Account Correlation System
10	Research On The Microblogging Crawler Related Technologies