Design And Implementation Of Social Network Information Crawler

Posted on:2015-08-03

Degree:Master

Type:Thesis

Country:China

Candidate:X Lv

Full Text:PDF

GTID:2308330464455610

Subject:Software engineering

Abstract/Summary:

Web 2.0 marks the social networking era, in which the Internet provides a user-centric communication model and platform. Through a social networking platform, users can post messages, share content, add friends, and follow people whose interests they share. The user base of such a platform typically numbers in the millions. Follow and friend relationships between users form a huge social network, over which messages can spread quickly. Most social networking platforms provide an open API through which users and developers can obtain platform data, but the number of API calls is usually limited, which makes obtaining large amounts of data extremely inconvenient. The study of crawlers for social network information is therefore of real significance.

This thesis takes a social networking platform as its research object and studies the related crawler technologies. Social networking platforms currently use AJAX to provide rich user interaction, so crawling such a platform is also a process of parsing AJAX pages. These platforms generate a huge amount of data, some of it sparse or unstructured, so storing it in a traditional relational database is inconvenient. This thesis uses the non-relational database MongoDB for data storage; the stored information is then retrieved from MongoDB to obtain the information of interest to the user.

The work is as follows:
1. Analyzed the problems that arise when crawling current social network information, which lead to the design goals the crawler needs to achieve;
2. Built a web crawler suited to social network information, using a breadth-first strategy to crawl social network data;
3. Crawled data from AJAX pages using the Beautiful Soup parser, which works around the limit that the platform's own open API places on the number of data requests;
4. Crawled information including user information, user relationship information, web content, and content comments;
5. Used the non-relational database MongoDB for data storage, to cope with the ever-growing volume of social network data.
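The breadth-first strategy named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function `crawl_breadth_first`, the `fetch_followees` callback, the `max_users` cap, and the in-memory `FOLLOW_GRAPH` are all hypothetical names introduced here. In a real crawler the callback would presumably download and parse a user's page and the visited users would be written to a store such as MongoDB, but those steps are left out so the example runs with the standard library alone.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl_breadth_first(
    seed: str,
    fetch_followees: Callable[[str], Iterable[str]],
    max_users: int = 100,
) -> List[str]:
    """Visit users level by level, starting from the seed user.

    fetch_followees is a hypothetical callback that returns the users
    a given user follows; max_users caps the size of the crawl.
    """
    visited = {seed}        # users already queued, to avoid revisiting
    queue = deque([seed])   # FIFO queue gives breadth-first order
    order: List[str] = []
    while queue and len(order) < max_users:
        user = queue.popleft()
        order.append(user)
        for followee in fetch_followees(user):
            if followee not in visited:
                visited.add(followee)
                queue.append(followee)
    return order


# Hypothetical follow graph standing in for data fetched from a platform.
FOLLOW_GRAPH: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "alice"],
    "dave": [],
}


def lookup_followees(user: str) -> List[str]:
    """Return the followees of a user from the in-memory graph."""
    return FOLLOW_GRAPH.get(user, [])


print(crawl_breadth_first("alice", lookup_followees))
# prints ['alice', 'bob', 'carol', 'dave']
```

The queue ensures that every user one hop from the seed is visited before any user two hops away, and the visited set keeps mutual follow relationships from causing repeated visits.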

Keywords/Search Tags:

Social networking, Web crawler, AJAX technology, MongoDB, Crawler strategy

Related items

1	Research And Implementation On Theme Web Crawler Of Supporting Ajax
2	Research On Topic Focused Web Crawler And Related Technologies
3	Design And Implementation Of A Web Crawler System Supported AJAX
4	Research And Implementation Of Web Crawler For URL-Specified Crawling Of Ajax-Based Web Applications
5	Design And Implementation Of A Web Crawler Friendly To Ajax
6	A Web Crawler Supporting AJAX
7	Research On The Microblogging Crawler Related Technologies
8	Research And Implement Of Distributed Crawler System Supporting AJAX
9	Design And Implementation Of An Ajax Supported Deep Web Crawler System
10	Research On An Ajax Supported Deep Web Crawler Model