
Design And Implementation Of Social Network Information Crawler

Posted on: 2015-08-03; Degree: Master; Type: Thesis
Country: China; Candidate: X Lv
GTID: 2308330464455610; Subject: Software engineering
Abstract:
Web 2.0, the hallmark of the social networking era, has given the Internet a user-centric model and platform for communication. On social networking platforms, users can post messages, share content, add friends, follow people whose interests they share, and so on. These platforms commonly have millions of users, who are linked into a huge social network through mutual following and friendship, and messages can spread quickly across such a network. Most social networking platforms offer an open API through which users and developers can obtain platform data, but the number of API calls is usually limited, which makes collecting large amounts of data extremely inconvenient. Research on crawlers for social network information is therefore of considerable significance.

This thesis takes a social networking platform as its research object and studies the related crawler technologies. Current social networking platforms use AJAX to provide rich user interaction, so crawling such a platform also requires parsing AJAX-loaded pages. Social networking platforms generate enormous amounts of data, some of it sparse and unstructured, which makes a traditional relational database awkward for storage. This thesis therefore stores the data in the non-relational database MongoDB, and then retrieves the stored information from MongoDB to obtain the information a user is interested in.

The main work is as follows:
1. Analyzes the problems in current social network information crawling and derives from them the design goals the crawler must meet.
2. Constructs a web crawler suited to social network information, which crawls social network data using a breadth-first strategy.
3. Uses the Beautiful Soup parser to parse and crawl the data in AJAX pages, thereby overcoming the limit on the number of accesses imposed by the platforms' own open APIs.
4. Crawls user information, user relationship information, web content information, and comments on content.
5. Uses the non-relational database MongoDB for data storage, to cope with the ever-growing volume of social network data.
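The breadth-first strategy in item 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's crawler: `bfs_crawl` and the `fetch_neighbors` callback are hypothetical names, and `fetch_neighbors` stands in for whatever request-and-parse step returns the users a given user follows. The `max_users` and `max_depth` limits are assumptions added to keep the crawl bounded.

```python
from collections import deque
from typing import Callable, Deque, Dict, Hashable, Iterable


def bfs_crawl(seeds: Iterable[Hashable],
              fetch_neighbors: Callable[[Hashable], Iterable[Hashable]],
              max_users: int = 1000,
              max_depth: int = 2) -> Dict[Hashable, int]:
    """Visit users breadth-first from the seeds; return {user_id: hop depth}.

    fetch_neighbors is a hypothetical callback returning the IDs a user
    follows (in a real crawler it would fetch and parse that user's pages).
    """
    depth: Dict[Hashable, int] = {}
    frontier: Deque[Hashable] = deque()  # FIFO queue gives breadth-first order

    for seed in seeds:
        if seed not in depth and len(depth) < max_users:
            depth[seed] = 0
            frontier.append(seed)

    while frontier:
        user = frontier.popleft()
        if depth[user] >= max_depth:
            continue  # users at the depth limit are recorded but not expanded
        for neighbor in fetch_neighbors(user):
            if neighbor in depth:
                continue  # already discovered; avoid revisiting
            if len(depth) >= max_users:
                return depth  # stop once the user budget is spent
            depth[neighbor] = depth[user] + 1
            frontier.append(neighbor)
    return depth
```

Because the frontier is a FIFO queue, every user one hop from the seeds is discovered before any user two hops away, so a bounded crawl collects the seeds' immediate circle first.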
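Item 3 parses AJAX-loaded pages. Many AJAX endpoints return JSON that wraps an HTML fragment; the sketch below assumes a hypothetical response of the form `{"data": {"html": "..."}}` with each comment marked up as `<div class="comment" data-uid="...">` containing a `<span class="text">`. That shape is an illustration, not taken from the thesis or any specific platform. The sketch also swaps in the standard library's `html.parser.HTMLParser` for Beautiful Soup so that it runs without third-party packages.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List, Optional


class CommentParser(HTMLParser):
    """Collect {"uid", "text"} records from <div class="comment"> blocks.

    Assumes (hypothetically) that comment divs contain no nested divs.
    """

    def __init__(self) -> None:
        super().__init__()
        self.comments: List[Dict[str, str]] = []
        self._current: Optional[Dict[str, str]] = None
        self._in_text = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "comment" in classes:
            self._current = {"uid": attr_map.get("data-uid") or "", "text": ""}
        elif tag == "span" and "text" in classes and self._current is not None:
            self._in_text = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_text = False
        elif tag == "div" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.comments.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_text and self._current is not None:
            self._current["text"] += data


def parse_ajax_comments(body: str) -> List[Dict[str, str]]:
    """Decode an assumed JSON-wrapped AJAX response and extract comments."""
    fragment = json.loads(body)["data"]["html"]  # hypothetical response shape
    parser = CommentParser()
    parser.feed(fragment)
    parser.close()
    return parser.comments
```

With Beautiful Soup, as the thesis uses, the equivalent extraction would start from `BeautifulSoup(fragment, "html.parser").find_all("div", class_="comment")`.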
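The storage choice in item 5 rests on MongoDB accepting documents whose fields vary from record to record. The sketch below, using hypothetical helper names and an assumed record layout, shows one way crawled user and relationship records might be shaped into documents: empty fields are omitted instead of becoming NULL columns, and the user ID becomes the `_id` key.

```python
from typing import Any, Dict, Iterable, List

EMPTY_VALUES = (None, "", [], {})  # treated as "field absent" in this sketch


def to_user_document(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a crawled user record (hypothetical layout) into a document.

    Sparse profiles simply lack keys, which a schemaless store accepts;
    zero counts are kept because 0 is real data, not a missing value.
    """
    doc: Dict[str, Any] = {"_id": str(raw["uid"])}
    for key, value in raw.items():
        if key == "uid" or value in EMPTY_VALUES:
            continue
        doc[key] = value
    return doc


def to_relation_documents(uid: Any, followee_ids: Iterable[Any]) -> List[Dict[str, str]]:
    """One document per follow edge; the composite _id keeps edges unique."""
    return [{"_id": f"{uid}->{f}", "from": str(uid), "to": str(f)}
            for f in followee_ids]
```

With PyMongo, each document could then be written idempotently with `collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)`, so that re-crawling a user overwrites the earlier copy instead of duplicating it.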
Keywords: Social networking, Web crawler, AJAX technology, MongoDB, Crawler strategy