Design And Implementation Of Keywords-based Microblog Crawler System

Posted on: 2017-04-15, Degree: Master, Type: Thesis
Country: China, Candidate: T Ye, Full Text: PDF
GTID: 2348330512465070, Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of microblogging, many research problems involving microblog data have attracted growing attention. Anyone can publish on a microblog and anyone can read what others publish, which produces a huge volume of highly fragmented information. Collecting data from microblogs is the groundwork for this research. This thesis presents the design and implementation of a keyword-based microblog crawler system, intended to help companies and individual users quickly find negative messages and keep losses to a minimum. The crawler targets microblog sites: it searches for the given keywords and then crawls the search results with a breadth-first strategy, extracting bloggers, microblog posts, followers, and related data. It addresses vertical crawling, dynamic web pages, and automatic login, which a general-purpose crawler does not handle. The main work of this thesis is as follows:

1. Login simulation. The system accesses the microblog login page rather than the homepage. The user's login information is encoded with Base64 (an encoding, not encryption) and sent to the server; to log in automatically, the system then obtains the cookie returned by the server.
2. Information collection and filtering. The keyword-based crawler visits the microblog site, collects information from its homepages, and filters the crawled pages.
3. Key information extraction. Information is extracted from the downloaded HTML files using Jsoup, XPath, and a method based on the HTML page structure.
4. Data update and storage. The source pages are re-crawled at fixed intervals to keep the data up to date, and MySQL is used as the persistent storage platform.

The resulting system meets the needs of companies and individuals who want to find both negative and positive information.
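The login encoding and the breadth-first keyword crawl described above can be illustrated with a minimal sketch. This is not the thesis's implementation (the abstract names Jsoup, which suggests a Java code base); it is a standard-library Python illustration under stated assumptions. The names `encode_login_field`, `crawl_bfs`, `fake_fetch`, and `FAKE_SITE` are hypothetical, and the `fetch` callback stands in for the real download-and-parse step so that the example runs without network access.

```python
import base64
from collections import deque


def encode_login_field(value: str) -> str:
    """Base64-encode a login field. This is an encoding, not encryption."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def crawl_bfs(seed, fetch, keywords, max_pages=100):
    """Breadth-first crawl starting at `seed`.

    `fetch(url)` is a caller-supplied (hypothetical) function that returns
    a `(text, links)` pair for a page. Returns the URLs, in visit order,
    of pages whose text contains any of the keywords (case-insensitive).
    """
    queue = deque([seed])   # FIFO queue gives breadth-first order
    seen = {seed}           # avoid visiting the same page twice
    matches = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        visited += 1
        lowered = text.lower()
        if any(k.lower() in lowered for k in keywords):
            matches.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return matches


# Toy in-memory "site" used instead of real HTTP requests (assumption).
FAKE_SITE = {
    "search": ("results for phone", ["post/1", "post/2"]),
    "post/1": ("my phone battery died", ["user/a"]),
    "post/2": ("nice weather today", ["user/a", "post/3"]),
    "user/a": ("profile of a", []),
    "post/3": ("PHONE screen cracked", []),
}


def fake_fetch(url):
    """Return (text, links) for a page of the toy site."""
    return FAKE_SITE.get(url, ("", []))


encoded = encode_login_field("user@example.com")
matches = crawl_bfs("search", fake_fetch, ["phone"])
print(encoded)
print(matches)
```

In a real crawler, `fetch` would send the login cookie with each request and parse links from the returned HTML; the queue-plus-seen-set structure would stay the same.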
Keywords/Search Tags: social network, crawler system, microblog, webpage extraction