
Design And Implementation Of Distributed Search Crawler System Based On Mobile Software Application

Posted on: 2016-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Yao
GTID: 2298330467993047
Subject: Information security
Abstract/Summary:
With the continuing development of the Internet, more and more mobile application stores have appeared. Many of these stores contain malicious applications, and managing them has become a serious problem. A web crawler system is therefore needed to collect data that the relevant departments can screen for such applications.

This thesis first introduces the research background of the system and gives a brief overview of web crawlers and related concepts. It then studies the key technologies of a distributed crawler system, including distributed task allocation strategies and the retrieval of information generated by JavaScript (JS). From among the various distributed task allocation strategies, one suited to this system is proposed. For obtaining JS information, a reverse analysis technique based on IDA is proposed. Information exchange between the production network and the office network is carried out through a RabbitMQ message queue server.

Building on this research into key technologies, a distributed search crawler system for mobile software applications is developed. The system comprises the following modules: a control and management module, a crawler module, a communication module between the production network and the office network, a database module, a download and upload module, and a retrieval module.
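As an illustration of the distributed task allocation step mentioned above, the following is a minimal sketch of one common approach: hashing a URL's host name to choose a crawler node. The abstract does not say which strategy the thesis adopts, so the hash-based scheme and the names `assign_crawler` and `partition` are assumptions made purely for illustration.

```python
import hashlib
from urllib.parse import urlparse


def assign_crawler(url, crawler_ids):
    """Pick a crawler node for a URL by hashing its host name.

    Hypothetical helper: hash-based allocation is an assumed strategy,
    not necessarily the one proposed in the thesis.
    """
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(crawler_ids)
    return crawler_ids[index]


def partition(urls, crawler_ids):
    """Group URLs into one work queue per crawler node."""
    queues = {cid: [] for cid in crawler_ids}
    for url in urls:
        queues[assign_crawler(url, crawler_ids)].append(url)
    return queues
```

Hashing on the host rather than the full URL keeps all pages of one application store on the same node, which makes per-site politeness limits easier to enforce. The trade-off is that a very large store can overload a single node.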
Several of these modules were studied and designed in detail; a flow chart was drawn for each, and the module was implemented according to it. Finally, the whole system was run and its results were collected. Statistical analysis of the results shows that: the distributed crawler system is more efficient than crawling with a single server; web coverage, reliability, and page update behavior are satisfactory; the problem that some JS information could not be obtained has been solved, which makes the collected page information more complete; the RabbitMQ server provides a good solution to communication between the production network and the office network; the system has some ability to discover malicious applications in mobile application stores, giving the relevant personnel a preliminary screening for detection; and the retrieval module not only provides app query capabilities but also allows the information of the relevant modules to be modified, which improves the user experience. The distributed search crawler system for mobile software applications better meets the individual needs of users and has practical significance.

The main work is as follows:

To meet the system's efficiency and scalability requirements: a distributed crawler system is built, distributed task allocation strategies are studied, and a communication method between the management server and the crawler servers is implemented.
To meet the system's coverage and reliability requirements: the crawling of dynamic web pages is studied in detail, using network packet capture, browser-based simulation, and IDA-based reverse analysis.
To meet the system's timeliness requirement: page crawling mechanisms are studied, and mobile application stores are crawled according to the update strategy designed in this thesis.
To meet the system's data security requirement: physical isolation is set up between the production network and the office network, information exchange between them is studied, and a RabbitMQ server is used to transport information between the two networks.
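The timeliness item above refers to an update strategy whose details the abstract does not give. As a rough sketch of what such a strategy can look like, the code below shortens a page's revisit interval when its content has changed and lengthens it when it has not. The halving and doubling rule, the bounds, and the names `content_fingerprint` and `next_interval` are all assumptions for illustration, not the strategy designed in the thesis.

```python
import hashlib

# Assumed bounds on the revisit interval, in seconds (hypothetical values).
MIN_INTERVAL = 3600             # one hour
MAX_INTERVAL = 7 * 24 * 3600    # one week


def content_fingerprint(page_bytes):
    """Return a hash of the page body, used to detect changes between visits."""
    return hashlib.sha256(page_bytes).hexdigest()


def next_interval(current_interval, old_fingerprint, new_fingerprint):
    """Compute the next revisit interval for a page.

    Assumed rule: halve the interval if the page changed since the last
    visit, double it otherwise, and clamp the result to the bounds above.
    """
    if old_fingerprint != new_fingerprint:
        candidate = current_interval // 2
    else:
        candidate = current_interval * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, candidate))
```

With this rule, frequently updated store pages are revisited more often, while static pages drift toward the maximum interval and consume less crawl capacity.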
Keywords/Search Tags:Distributed Crawler, Application Stores, JS Web Page, Communication between production network and office network