The Study And Implementation Of Focused Crawler Technology For Android Technical Information

Posted on: 2016-08-17  Degree: Master  Type: Thesis
Country: China  Candidate: W Q Huang  Full Text: PDF
GTID: 2308330482975237  Subject: Software engineering
Abstract/Summary:
To improve the efficiency of Android R&D work, the enterprise developed a vertical search engine for Android technical information. The focused crawler supplies the underlying data for a vertical search engine, so its capability determines the professional quality of the whole engine. A focused crawler that can identify efficient crawl directions and locate vertical resources sensibly can also collect theme resources efficiently. This thesis concentrates on the core part of the focused crawler, the crawling scheduler, and studies and implements two of its sub-functions: theme relevance assessment for web pages, and tunneling. It compares the strengths and weaknesses of a range of content-based and link-structure-based algorithms, then analyzes Android technical information resources in order to develop the theme relevance assessment scheme and to implement a tunneling scheme. The main contributions are as follows:
(1) An algorithm is implemented that draws on Google's PageRank and improves the parts of Shark search that are poorly suited to the theme; it is then combined with the sibling pages of the referring page into a comprehensive crawling scheme;
(2) To improve the accuracy with which theme resources are obtained, a VSM-based theme relevance assessment scheme is developed for classified web pages;
(3) To expand the coverage of theme resource sites in the crawl results, tunneling is implemented using inheritance characteristics and descending collection of tunnel seeds.
The key modules of the focused crawler system, and the system as a whole, were tested and verified. The results show that the focused crawler implemented and tuned in this project crawls efficiently and obtains theme web pages accurately within a vertical resource, and that tunneling resolves the theme islet phenomenon. However, tunneling still visits a large number of pages, which wastes network bandwidth and computing resources; improving this is left for future work.
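As an illustration of the VSM-based theme relevance assessment named in contribution (2), the following is a minimal sketch, not the thesis's actual implementation. It assumes a page is reduced to a list of tokens, the theme lexicon is a mapping from term to weight, and relevance is the cosine similarity between the two vectors. The function names, the raw term-frequency weighting, and the 0.3 threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dict-like)."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty page or an empty lexicon carries no evidence of relevance.
        return 0.0
    return dot / (norm_a * norm_b)


def theme_relevance(page_tokens, theme_lexicon):
    """Score a tokenized page against a theme lexicon (term -> weight).

    Assumption: raw term frequency is used as the page-side weight.
    """
    page_vector = Counter(page_tokens)
    return cosine_similarity(page_vector, theme_lexicon)


def is_on_theme(page_tokens, theme_lexicon, threshold=0.3):
    """Hypothetical decision rule: keep pages scoring at or above the threshold."""
    return theme_relevance(page_tokens, theme_lexicon) >= threshold


if __name__ == "__main__":
    # Illustrative lexicon and page; the terms and weights are made up.
    lexicon = {"android": 2.0, "activity": 1.0}
    page = ["android", "activity", "lifecycle", "android"]
    print(theme_relevance(page, lexicon))
    print(is_on_theme(page, lexicon))
```

In a crawling scheduler, a score of this kind could be used to prioritize the outgoing links of a page; how the thesis combines it with the link-based score is not stated in the abstract.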
Keywords/Search Tags:Focused crawler, Android technical information, Tunneling, Theme lexicon, Crawling scheduler