
Design And Implementation Of Distributed Online Travel Search Crawler System

Posted on: 2014-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: X L Xu
GTID: 2248330398971949
Subject: Information security
Abstract/Summary:
With the rapid development of Internet technology and the tourism industry, and especially with rising living standards and the growth of online tourism in recent years, more and more users book their trips online. Because the number of travel pages has grown sharply, online travel search has become an important new focus in the search engine field. This paper first introduces the research background and significance of a distributed online travel search engine, together with background knowledge on web crawlers. Drawing on search engine techniques and strategies and on distributed web crawling, the paper then analyzes in detail the key technologies the system can use, such as the distributed task allocation strategy, URL filtering, and the update strategy for online travel pages. Based on the characteristics of travel pages, the paper also proposes an algorithm for judging whether a page is an online travel page. Building on these technologies and strategies, the paper describes how the distributed online travel search crawler system is designed and implemented to meet users' demand for online travel pages from travel platforms and travel agency websites. In the design part, the system is divided by function into four main modules: the control server, the crawler server, the index retrieval server, and the database module. A detailed structural design and class diagram are given for each module. Finally, the paper presents the detailed implementation of the control server and the crawler server.
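The abstract names a distributed task allocation strategy and URL filtering but does not say how either works. As a hedged illustration only, the Java sketch below shows one common way to combine them: each URL is normalized (lowercase scheme and host, fragment dropped), rejected if it is not HTTP(S) or has been seen before, and otherwise assigned to a crawler server by hashing its host. The class `UrlDispatcher`, the host-hash assignment, and the in-memory seen-set are assumptions for illustration, not the thesis's actual design.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch (not the thesis's design): filters URLs and assigns each
 * new one to a crawler server by hashing its host, so every page of one site
 * is handled by the same crawler.
 */
final class UrlDispatcher {
    private final int crawlerCount;
    // In-memory "seen" set; a large crawl would more likely use a Bloom filter or a shared store.
    private final Set<String> seen = new HashSet<>();

    UrlDispatcher(int crawlerCount) {
        if (crawlerCount <= 0) {
            throw new IllegalArgumentException("crawlerCount must be positive");
        }
        this.crawlerCount = crawlerCount;
    }

    /** Lowercases scheme and host, drops the fragment; returns null for unusable or non-HTTP(S) URLs. */
    static String normalize(String url) {
        try {
            URI u = new URI(url.trim());
            String scheme = u.getScheme();
            String host = u.getHost();
            if (scheme == null || host == null) {
                return null;
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) {
                return null;
            }
            String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
            String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
            String port = u.getPort() == -1 ? "" : ":" + u.getPort();
            return scheme + "://" + host.toLowerCase(Locale.ROOT) + port + path + query;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /** Returns the crawler index (0..crawlerCount-1) for a new URL, or -1 if it is filtered out. */
    int dispatch(String url) {
        String norm = normalize(url);
        if (norm == null || !seen.add(norm)) {
            return -1; // invalid, non-HTTP(S), or already seen
        }
        String host = URI.create(norm).getHost();
        // String.hashCode is specified by the JLS, so every node computes the same index.
        return Math.floorMod(host.hashCode(), crawlerCount);
    }
}
```

Hashing by host rather than by full URL keeps all requests to one site on a single crawler, which makes per-site politeness limits easier to enforce; the trade-off is that one very large site can overload its assigned crawler.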
The system is implemented in Java, with Tomcat, Apache and MySQL as the development environment. To verify the feasibility of the distributed online travel search system, the paper builds a five-server test environment for functional and performance testing. Analysis of the test data shows that the online travel page judging algorithm identifies online travel pages with an accuracy of 90 percent. The performance tests further show that the system collects online travel pages steadily and efficiently, both with a single crawler server and with the entire system deployed. The system also provides a user-friendly web search page, meets the initial design requirements, and has important practical value for the tourism industry.
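The abstract reports a 90 percent accuracy for the page judging algorithm but does not describe the algorithm's features or how accuracy was measured. As a hedged illustration only, the Java sketch below uses a simple stand-in: a page is judged to be an online travel page when it contains at least a threshold number of distinct cue words, and accuracy is the fraction of hand-labeled pages judged correctly. The class `TravelPageJudge`, the cue list, and the threshold are hypothetical and are not the thesis's algorithm.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical keyword-threshold judge; a stand-in, not the thesis's page judging algorithm. */
final class TravelPageJudge {
    // Hypothetical cue words; the thesis's actual page features are not given in the abstract.
    private static final List<String> CUES = List.of(
            "itinerary", "departure", "tour", "booking", "per person", "hotel", "day trip");
    private final int threshold;

    TravelPageJudge(int threshold) {
        this.threshold = threshold;
    }

    /** Counts how many distinct cue words occur (as substrings) in the page text. */
    int score(String pageText) {
        String text = pageText.toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String cue : CUES) {
            if (text.contains(cue)) {
                hits++;
            }
        }
        return hits;
    }

    boolean isOnlineTravelPage(String pageText) {
        return score(pageText) >= threshold;
    }

    /** Accuracy = pages judged the same as their label / all labeled pages. */
    static double accuracy(TravelPageJudge judge, List<String> pages, List<Boolean> labels) {
        if (pages.isEmpty() || pages.size() != labels.size()) {
            throw new IllegalArgumentException("need equal-length, non-empty lists");
        }
        int correct = 0;
        for (int i = 0; i < pages.size(); i++) {
            if (judge.isOnlineTravelPage(pages.get(i)) == labels.get(i)) {
                correct++;
            }
        }
        return (double) correct / pages.size();
    }
}
```

With a threshold of 2, a page reading "City tour" matches only one cue and is rejected, so a labeled set containing it as a travel page would lower the measured accuracy; the threshold trades missed travel pages against false positives.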
Keywords/Search Tags: Search Engine, Online Travel, Page Judging Algorithm, Distributed Crawler