Design And Implementation Of Travel Vertical Search System Based On MongoDB

Posted on: 2015-08-13  Degree: Master  Type: Thesis
Country: China  Candidate: H H Fei
GTID: 2348330479954321  Subject: Software engineering
Abstract/Summary:
Rapid growth of the national economy and steady improvement in living standards have driven rapid growth in the domestic tourism industry. Internet technology has changed how people obtain information, and people increasingly look for the information they need online. Many websites already serve the domestic tourism industry and related fields, but most of them have problems to some degree: a lack of valid information, too much useless information, poor user experience, and mediocre content. Faced with this huge amount of information, users are often confused by the uneven quality of the data. General-purpose search engines are the usual entry point for information retrieval, but they are less effective at retrieving domain-specific information and do not meet the need to obtain information quickly and accurately. In this context, travel vertical search engines have emerged to provide comprehensive and correct travel-related information simply and quickly.
A travel vertical search engine is the result of applying vertical search technology to the tourism industry and related fields. This thesis analyses and studies the theory and technology of vertical search engines, mainly the Heritrix web crawler, web page parsing and information extraction techniques, and the Lucene full-text retrieval system. For the storage of the system's business data, the thesis analyses the MongoDB distributed document store and elaborates on the election algorithm and data synchronization principle in MongoDB's distributed architecture.
A travel vertical search system was designed and implemented on the basis of this analysis of the related technologies and tools.
The system used Heritrix to crawl web pages, then used Jsoup and regular expressions to parse the page documents and extract the target data accurately. Next, it used Lucene to generate index files from the extracted structured data and implemented full-text retrieval over hotel and attraction information on top of Lucene's full-text retrieval capability. The system used two databases: MySQL stored structured data whose schema is fixed, and MongoDB stored the hotel and attraction business data. The data storage module was deployed as a MongoDB distributed system, and the thesis assesses its scalability, stability, and load balancing. The result is a travel vertical search system that is high-performance, highly stable, load-balanced, and easy to expand.
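The extract-then-index stage described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes Python's standard library in place of Jsoup and Lucene, a hand-written inverted index in place of Lucene index files, and a hypothetical HTML fragment (`SAMPLE_PAGE`) with made-up class names and hotel records. Crawling and database storage are left out.

```python
import re
from collections import defaultdict

# Hypothetical HTML fragment standing in for a crawled hotel listing page.
SAMPLE_PAGE = """
<div class="hotel"><h2>Lakeside Inn</h2><span class="city">Hangzhou</span>
<p class="desc">Quiet hotel near West Lake with garden view rooms.</p></div>
<div class="hotel"><h2>Harbor Hotel</h2><span class="city">Qingdao</span>
<p class="desc">Modern hotel close to the beach and the old harbor.</p></div>
"""

# Regex-based extraction of one structured record per hotel block.
# The markup layout (h2, span.city, p.desc) is an assumption of this sketch.
HOTEL_PATTERN = re.compile(
    r'<h2>(?P<name>.*?)</h2>\s*<span class="city">(?P<city>.*?)</span>\s*'
    r'<p class="desc">(?P<desc>.*?)</p>',
    re.DOTALL,
)


def extract_records(html):
    """Return a list of dicts with the keys name, city and desc."""
    return [match.groupdict() for match in HOTEL_PATTERN.finditer(html)]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Build an inverted index: token -> set of record positions."""
    index = defaultdict(set)
    for doc_id, record in enumerate(records):
        for field in ("name", "city", "desc"):
            for token in tokenize(record[field]):
                index[token].add(doc_id)
    return index


def search(index, records, query):
    """Return names of records that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return [records[doc_id]["name"] for doc_id in sorted(hits)]


records = extract_records(SAMPLE_PAGE)
index = build_index(records)
```

Calling `search(index, records, "hotel beach")` intersects the posting sets of both tokens, so only records containing every query term are returned. A full-text engine such as Lucene adds analysis, scoring, and on-disk index segments on top of this basic idea.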
Keywords/Search Tags: vertical search, web page crawling, information extraction, information retrieval, distributed system