The Design And Implementation Of Vertical Search Engine Based On Lucene

Posted on: 2016-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhou
GTID: 2308330479982163
Subject: Software engineering
Abstract/Summary:
As modern society has entered the Internet era, the volume of information has grown rapidly. Traditional search engines aim for breadth of coverage, which makes it difficult for users to find the specific information they want. Vertical search engines emerged to meet this need: they collect and process information in a specific field and return results for user queries in that field. Against the background of demand for smartphone-related search, this thesis designs and develops a Lucene-based vertical search engine for the smartphone domain. The thesis first studies the general composition and working principles of search engines, especially the key technologies involved in vertical search, such as topic-focused web crawlers, web information extraction, and indexing. It also discusses the tokenization architecture and working principle of Lucene's Analyzer, and then studies the key technologies of Chinese word segmentation, including its difficulties and commonly used algorithms.
For segmentation, the thesis adopts a mechanical (dictionary-based) method: using a self-built mobile phone dictionary stored in a Trie structure, it implements a Chinese word segmenter suited to the smartphone domain. Compared with other open-source Chinese word segmenters, this segmenter achieves better segmentation accuracy on mobile phone text. The segmenter is then used in a Lucene Analyzer, which forms the core component of the mobile phone vertical search engine.

For the system itself, the thesis first analyzes the requirements of the mobile phone vertical search engine and designs the system architecture, then divides the system into functional modules and designs the database. The software environment is built on open-source frameworks. Finally, the system is designed and implemented in detail. The main work includes: extending the Heritrix framework and improving its existing crawl strategy to fetch specific information from the web; using the HTMLParser API to extract content from HTML documents; using a Spring + Hibernate + MySQL architecture to build the query module; using DWR to handle asynchronous requests in the search module; and building the system's Analyzer on the self-built mobile phone dictionary. Query experiments indicate that the system has a precision advantage over general-purpose search engines.
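The dictionary-based segmentation described in the abstract can be illustrated with a small sketch. The abstract does not say which matching strategy the thesis uses, so forward maximum matching is assumed here as one common form of mechanical segmentation. The class and function names and the sample dictionary entries are hypothetical, and the sketch is written in Python for brevity rather than as a Lucene Analyzer in Java.

```python
class TrieNode:
    """One node of the dictionary trie: child links plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class TrieDictionary:
    """Domain dictionary stored as a character trie (hypothetical helper)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def segment(text, dictionary):
    """Forward maximum matching: repeatedly take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character
    tokens (an assumption; the thesis may handle unknown text differently).
    """
    tokens = []
    pos = 0
    while pos < len(text):
        length = dictionary.longest_match(text, pos) or 1
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens


# Hypothetical entries standing in for the self-built mobile phone dictionary.
phone_dict = TrieDictionary(["智能", "手机", "智能手机", "屏幕", "分辨率"])
print(segment("智能手机屏幕分辨率", phone_dict))
```

Because the trie is walked one character at a time, the longest match at each position is found in a single pass, without testing every candidate substring against the dictionary separately.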
Keywords/Search Tags:Vertical Search, Chinese Word Segmentation, Theme Crawler, Crawling Strategy, Lucene