Research and Design of a Product Price Comparison Search System Based on Lucene

Posted on: 2014-07-08  Degree: Master  Type: Thesis
Country: China  Candidate: J Zhang
GTID: 2268330401450332  Subject: Computer application technology
Abstract/Summary:
In recent years the Internet has developed rapidly, and the amount of online information has grown explosively. With this massive growth of information, general-purpose search engines such as Baidu and Google can no longer meet people's needs when searching within specific domains. Domain-specific search technology has therefore become a research focus at home and abroad, and the vertical search engine was born. The success of commercial search sites at home and abroad has in turn promoted the progress of vertical search technology: sites such as the domestic travel website “where network” and the digital sub-columns of Sina, Tencent and other portal websites are visited by users constantly. However, when a user enters a wrong character on these sites, the results returned are not what the user wants. How to understand the user's input correctly has therefore also become an important topic in vertical search engine research. This paper studies and implements a vertical search engine system: a price comparison search system based on Lucene. The specific research contents are as follows:
The characteristics of vertical search engines and their key technologies are analyzed in order to define the topic of the paper.
The key technologies of the product price comparison search system and their implementation are introduced in detail, covering the web crawler, page parsing techniques, and the Lucene API that provides the indexing and search functions.
The difficulties of Chinese-language search are studied, and two commonly used kinds of Chinese word segmentation algorithm are introduced, one based on forward maximum matching and the other based on statistics. To address the shortcomings of forward maximum matching, an improved method is proposed that retains the advantages of the traditional forward maximum matching algorithm and combines it with word frequency statistics.
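As an illustration of the segmentation idea described above, the following Python sketch shows plain forward maximum matching next to one possible frequency-aware variant. The abstract does not specify how the thesis combines matching with word frequencies, so the scoring rule in `freq_aware_match` (frequency of the candidate word plus frequency of the word that would follow it), the function names, and the toy frequency table are all assumptions made for this example, not the author's algorithm.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Plain forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in dictionary:
                tokens.append(word)
                i += size
                break
    return tokens


def longest_match(text, i, freq, max_len):
    """Longest dictionary word starting at position i (single char fallback).
    Returns an empty string when i is already at the end of the text."""
    for size in range(min(max_len, len(text) - i), 0, -1):
        word = text[i:i + size]
        if size == 1 or word in freq:
            return word
    return ""


def freq_aware_match(text, freq, max_len=4):
    """Hypothetical variant: among the dictionary words starting at the
    current position, pick the one whose frequency plus the frequency of
    the word that would follow it is highest (assumed scoring rule)."""
    tokens, i = [], 0
    while i < len(text):
        best, best_score = None, -1
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size > 1 and word not in freq:
                continue
            following = longest_match(text, i + size, freq, max_len)
            score = freq.get(word, 0) + freq.get(following, 0)
            if score > best_score:  # strict: ties keep the longer word
                best, best_score = word, score
        tokens.append(best)
        i += len(best)
    return tokens


# Toy frequency table (made-up counts, for illustration only).
FREQ = {"研究": 100, "研究生": 50, "生命": 80, "起源": 30}

print(forward_max_match("研究生命起源", FREQ))
print(freq_aware_match("研究生命起源", FREQ))
```

On this toy input, plain forward matching greedily takes the three-character word first and leaves a stray single character, while the frequency-aware variant prefers the split whose neighbouring words are both common. Real systems would need a much larger dictionary and frequency counts drawn from a corpus.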
Three groups of experiments were carried out to verify the improved algorithm, and the results show that segmentation accuracy is improved.
Spell checking technology is studied, and a method based on computing the Longest Common Subsequence (LCS) is added to the retrieval module of the system back end. With this method, the system automatically corrects the query and returns what the user wants when the user's input contains typos, so it understands the intent of the input correctly and is fault tolerant.
The modules needed to build a vertical search engine are analyzed in detail. Open source web crawler technology is used to crawl the specified pages, and page parsing technology converts the crawled pages into text. The API provided by Lucene is used to index and search this text, the improved Chinese word segmentation replaces Lucene's own Chinese word segmentation, and the LCS technique is added to the retrieval module of the back-end system. This research is of some significance for promoting vertical search engine technology research in China.
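The LCS-based correction step can be sketched roughly as follows. The abstract only states that LCS is computed in the retrieval module, so the normalized similarity score, the `threshold` value, and the names `lcs_length` and `suggest` are assumptions chosen for illustration; the thesis may rank candidates differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b,
    computed with the standard dynamic program in O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def suggest(query, vocabulary, threshold=0.6):
    """Return the vocabulary term most similar to the query, or the query
    itself when it is already known or nothing is similar enough.
    Similarity = 2 * LCS / (len(query) + len(term)), an assumed scoring."""
    if query in vocabulary:
        return query
    best, best_score = None, 0.0
    for term in vocabulary:
        score = 2 * lcs_length(query, term) / (len(query) + len(term))
        if score > best_score:
            best, best_score = term, score
    return best if best_score >= threshold else query


# Hypothetical index vocabulary for a product search.
VOCAB = ["iphone", "ipad", "nokia"]
print(suggest("iphnoe", VOCAB))
```

Here the misspelled query shares a five-character subsequence with one vocabulary term and is corrected to it, while a query with no close match is passed through unchanged. In a Lucene-based system the vocabulary would presumably come from the indexed terms.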
Keywords/Search Tags: vertical search engine, Lucene, Chinese word segmentation, Longest Common Subsequence, web crawler, webpage parser