Research On Vertical Search Engine Based On SSH And Lucene

Posted on: 2012-08-04  Degree: Master  Type: Thesis
Country: China  Candidate: H H Luo  Full Text: PDF
GTID: 2218330368487121  Subject: Computer application technology
Abstract/Summary:
The Internet is developing rapidly, and China now ranks first in the world in number of Internet users. As the volume of information online grows explosively, general-purpose search engines are proving inadequate, and vertical search engines have emerged to address this problem. Lucene is open source, but its word segmentation mechanism is simple and its segmentation precision is limited. This paper therefore introduces an improved Chinese word dictionary mechanism that uses a bitmap data structure and is based on a conditional random field (CRF) model. To some extent, this improves segmentation accuracy and reduces the space complexity of the dictionary mechanism. Experiments show that the memory used by our segmentation mechanism is reduced and that vertical search accuracy is improved. The main work is as follows:
(1) This paper discusses the background and research value of vertical search engines and analyzes their core technologies: Chinese word segmentation and Lucene's page ranking technology.
(2) This paper studies the Lucene source package and analyzes two Chinese word segmentation algorithms: the two-character (bigram) segmentation algorithm and the forward maximum matching algorithm. Because neither is accurate enough, this paper improves the existing Lucene segmentation package to raise the accuracy of Chinese word segmentation and applies it to a vertical search engine.
(3) We design a vertical search engine using the open source frameworks Spring, Struts2, and Hibernate. The system includes a Chinese segmentation module, a ranking module, an index module, and others. We use HTML Parser to retrieve web pages and a CRF model to increase the accuracy of Chinese segmentation.
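The two baseline segmentation algorithms named in the abstract can be illustrated with a minimal Java sketch. This is not the thesis's implementation and not Lucene's code: the class name `SegmentationSketch`, the method names, the tiny in-memory dictionary, and the `maxWordLen` parameter are all assumptions made for illustration. The sketch also assumes that "two-word segmentation" means overlapping two-character (bigram) tokens and that all characters lie in the Basic Multilingual Plane, so one `char` is one character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names and dictionary are hypothetical.
class SegmentationSketch {

    // Forward maximum matching: at each position, take the longest dictionary
    // word starting there; fall back to a single character if nothing matches.
    static List<String> forwardMaxMatch(String text, Set<String> dict, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Shrink the candidate until it is a dictionary word or one character long.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    // Bigram segmentation: emit every overlapping pair of adjacent characters.
    // A one-character input is returned as a single token.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text);
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical four-entry dictionary (requires Java 9+ for Set.of).
        Set<String> dict = Set.of("垂直", "搜索", "搜索引擎", "引擎");
        System.out.println(forwardMaxMatch("垂直搜索引擎", dict, 4));
        System.out.println(bigrams("垂直搜索引擎"));
    }
}
```

With this dictionary, forward maximum matching splits "垂直搜索引擎" into 垂直 and 搜索引擎, while the bigram method emits five overlapping pairs, including ones that cross word boundaries such as 直搜. Neither method looks at context, which is one plausible reading of why the abstract describes their accuracy as insufficient and turns to a CRF model.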
Keywords/Search Tags: Vertical Search Engine, Chinese Word Segmentation, CRF, Lucene, SSH