The Research On Full-Text Search Engine Based On Multi-Level Hash Word Segmentation

Posted on:2009-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:L Su

Full Text:PDF

GTID:2178360245969586

Subject:Computer application technology

Abstract/Summary:

Chinese word segmentation is a fundamental component of modern Web search engines and has long been an active research topic. Lucene is a mature open-source toolkit for information indexing and retrieval. Its core API is simple yet powerful, so it can be integrated into an application quickly, and its essentials can be mastered by analyzing the source code and writing experimental programs. However, the core and extended libraries in Lucene segment Chinese text automatically only in the same way that they segment English words. Because Chinese grammar differs greatly from English grammar, the results are unsatisfactory. After a detailed study of the full-text indexing and retrieval approach that Lucene uses to implement word segmentation, this thesis develops a highly efficient mechanical Chinese word segmentation method based on a hash structure.

Several dictionary mechanisms are currently used in information processing: binary search by word, TRIE index tree, and binary search by character. The last two offer higher lookup efficiency. All three try to improve lookup efficiency with sorted linear tables, which lead to complex data structures while lookup efficiency remains poor. This thesis analyzes their advantages and shortcomings. To meet the particular lookup requirements of Chinese segmentation, we design and implement a hash-based segmentation dictionary and analyze its performance.

Building on this research, a desktop search engine system is designed. It adopts the Lucene framework for indexing and searching and includes an effective Chinese word segmentation mechanism. Finally, test results on the correctness and speed of the mechanism are given.
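The abstract does not describe the layout of the hash dictionary, so the following Python sketch is only an illustration of the general idea, not the thesis's implementation. It assumes a two-level hash (first character, then word length) and pairs it with forward maximum matching, a common form of mechanical segmentation. The names `HashSegmentDictionary` and `segment` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class HashSegmentDictionary:
    """Assumed two-level hash: first character -> word length -> set of words."""

    def __init__(self, words: Iterable[str]) -> None:
        self._index: Dict[str, Dict[int, Set[str]]] = {}
        for word in words:
            if not word:
                continue  # ignore empty entries
            by_length = self._index.setdefault(word[0], {})
            by_length.setdefault(len(word), set()).add(word)

    def contains(self, word: str) -> bool:
        """Look up a word with two hash probes plus one set membership test."""
        if not word:
            return False
        by_length = self._index.get(word[0])
        if by_length is None:
            return False
        return word in by_length.get(len(word), ())

    def lengths_for(self, first_char: str) -> List[int]:
        """Word lengths stored under a first character, longest first."""
        return sorted(self._index.get(first_char, {}), reverse=True)


def segment(text: str, dictionary: HashSegmentDictionary) -> List[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: emit a single character
        for n in dictionary.lengths_for(text[i]):
            candidate = text[i:i + n]
            # The slice may be shorter than n near the end of the text.
            if len(candidate) == n and dictionary.contains(candidate):
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


if __name__ == "__main__":
    d = HashSegmentDictionary(["全文", "检索", "搜索", "引擎", "搜索引擎"])
    print(segment("全文搜索引擎", d))  # ['全文', '搜索引擎']
```

Indexing by first character and then by length means the segmenter only tries lengths that actually occur for that character, instead of scanning a sorted table. Whether this matches the multi-level scheme in the thesis cannot be confirmed from the abstract alone.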

Keywords/Search Tags:

Search Engine, word segmentation, Lucene, Hash

Related items

1	The Research And Application Of Chinese Word Segmentation Technology In Search Engine
2	The Research And Implementation Of Full-Text Search Engine Based On Lucene
3	The Research And Implementation Of Enterprise Search Engine Based On Lucene
4	Enterprise Search Engine Based On Lucene
5	Chinese Natural Language Search Engine Based On Lucene
6	Research And Implementation Of Subject-oriented Mobile Search Engine Based On Lucene
7	The Research And Application Of Search Engine Based On Lucene
8	The Research And Implementation Of Search Engine Based On LUCENE
9	Research On Vertical Search Engine Based On SSH And Lucene
10	Research And Design Of Search Within Application System Based On Lucene