| As a fundamental element of modern Web search engines, Chinese word segmentation has long been an active research topic. Lucene, an open-source project, is a mature toolkit that can be easily used for information indexing and retrieval. Its essentials can be mastered by analyzing the source code and through experimental programming. Thanks to its simple yet powerful core API, Lucene can be integrated into an application rapidly. However, the core and extended libraries in Lucene only segment Chinese text automatically in the same way that English words are segmented. Because the grammar of Chinese differs greatly from that of English, the results are unsatisfactory. After a detailed study of Lucene's full-text indexing and retrieval approach and of how it implements word segmentation, this thesis develops a highly efficient mechanical Chinese word segmentation method based on a hash structure. At present there are several dictionary mechanisms for information processing: binary-seek-by-word, TRIE indexing tree and binary-seek-by-character. The last two have higher lookup efficiency. All three improve lookup efficiency by means of sorted linear tables, but their data structures are complex and their lookup efficiency remains low. This thesis analyzes their advantages and shortcomings. To satisfy the special lookup requirements of Chinese segmentation, we design and implement a segmentation dictionary based on hashing and analyze its performance. A desktop search engine system is then designed on the basis of this research. The Lucene framework is adopted for indexing and searching, and an effective Chinese word segmentation mechanism is developed. Finally, test results on the correctness and speed of the mechanism are given. |
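
The idea of a hash-based segmentation dictionary combined with mechanical (dictionary-matching) segmentation can be illustrated with a minimal sketch. It is not the implementation developed in the thesis and does not use the Lucene API: the class `HashDictionary`, the function `segment`, the bucketing of words by their first character, and the choice of forward maximum matching as the mechanical strategy are all assumptions made for illustration.

```python
from collections import defaultdict


class HashDictionary:
    """Hypothetical hash-based dictionary: words are bucketed by first character."""

    def __init__(self, words):
        # first character -> set of dictionary words starting with that character
        self.buckets = defaultdict(set)
        # first character -> length of the longest word in that bucket
        self.max_len = defaultdict(int)
        for word in words:
            if not word:
                continue
            first = word[0]
            self.buckets[first].add(word)
            self.max_len[first] = max(self.max_len[first], len(word))

    def contains(self, word):
        # Two hash lookups on average: one for the bucket, one inside the set.
        return bool(word) and word in self.buckets.get(word[0], ())

    def longest_from(self, text, start):
        """Return the longest dictionary word starting at text[start].

        Falls back to the single character when no multi-character word matches.
        """
        first = text[start]
        bucket = self.buckets.get(first, ())
        limit = min(self.max_len.get(first, 1), len(text) - start)
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(limit, 1, -1):
            candidate = text[start:start + length]
            if candidate in bucket:
                return candidate
        return first


def segment(text, dictionary):
    """Forward maximum matching (assumed strategy) over the hash dictionary."""
    tokens = []
    i = 0
    while i < len(text):
        word = dictionary.longest_from(text, i)
        tokens.append(word)
        i += len(word)
    return tokens


if __name__ == "__main__":
    demo = HashDictionary(["搜索", "搜索引擎", "技术"])
    print(segment("搜索引擎技术", demo))
```

In this sketch each candidate lookup is a hash probe rather than a binary search over a sorted table, which is the property the hash-based dictionary is meant to exploit; the per-character maximum word length bounds how many candidates are tried at each position.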