Research And Implementation Of Electronic Document Search System With Lucene

Posted on: 2010-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2178360278475695
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet and the spread of e-government, departments continually produce electronic office documents. Managing these documents and providing efficient retrieval mechanisms, so that users can find the content they care about quickly and comprehensively at any time, is increasingly urgent and important.

This thesis analyzes the deficiencies of current electronic office document search systems and builds a customized Chinese-English full-text search system based on Lucene. The research focuses on improving two key technologies: Chinese word segmentation and the result sorting algorithm. With these improvements, the system supports Chinese text processing and places the information users care about most at the top of the result pages.

The new system extracts text from documents in a range of formats, such as plain text, PDF, Word, and Excel, converts it into a fixed format for indexing, and indexes and stores the content to support full-text search across these document types. It also offers two ways of updating the index, automatic indexing and manual indexing, so that indexes can be updated in real time and the update process is more flexible. The thesis further describes the detailed design and analysis of each implementation module of the full-text search system and implements an electronic office document search system using SSH.

Test results show that the work on Chinese word segmentation and the result sorting algorithm was successful, and that the system meets its design goals and user requirements by supporting search across all of the electronic office document formats it handles.
Keywords/Search Tags: Lucene, full-text searching, Chinese word splitting, result sorting