The Research And Application Of Enterprises Documents Search Engines Based On Lucene

Posted on:2010-12-10
Country:China
GTID:2178330332481889
Subject:Computer application technology

With the rapid development of enterprise informatization, enterprises hold a growing number of electronic documents, and accessing useful information quickly and efficiently has become an important problem. Because enterprise documents contain business or technical information, using commercial search engines may cause those documents to leak. Designing an enterprise document search engine that retrieves document information quickly and efficiently has therefore become an active research area.

Building on the design of an enterprise document search engine, we combine the advantages of string-matching word segmentation and statistical word segmentation, and propose an effective Chinese word segmentation method based on dictionaries and statistical methods. It exploits both the speed of dictionary segmentation and the ability of statistical segmentation to identify new words. Compared with Lucene's built-in word segmentation, it reduces the number of keywords per document and improves segmentation accuracy and index quality. Based on the vector space model, we introduce a classification method to narrow the scope of the document collection, and adopt a weighting scheme that identifies the importance of documents in the collection, which improves the effectiveness of document ranking.

We implement the enterprise document search engine system using Lucene. By introducing Chinese word segmentation based on dictionaries and statistical methods, together with a weighted classification ranking algorithm based on the vector space model, the system improves Lucene's core model and the accuracy of search results, giving it higher practical value. We also design caches at the index, search, and web levels of the enterprise document search engine, which effectively improve system performance.
In addition, to ensure system performance, we introduce a database-based index structure into the Lucene index. Experimental results show that the Lucene-based enterprise document search engine improves the effectiveness of document ranking and the retrieval efficiency of Lucene, and that the accuracy of search results is also improved.
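
The abstract does not spell out the segmentation algorithm, so the following is only a minimal sketch of the general idea of combining dictionary and statistical segmentation. It assumes forward maximum matching for the dictionary pass and a simple character-bigram frequency threshold for detecting new words; the function names, the `max_len` and `min_count` parameters, and the toy dictionary and corpus are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def fmm_segment(text, dictionary, max_len=4):
    """Dictionary pass: forward maximum matching (assumed technique)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is emitted when no dictionary word matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def count_bigrams(corpus):
    """Statistical pass, step 1: count adjacent character pairs in a corpus."""
    counts = Counter()
    for sentence in corpus:
        for a, b in zip(sentence, sentence[1:]):
            counts[a + b] += 1
    return counts


def merge_new_words(tokens, dictionary, bigrams, min_count=2):
    """Statistical pass, step 2: join adjacent unknown single characters
    whose bigram occurs at least min_count times (hypothetical threshold)."""
    merged = []
    open_run = False  # True when merged[-1] consists only of unknown characters
    for tok in tokens:
        unknown = len(tok) == 1 and tok not in dictionary
        if unknown and open_run and bigrams[merged[-1][-1] + tok] >= min_count:
            merged[-1] += tok
        else:
            merged.append(tok)
        open_run = unknown
    return merged


def segment(text, dictionary, bigrams, max_len=4, min_count=2):
    """Run the dictionary pass, then the new-word merge."""
    tokens = fmm_segment(text, dictionary, max_len)
    return merge_new_words(tokens, dictionary, bigrams, min_count)


# Toy data for illustration only.
DICTIONARY = {"企业", "文档", "搜索", "引擎"}
CORPUS = ["企业云盘文档", "云盘搜索", "企业文档搜索引擎"]
BIGRAMS = count_bigrams(CORPUS)

if __name__ == "__main__":
    print(segment("企业云盘文档搜索", DICTIONARY, BIGRAMS))
    # ['企业', '云盘', '文档', '搜索']
```

In this toy run the dictionary pass alone leaves "云" and "盘" as separate characters; the bigram statistics then merge them into the out-of-dictionary word "云盘". A production system would plug such a segmenter into Lucene as a custom analyzer, which this sketch does not attempt.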
Keywords/Search Tags:Information Retrieval, Enterprise Documents Search Engines, Word Segmentation, Index
