
Word Segmentation-based Enterprise Document Search Engine Design And Realization

Posted on: 2008-11-17  Degree: Master  Type: Thesis
Country: China  Candidate: H B Chen  Full Text: PDF
GTID: 2208360212478915  Subject: Control Science and Engineering
Abstract/Summary:
With the spread of computers and networks, more and more enterprises manage their documents electronically, and this process produces a large number of documents. This raises an important research problem: how to retrieve useful information from a very large body of documents effectively and accurately. To address it, we design a Chinese and English document search engine for retrieving information from enterprise documents.

The key techniques involved in designing the search engine include Chinese word segmentation, data collection, inverted indexing, result sorting, and analysis of human behavior. The search engine consists of three parts: information collection, indexing, and query. First, a crawler collects the documents. Then the indexer analyzes the documents and builds the indexes. The searcher accepts user queries and finds relevant results through the indexes. Finally, the results are sorted and returned to the user.

This thesis first discusses Chinese word segmentation, the basic technology of a Chinese search engine, and studies its implementation methods in depth. A Chinese word segmentation system suited to an enterprise document search engine is studied. The key techniques of the search engine, namely indexing and the retrieval model, are then studied in depth. A bidirectional index method that can be stored in a database is proposed; using this method effectively reduces complexity. Result documents are ranked by combining the Boolean Model and the Vector Space Model. Building on general Web search engine techniques and the particular characteristics of enterprise documents, a Chinese and English search engine model that uses file system monitoring is also designed. Finally, the enterprise document search engine is implemented in a Windows environment using VC.
Keywords/Search Tags:Chinese word Segment, Search Engine, bidirectional index
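
The pipeline summarized in the abstract (segmentation, indexing, Boolean filtering, Vector Space ranking) can be illustrated with a minimal Python sketch. It is not the thesis implementation, which was written in VC and is not reproduced on this page. Several points are assumptions made only for illustration: forward maximum matching stands in for the segmentation method, which the abstract does not name; the "bidirectional index" is interpreted as a forward map (document to terms) kept next to an inverted map (term to documents); ranking uses TF-IDF cosine similarity after a Boolean AND filter; and the names DICTIONARY, segment, and BidirectionalIndex are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; a real segmenter would load a large dictionary.
DICTIONARY = {"企业", "文档", "搜索", "引擎", "搜索引擎"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def segment(text):
    """Forward maximum matching for Chinese runs, alphanumeric runs otherwise."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if is_cjk(ch):
            # Try the longest dictionary word first; fall back to one character.
            for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in DICTIONARY:
                    tokens.append(piece)
                    i += size
                    break
        elif ch.isalnum():
            j = i
            while j < len(text) and text[j].isalnum() and not is_cjk(text[j]):
                j += 1
            tokens.append(text[i:j].lower())
            i = j
        else:
            i += 1  # skip whitespace and punctuation
    return tokens


class BidirectionalIndex:
    """Assumed reading of 'bidirectional': forward and inverted maps kept together."""

    def __init__(self):
        self.forward = {}                 # doc_id -> Counter(term -> frequency)
        self.inverted = defaultdict(set)  # term -> set of doc_ids

    def add(self, doc_id, text):
        self.remove(doc_id)  # re-indexing a changed file replaces the old entry
        counts = Counter(segment(text))
        self.forward[doc_id] = counts
        for term in counts:
            self.inverted[term].add(doc_id)

    def remove(self, doc_id):
        # The forward map lists exactly which postings must be updated.
        for term in self.forward.pop(doc_id, {}):
            self.inverted[term].discard(doc_id)
            if not self.inverted[term]:
                del self.inverted[term]

    def idf(self, term):
        # Only called for terms that are present in the inverted map.
        return math.log(1 + len(self.forward) / len(self.inverted[term]))

    def search(self, query):
        q_counts = Counter(segment(query))
        if not q_counts:
            return []
        # Boolean step: keep documents that contain every query term.
        postings = [self.inverted.get(term, set()) for term in q_counts]
        candidates = set.intersection(*postings)
        if not candidates:
            return []
        # Vector space step: cosine similarity between TF-IDF vectors.
        q_vec = {t: tf * self.idf(t) for t, tf in q_counts.items()}
        q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
        results = []
        for doc_id in candidates:
            d_vec = {t: tf * self.idf(t) for t, tf in self.forward[doc_id].items()}
            d_norm = math.sqrt(sum(w * w for w in d_vec.values()))
            dot = sum(w * d_vec[t] for t, w in q_vec.items())
            results.append((doc_id, dot / (q_norm * d_norm)))
        return sorted(results, key=lambda item: (-item[1], item[0]))
```

In this sketch, a file system monitor of the kind the abstract mentions would call add when a document is created or modified and remove when it is deleted. Keeping the forward map is what lets remove update only the affected postings instead of scanning the whole inverted map, which is one plausible way such an index could reduce update cost.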