
Design And Implementation Of An Enterprise-based Information Search Engine

Posted on: 2008-08-05   Degree: Master   Type: Thesis
Country: China   Candidate: X F Kong   Full Text: PDF
GTID: 2178360212473984   Subject: Computer technology
Abstract/Summary:
With the rapid development of computer and network technology, the number of network users and the volume of network resources are growing exponentially. Searching this vast sea of data efficiently requires a powerful retrieval system, and search engines are designed to meet that requirement. They crawl the network, locate data, and process it to extract the information that is useful to the user.

Search engines fall into two categories of application: Internet-based and enterprise-network-based. This thesis presents the design and implementation of a campus-network-based search engine. It describes the underlying vector space model (VSM) and related techniques, such as term selection, weighting strategy, and query optimization using relevance feedback, which are widely used in document classification, automatic indexing, and information retrieval.

Western languages such as English place explicit delimiters between words, but in Chinese sentences the words are written continuously. Chinese word segmentation is therefore the first step in many Chinese information processing applications, such as machine translation, document classification, document retrieval, document filtering, and word frequency statistics. This thesis proposes an efficient algorithm for Chinese word segmentation. Theoretical analysis and experiments show that the method is more efficient than other string-comparison-based segmentation methods.

A Chinese information search engine consists of two major parts: front-end user query processing and back-end index maintenance. User query processing accepts the Chinese word sequence typed in by the user, segments it into words, calculates the similarity of each indexed document to the query, and returns the retrieval results to the user. Index maintenance allows the system manager to update the index database. The thesis gives a detailed description of the implementation of both parts.

The search engine described in this thesis has been fully tested and will be put into practical use.
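
The query path outlined in the abstract (segment the query, score indexed documents against it, return ranked results) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the thesis's own segmentation algorithm, so a generic forward maximum matching segmenter (a common string-comparison-based method) stands in for it, and TF-IDF weighting with cosine similarity is used as one typical vector space model setup. The names `segment`, `tfidf_vectors`, `cosine`, `search`, and the toy lexicon are hypothetical.

```python
import math
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Forward maximum matching (stand-in, not the thesis's algorithm):
    at each position take the longest lexicon word, else one character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def tfidf_vectors(docs_tokens):
    """Build sparse TF-IDF vectors (dicts) and the IDF table (assumed weighting)."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in docs_tokens
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, lexicon):
    """Return indices of matching documents, most similar first."""
    docs_tokens = [segment(d, lexicon) for d in docs]
    vectors, idf = tfidf_vectors(docs_tokens)
    q_counts = Counter(segment(query, lexicon))
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0.0]


# Toy data, invented for this sketch.
lexicon = {"信息", "检索", "搜索", "引擎", "搜索引擎", "校园", "网络"}
docs = ["校园网络搜索引擎", "信息检索", "校园网络"]
print(segment("搜索引擎信息检索", lexicon))
print(search("校园", docs, lexicon))
```

In this sketch the index is rebuilt on every call to `search`, which keeps the example short. A real system of the kind the abstract describes would build the index once in the back-end maintenance step and reuse it for each query.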
Keywords/Search Tags: information retrieval, search engine, Chinese information processing, computer network