Research On Data Store Of Search Engine

Posted on: 2006-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: H He
Full Text: PDF
GTID: 2168360152971158
Subject: Computer software and theory

Abstract/Summary:
With the emergence and spread of the Internet, people can access far more information than before. The way people obtain information has changed, and the Internet has become a main source of information. With the exponential growth of information on the Web, how to retrieve information of interest quickly has become an attractive research area. Search engines were introduced to solve this problem.

A search engine combines traditional information retrieval (IR) with the Web. Traditional information retrieval obtains information from a document repository, and its core technique is indexing and searching text. Directory-based search and full-text search are used in the search process. This meets the need when the amount of information is small. When facing distributed, volatile, and large-volume data, however, traditional information retrieval cannot find the exact information quickly.

A search engine is an extension of traditional IR techniques. Its key techniques include data collection, Chinese word segmentation, inverted indexes, retrieval of hidden data, distributed architectures, large-scale data storage, and analysis of user behavior. A search engine consists of three parts: information collection, indexing, and querying. First, the search engine collects web pages from the Internet using a crawler. Then the indexer analyzes the web page data and creates indexes. The searcher accepts user queries and finds relevant results through the indexes. Finally, the results are sorted and sent to the user.

The data processed by a search engine mainly comprise web page data, index data, and URL data. They differ in characteristics such as capacity and update period. How to manage these data efficiently is one of the key techniques of a search engine and the main subject of this thesis.

This thesis first introduces the basic concepts and current state of research on Web search engines and describes the architecture and key techniques of a search engine. It then analyzes the types and characteristics of the data stored in a search engine, proposes several data store designs, and discusses the data store implementations of other search engines. Finally, it describes in detail an implementation of a data store system named WDB, which is used to support the crawler's data storage.
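
As a rough illustration of the indexing and query stages described in the abstract, the following minimal Python sketch builds an inverted index over a few in-memory pages and answers a conjunctive keyword query. It is not the thesis's WDB system or its indexer; the function names (`tokenize`, `build_index`, `search`), the sample URLs, and the whitespace tokenizer are all assumptions made for illustration, and Chinese text would need a real word segmenter in place of `tokenize`.

```python
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: split on whitespace, strip basic punctuation,
    # and lowercase. A real engine would use a proper word segmenter.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def build_index(pages):
    # Inverted index: term -> set of URLs whose text contains the term.
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    # Return the URLs that contain every query term (AND semantics).
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    # Alphabetical order stands in for a real relevance ranking.
    return sorted(result)


# Stand-in for pages that a crawler would have collected (assumed data).
pages = {
    "http://example.com/a": "Search engine data store",
    "http://example.com/b": "Crawler collects web page data",
    "http://example.com/c": "Inverted index for search",
}
index = build_index(pages)
```

Calling `search(index, "data store")` intersects the URL sets for `data` and `store`, so only pages containing both terms are returned. In this sketch the three kinds of data named in the abstract appear as the page text (web page data), the `index` mapping (index data), and the dictionary keys (URL data).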
Keywords/Search Tags: Internet, Web, crawler, data store, search engine, information retrieval, inverted index