
Professional Search Engine Data Storage

Posted on: 2008-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: X F Chen
GTID: 2208360215454564
Subject: Education Technology
Abstract/Summary:
Organizing a large, dynamically changing body of data well is a prerequisite for a search engine to provide high-quality service, so data storage plays a major role in it. The particular nature of the data in a topic-specific search engine makes research on data storage even more important. This thesis discusses four aspects of the data storage system for a topic-specific search engine: the data entities and the relationships between them; a suitable data compression algorithm; the design of the index coding structure; and the index distribution strategy together with data management. Of these, the design of the data compression and of the data index are the key points.

Based on an analysis of Web pages and the characteristics of the categorized data, we propose a new data compression algorithm that combines dictionary-based and statistical methods, uses both static and dynamic statistics, and mixes high-frequency words in with the basic coding units. It achieved good results in tests.

We first point out the shortcomings of a database-backed inverted index for a topic-specific search engine and instead propose an inverted index based on the file system, which makes it easy to control the scale of the indexed data and improves search efficiency. The index design takes into account the coding style for topic-specific information and future additions and deletions of data. As for the distribution of the indexed data, we analyze only the advantages and feasibility of distributing index files by data category.
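The idea of a file-system-based inverted index distributed by data category can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class name `CategoryFileIndex`, the one-JSON-file-per-category layout, and whitespace tokenization are all assumptions made for illustration, and the thesis's index coding structure and compression are not modeled here.

```python
import json
import os
import tempfile


class CategoryFileIndex:
    """Toy inverted index keeping one postings file per category (hypothetical layout)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, category):
        # Assumes category names are safe to use as file names.
        return os.path.join(self.root, f"{category}.json")

    def _load(self, category):
        path = self._path(category)
        if not os.path.exists(path):
            return {}
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)

    def _save(self, category, postings):
        with open(self._path(category), "w", encoding="utf-8") as fh:
            json.dump(postings, fh)

    def add(self, category, doc_id, text):
        """Index a document: map each distinct term to a sorted list of doc ids."""
        postings = self._load(category)
        for term in set(text.lower().split()):
            ids = postings.setdefault(term, [])
            if doc_id not in ids:
                ids.append(doc_id)
                ids.sort()
        self._save(category, postings)

    def delete(self, category, doc_id):
        """Remove a document from every postings list in its category file."""
        postings = self._load(category)
        for term in list(postings):
            postings[term] = [d for d in postings[term] if d != doc_id]
            if not postings[term]:
                del postings[term]
        self._save(category, postings)

    def search(self, category, term):
        """Look up a term; only the file for the given category is read."""
        return self._load(category).get(term.lower(), [])


if __name__ == "__main__":
    index = CategoryFileIndex(tempfile.mkdtemp())
    index.add("education", 1, "distance learning platform")
    index.add("education", 2, "learning theory")
    print(index.search("education", "learning"))
```

Because each category lives in its own file, a query or update touches only that category's postings, which is one way the scale of the indexed data could be kept under control.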
Keywords/Search Tags: topic-specific search engine, data storage, data compression, data index, data update