
Research and Implementation of a Distributed Massive Text Data Index and Retrieval System

Posted on: 2007-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zhang
GTID: 2178360215470124
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the volume of text data has increased greatly, and technologies for high-performance full-text retrieval and content analysis over massive text data have developed rapidly in response.

Building on StarTPMonitor, a large-scale transaction processing system, we first analyze the principles and implementation of search engines and text databases. After comparing the advantages and disadvantages of the two technologies, we choose text databases for our tasks. Based on text database technology, the thesis mainly discusses:

1. The architecture of a massive text data retrieval system. We use a multiple-database architecture to manage massive text data.
2. Indexing performance drops markedly as the scale of the text data grows. To address this, we propose an efficient method of data storage management and index maintenance. We divide the data into parts and load and index these parts into the database in parallel, so that the indexing speed is independent of the data scale and stays at the maximum speed the current system can reach.
3. Query response time increases as the data scale grows. To address this, we propose a technique that caches part of the query results, which reduces response time markedly.
4. We propose a text database tuning model for different software and hardware conditions. Based on the model, we design and implement a program for text database performance tuning. Using this program, we can identify the parameters critical to system performance.
5. We optimize text index maintenance performance for particular applications based on Oracle databases.

At the end of the thesis, we run a series of tests to verify that our system meets or even exceeds the expected performance and functional requirements.
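
The query result caching idea in the abstract can be illustrated with a small sketch. The abstract does not say how the cache is keyed, sized, or evicted, so everything here is an assumption made for illustration: the class name `QueryResultCache`, the `run_query` callback, the whitespace and case normalization of the query text, and the least-recently-used eviction policy are all hypothetical and are not taken from the thesis.

```python
from collections import OrderedDict


class QueryResultCache:
    """Hypothetical LRU cache of query results (not the thesis's design)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity        # assumed bound on cached queries
        self._entries = OrderedDict()   # normalized query -> cached result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(query):
        # Assumption: queries differing only in case or spacing share a result.
        return " ".join(query.lower().split())

    def get_or_compute(self, query, run_query):
        """Return the cached result, or call run_query(query) and cache it."""
        key = self._normalize(query)
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = run_query(query)            # the expensive database lookup
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result
```

A caller would pass the function that actually queries the text database as `run_query`; repeated queries are then answered from memory, and the capacity bound means only part of the query results is held at any time. A real system would also need to invalidate entries when newly loaded data changes the index, which this sketch leaves out.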
Keywords/Search Tags: Full Text Retrieval, Text Indexing, Index Tuning, Parallel Loading, Parallel Indexing, Massive Data, Text Database