Research And Implementation On Indexing Mechanism For The Ocean Data Organization

Posted on: 2009-03-09  Degree: Master  Type: Thesis
Country: China  Candidate: B Cheng
GTID: 2178360275972394  Subject: Computer system architecture
Abstract/Summary:
The indexing mechanisms of most existing data organization systems are based on the general-purpose indexes of traditional data organization, and they suffer from several problems: the index data grow too large as the system scales, indexing takes too long, and the index data are of a single type. As a result, the recall, precision, and response time of retrieval over massive data can hardly meet users' needs. To address these issues and improve the utilization of information, designing a new indexing mechanism that improves the performance and quality of retrieval over massive unstructured data has become a major research issue.

First, a massive data organization system is designed. It meets the requirements of organizing and managing massive information: automation, a unified interface, pattern extraction, cognition extraction, and semantic integration. Four key technologies of the system are identified: information preprocessing, extensible information storage, information reorganization, and information retrieval. This platform supports research on mass data processing tasks such as information indexing.

Second, building on the massive data organization system, the data indexing mechanism is analyzed, and a hybrid indexing mechanism is proposed together with its detailed algorithm. The hybrid mechanism models index information with an information matrix model, which represents each information entity by a number of metadata items. This model reduces the data scale while better representing the internal characteristics of the information. The index data of the hybrid mechanism are the metadata of the information, and a metadata-based index can be more precise and carry more semantics than a content-based index.
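The information matrix model can be illustrated with a minimal Python sketch. It assumes that each information entity is projected onto a fixed set of metadata columns and that the index maps (attribute, value) pairs to row ids. The column names (`title`, `source`, `data_type`, `date`), the helper functions, and the sample ocean records are all hypothetical, because the abstract does not list the actual metadata attributes. The sketch only shows why indexing metadata rather than raw content keeps the index small: the content body never enters the matrix.

```python
# Hypothetical metadata columns; the abstract does not name the real ones.
FIELDS = ("title", "source", "data_type", "date")


def to_matrix_row(record):
    """Project one information entity onto the fixed metadata columns.

    Only metadata is kept; the raw content body is not copied into the matrix.
    """
    return tuple(str(record.get(name, "")) for name in FIELDS)


def build_information_matrix(records):
    """One row per information entity, one column per metadata attribute."""
    return [to_matrix_row(r) for r in records]


def build_metadata_index(matrix):
    """Map each (attribute, value) pair to the row ids of the entities that carry it."""
    index = {}
    for row_id, row in enumerate(matrix):
        for name, value in zip(FIELDS, row):
            if value:  # skip attributes the record did not supply
                index.setdefault((name, value), []).append(row_id)
    return index


# Hypothetical sample records; "body" stands for the unstructured content.
records = [
    {"title": "SST survey", "source": "buoy-07", "data_type": "temperature",
     "date": "2008-03-01", "body": "22.1 22.3 22.0"},
    {"title": "Salinity cast", "source": "ship-02", "data_type": "salinity",
     "date": "2008-03-02", "body": "34.5 34.6"},
    {"title": "SST survey", "source": "buoy-09", "data_type": "temperature",
     "body": "21.8 21.9"},
]
matrix = build_information_matrix(records)
index = build_metadata_index(matrix)
print(index[("data_type", "temperature")])  # [0, 2]
print(matrix[2])  # ('SST survey', 'buoy-09', 'temperature', '')
```

A lookup such as `index[("data_type", "temperature")]` answers a semantic query without scanning any content body, which is the property the abstract attributes to metadata-based indexing.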
The hybrid indexing mechanism uses three index storage models: a tree index model, a hash index model, and an inverted file model. The tree index model builds the index in memory and serves as the main entry point for retrieval. The hash index model checks the index data and builds a bitmap index over them. The inverted file model builds the index on disk and provides a content-based index that supports an extension interface. The three models work together during indexing.

Finally, building on the theoretical study above, the hybrid indexing mechanism is implemented, tested, and analyzed in terms of both functionality and performance. The tests show that, in the massive data organization system with the hybrid indexing mechanism, the in-memory index data occupy only 4% of the size of the original data, the on-disk index data occupy only 1/3 of it, and indexing time is 10% lower than that of similar applications. The results show that this indexing mechanism is more practical for massive information organization than the alternatives.
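The interplay of the three storage models described above can be sketched in Python under several simplifying assumptions. A sorted key list searched with `bisect` stands in for the in-memory tree, a dict (a hash table) maps each (field, value) pair to an integer bitmap of entity ids, and a term-to-ids dictionary stands in for the on-disk inverted file. The class `HybridIndex`, its methods, and the sample data are hypothetical; the thesis's actual algorithm and on-disk layout are not given in the abstract. Entity ids are assumed to be small non-negative integers, and metadata values within one field are assumed to be strings so they can be ordered.

```python
import bisect
from collections import defaultdict


class HybridIndex:
    """Toy hybrid index; all names and layouts here are illustrative assumptions."""

    def __init__(self):
        self.tree_keys = []               # sorted (field, value) keys: stand-in for the in-memory tree
        self.bitmaps = {}                 # hash map: (field, value) -> bitmap, bit i set for entity i
        self.inverted = defaultdict(set)  # term -> entity ids; in memory here, on disk in the design

    def add(self, entity_id, metadata, content):
        for key in metadata.items():
            if key not in self.bitmaps:   # hash lookup checks whether the key is already indexed
                bisect.insort(self.tree_keys, key)
                self.bitmaps[key] = 0
            self.bitmaps[key] |= 1 << entity_id
        for term in content.lower().split():
            self.inverted[term].add(entity_id)

    def range_keys(self, field, lo, hi):
        """Tree-style entry point: indexed keys of `field` with lo <= value <= hi."""
        start = bisect.bisect_left(self.tree_keys, (field, lo))
        end = bisect.bisect_right(self.tree_keys, (field, hi))
        return self.tree_keys[start:end]

    def query(self, metadata, term=None):
        """AND the bitmaps of the requested pairs, then optionally filter by a content term."""
        bits = None
        for key in metadata.items():
            b = self.bitmaps.get(key, 0)
            bits = b if bits is None else bits & b
        if bits is None:  # no metadata constraint: fall back to the inverted file alone
            return set(self.inverted.get(term.lower(), set())) if term else set()
        ids = {i for i in range(bits.bit_length()) if (bits >> i) & 1}
        if term is not None:
            ids &= self.inverted.get(term.lower(), set())
        return ids


idx = HybridIndex()
idx.add(0, {"type": "buoy", "region": "east"}, "Sea surface temperature record")
idx.add(1, {"type": "buoy", "region": "south"}, "Salinity profile")
idx.add(2, {"type": "satellite", "region": "east"}, "Sea surface height")
print(idx.query({"type": "buoy"}))           # {0, 1}
print(idx.query({"region": "east"}, "sea"))  # {0, 2}
print(idx.range_keys("region", "e", "f"))    # [('region', 'east')]
```

In this sketch, metadata queries are answered by bitwise AND on the bitmaps, while content terms go through the inverted file, mirroring the division of labor the abstract describes between metadata-based and content-based indexing.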
Keywords/Search Tags: Mass data processing, Unstructured data, Massive data organization, Hybrid indexing mechanism