Research And Implementation On Indexing Mechanism For The Ocean Data Organization

Posted on:2009-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:B Cheng

GTID:2178360275972394

Subject:Computer system architecture

Abstract/Summary:

The indexing mechanisms of existing data organization systems are mostly based on the general-purpose indexes of traditional data organization, and they suffer from many problems, such as excessively large index data, long indexing times, and a single type of index data. As a result, the recall, precision, and response time of massive data retrieval can hardly meet user needs. To address these issues and improve the utilization of information, designing a new indexing mechanism that improves the performance and quality of retrieval over massive unstructured data has become a major research issue.

Firstly, a massive data organization system is designed. The system meets the needs of massive information organization and management: automation, a unified interface, pattern extraction, cognition extraction, and semantic integration. Four key technical points are determined for the system: information pretreatment technology, extensible information storage technology, information re-organization technology, and information retrieval technology. This platform supports research on work related to mass data processing, such as information indexing.

Secondly, based on the massive data organization system, the data indexing mechanism is analyzed and studied, and a hybrid indexing mechanism is presented with its detailed algorithm. The data model for index information in the hybrid indexing mechanism is the information matrix model, which uses a number of metadata items to represent each information entity. This data model reduces the data scale and at the same time enhances the representation of the internal characteristics of information. The index data of the hybrid indexing mechanism is the metadata of information; an index based on metadata can be more precise and carry more semantics than an index based on content.
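One way to picture the information matrix model is as a table whose rows are information entities and whose columns are metadata fields. The sketch below is only an illustration under that reading: the field names, the `InformationMatrix` class, and the sample entities are assumptions, since the abstract does not list the actual metadata used in the thesis.

```python
from dataclasses import dataclass, field

# Hypothetical metadata fields; the abstract does not name the real ones.
FIELDS = ("title", "author", "format", "created")


@dataclass
class InformationMatrix:
    """Rows are information entities, columns are metadata fields."""
    fields: tuple = FIELDS
    rows: dict = field(default_factory=dict)  # entity id -> tuple of values

    def add(self, entity_id, **metadata):
        # Store one fixed-width row per entity; missing fields become None.
        self.rows[entity_id] = tuple(metadata.get(f) for f in self.fields)

    def column(self, name):
        # Read one metadata field across all entities.
        j = self.fields.index(name)
        return {eid: row[j] for eid, row in self.rows.items()}


matrix = InformationMatrix()
matrix.add("doc-1", title="Tide report", author="Li", format="pdf",
           created="2008-05-01")
matrix.add("doc-2", title="Sonar log", author="Wang", format="txt")
formats = matrix.column("format")
```

Each entity is reduced to a short row of metadata values, which is the sense in which such a model keeps the indexed data much smaller than the content it describes.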
The hybrid indexing mechanism uses three index storage models: a tree model, a hash index model, and an inverted file model. The tree index model creates an index in memory that serves as the main entry point for retrieval. The hash index model checks the index data and creates a bitmap index of it. The inverted file model creates the index on disk while also providing a content-based index to support the expansion interface. These models work together in the process of indexing.

Finally, based on the theoretical study above, the hybrid indexing mechanism is implemented, tested, and analyzed. The testing and analysis cover two aspects: function and performance. The tests show that, in the massive data organization system with the hybrid indexing mechanism, the index data held in memory is only 4% of the size of the original data, the index data on disk is only 1/3 of the size of the original data, and indexing time is 10% lower than that of similar applications. The results show that this index mechanism for mass information organization is more practical than others.
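The cooperation between the tree, hash, and inverted file models described above can be sketched roughly as follows. This is not the thesis's algorithm, which the abstract does not give. It assumes a sorted key list (via `bisect`) as a stand-in for the in-memory tree, an integer bitmap of hashed keys for the checking step, and a JSON file as the on-disk inverted file. The `HybridIndex` class and all of its method names are hypothetical.

```python
import bisect
import json
import os
import tempfile
import zlib


class HybridIndex:
    """Illustrative only: three cooperating structures over metadata keys."""

    def __init__(self, path, bitmap_bits=1024):
        self.keys = []       # sorted keys: stand-in for the in-memory tree
        self.postings = {}   # key -> entity ids, buffered until flush()
        self.bitmap = 0      # integer used as a bitmap of hashed keys
        self.bits = bitmap_bits
        self.path = path     # location of the on-disk inverted file

    def _mask(self, key):
        # crc32 is stable across runs, unlike the built-in hash() for str.
        return 1 << (zlib.crc32(key.encode("utf-8")) % self.bits)

    def might_contain(self, key):
        # A clear bit proves absence; a set bit may be a hash collision.
        return bool(self.bitmap & self._mask(key))

    def add(self, entity_id, metadata_values):
        for key in dict.fromkeys(metadata_values):  # drop repeated values
            if not self.might_contain(key) or key not in self.postings:
                bisect.insort(self.keys, key)
                self.postings[key] = []
                self.bitmap |= self._mask(key)
            self.postings[key].append(entity_id)

    def flush(self):
        # Write the inverted file (key -> entity ids) to disk.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.postings, f)

    def key_range(self, low, high):
        # Ordered access is what the tree-like structure contributes.
        return self.keys[bisect.bisect_left(self.keys, low):
                         bisect.bisect_left(self.keys, high)]

    def lookup(self, key):
        if not self.might_contain(key):
            return []        # ruled out by the bitmap, no disk access
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            return []        # bitmap collision: key is not in the tree
        with open(self.path, encoding="utf-8") as f:
            return json.load(f).get(key, [])


path = os.path.join(tempfile.mkdtemp(), "inverted.json")
index = HybridIndex(path)
index.add("doc-1", ["pdf", "Li"])
index.add("doc-2", ["txt", "Li"])
index.flush()
result = index.lookup("Li")
```

In this sketch a lookup touches the disk only after both in-memory structures agree that the key exists, which is one plausible reading of the tree index acting as the main entry point for retrieval.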

Keywords/Search Tags:

Mass data processing, Unstructured data, Massive data organization, Hybrid indexing mechanism

Related items

1	Research And Implementation On Massive Unstructured Data Organization
2	The Research And Application Of Unstructured Data Processing Technology
3	Research And Implementation Of Massive Data Storage System Based On Hybrid Architecture
4	A Multi-level Hybrid Spatiotemporal Index Method For Multi-modal Scene Data
5	Design And Implementation Of Short Video Service Data Warehouse Based On Mass Data
6	Indexing of multidimensional discrete data spaces and hybrid extensions
7	Research On P2P Network Based Vector Geographic Data Organization And Indexing Technology
8	Research And Application Of Massive Historical Quasi Real-time Data Management Platform
9	Research On Hybrid Queries Of Structured And Unstructured Data Based On Proximity Graph
10	Design And Implementation Of The Middleware System For Unstructured Textual Big Data