Research On Accessing Method For Intelligent Retrieval Of Massive Multi-Structured Data

Posted on:2014-04-26

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Wang

Full Text:PDF

GTID:2268330422964757

Subject:Computer technology

Abstract/Summary:

With the rapid development of the Internet, the volume of data has grown exponentially. In intelligent retrieval in particular, the traditional stand-alone mode of data handling can no longer cope with current massive-data processing. With the emergence of the Hadoop Distributed File System (HDFS) and of MapReduce for distributed parallel processing of vast amounts of information, data handling has gradually shifted to a distributed parallel processing mode. In intelligent retrieval over massive information, the data can be processed efficiently with MapReduce.

After analyzing the feasibility of the Hadoop and MapReduce distributed parallel processing environment for the storage and retrieval of massive data, and based on the requirements that intelligent retrieval places on the semantic and content features of multi-structured massive data, we propose an organization policy and an access method. Drawing on Lucene's full-text retrieval technology, we designed an inverted index and a forward index for the text files that describe the features of the massive data. Because the HBase distributed database is well suited to storing multi-structured data, we designed a high-dimensional feature library that includes semantic and content features. Since distributed parallel processing of massive information is more efficient on larger files, we proposed and implemented an access method that handles massive numbers of small files in the form of large files, combining the characteristics of the distributed file system with those of small files.

Based on these access methods, we designed and implemented intelligent retrieval, including single-modal and multi-modal clustering similarity retrieval, and optimized it with caching. For MapReduce distributed processing, we proposed and implemented an optimization of its shuffle phase. A number of experiments indicate that the access method is highly practical.
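The abstract does not spell out how small files are handled as large files, so the following is only a minimal sketch of the general idea, under stated assumptions: many small files are concatenated into one large blob, and a separate index records each file's offset and length so that a single file can still be read back on its own. The function names `pack_small_files` and `read_small_file`, the in-memory blob, and the dictionary index are all hypothetical and are not taken from the thesis; a real deployment would write the blob to the distributed file system rather than keep it in memory.

```python
import io


def pack_small_files(files):
    """Concatenate small files into one blob.

    `files` maps a file name to its bytes. Returns (blob, index), where
    index maps each name to an (offset, length) pair inside the blob.
    This layout is an illustrative assumption, not the thesis's format.
    """
    buffer = io.BytesIO()
    index = {}
    for name, data in files.items():
        offset = buffer.tell()          # where this file starts in the blob
        buffer.write(data)
        index[name] = (offset, len(data))
    return buffer.getvalue(), index


def read_small_file(blob, index, name):
    """Recover one small file from the blob using the offset index."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Hypothetical sample data: three small files, one of them empty.
files = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b""}
blob, index = pack_small_files(files)
print(read_small_file(blob, index, "b.txt"))
```

Packing in this way trades a small index lookup per read for far fewer, larger files, which is the property the abstract says distributed parallel processing benefits from.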

Keywords/Search Tags:

mass storage, semantic retrieval, access method, distributed parallel processing

Related items

1	The Design And Implementation Of A Distributed Storage And Retrieval System
2	The Methods And Optimizations For Mass Data P2P Distributed Steady Storage
3	Distributed Storage And Retrieval Of Multi-layer Vision Data Of Tobacco Fields
4	Research And Application About USB Mass Storage Device Driver Under Embedded Linux
5	The Method Of Flash Based Mass Non-relation Storage
6	Acquisition, Storage And Retrieval Of E-commerce Mass Data
7	Research And Application Of Distributed Storage Technology Based On Semantic Metadata
8	Research On Distributed Storage And Retrieval System Based On Hadoop For Massive Video
9	Research On Distributed Storage Technology Based On Mass Data
10	Research Of High Performance Data Storage And Retrieval Of Distributed Real Time Database Based On Cloud Computing Technology