Research On Accessing Method For Intelligent Retrieval Of Massive Multi-Structured Data

Posted on: 2014-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wang
Full Text: PDF
GTID: 2268330422964757
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the volume of data has grown exponentially. In intelligent retrieval in particular, the traditional stand-alone mode of data handling can no longer cope with massive data processing. With the emergence of the Hadoop Distributed File System and of MapReduce for distributed parallel processing of vast amounts of information, data handling has gradually shifted to a distributed parallel processing mode. In intelligent retrieval over massive information, the data can be processed efficiently with MapReduce.

After analyzing the feasibility of the Hadoop and MapReduce distributed parallel processing environment for storing and retrieving massive data, and based on the requirements that intelligent retrieval places on the semantic and content features of massive multi-structured data, we propose an organization policy and an access method. Drawing on the features of Lucene's full-text retrieval technology, we design an inverted index and a forward index over the text files that hold the features of the massive data. Because the HBase distributed database is well suited to storing multi-structured data, we design a high-dimensional feature library that includes semantic and content features. Because distributed parallel processing of massive information is more efficient on larger files, we propose and implement an access method that lets massive numbers of small files be handled as large files, combining the characteristics of the distributed file system with those of small files.

Based on these access methods, we design and implement intelligent retrieval, including single-modal and multi-modal clustering similarity retrieval, and optimize it with caching. For MapReduce distributed processing, we propose and implement an optimization of its shuffle phase. A number of experiments show that the access method is highly practical.
Keywords/Search Tags: mass storage, semantic retrieval, access method, distributed parallel processing
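
The abstract mentions building both an inverted index and a forward index over feature text files. As a rough illustration of the difference between the two structures, the following minimal sketch uses plain Python dictionaries. It is not Lucene's API and not the thesis's implementation; the function names `build_indexes` and `search`, the whitespace tokenization, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a forward index (doc id -> terms) and an inverted index (term -> doc ids).

    `docs` is a hypothetical mapping of document id to feature text.
    """
    forward = {}
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        terms = text.lower().split()  # naive whitespace tokenization (assumption)
        forward[doc_id] = terms
        for term in terms:
            inverted[term].add(doc_id)
    return forward, dict(inverted)


def search(inverted, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(inverted.get(terms[0], set()))
    for term in terms[1:]:
        result &= inverted.get(term, set())
    return result


if __name__ == "__main__":
    sample = {
        "d1": "red car image",
        "d2": "red bird video",
        "d3": "blue car",
    }
    forward, inverted = build_indexes(sample)
    print(forward["d1"])
    print(sorted(search(inverted, "red car")))
```

The forward index answers "which terms does this document contain", while the inverted index answers "which documents contain this term", which is what makes conjunctive keyword lookup a set intersection rather than a scan over all documents.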