
Research On The Integration Of Unstructured Document Data Storage And Retrieval Technique

Posted on: 2016-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Du
GTID: 2308330479989641
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the Internet and the big data era, the volume of new data is growing almost exponentially, and unstructured data accounts for the majority of it. Document data is an important part of unstructured data, but existing relational database tools and techniques handle it poorly, so new methods are needed to store and retrieve unstructured document data.

First, through theoretical analysis and experimental comparison, a Hadoop and Elasticsearch architecture is selected. Hadoop provides reliable storage for unstructured document data, while Elasticsearch is mainly used for real-time retrieval.

Next, several improvements are made to the storage of unstructured document data. The Hadoop Distributed File System (HDFS) performs poorly when storing large numbers of small files, so a small-file merging algorithm based on balancing data blocks is proposed to reduce system load and improve efficiency. Because Hadoop cannot provide real-time search services, the data must be converted to a uniform structure and indexed in Elasticsearch for later retrieval. To this end, a plug-in-based, easily extended framework, called the unstructured document data isomorphism technique, is put forward; it processes the data with the Hadoop MapReduce parallel computing framework.

Then, retrieval optimization techniques for unstructured document data are studied. Combining the Elasticsearch filter feature with Internet identity data, this thesis proposes a filter-based optimization of the retrieval process, which greatly improves retrieval efficiency on part of the data. In addition, tuning system parameters and application parameters further optimizes overall retrieval performance.

Detailed comparison experiments and an analysis of their results confirm the validity of these storage and retrieval optimizations.

Finally, the system architecture is improved based on these optimization techniques. After the system UI is designed and implemented, the integrated storage and retrieval system for unstructured document data is completed.
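The abstract does not spell out the small-file merging algorithm, so the following is only a minimal sketch of the general idea: packing many small files into merged files that each fit within one HDFS block. It uses a standard first-fit-decreasing bin-packing heuristic, which is a stand-in and not necessarily the thesis's method. The function name `merge_small_files`, the 128 MB block size, and the file names in the usage example are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumed HDFS block size (128 MB); the thesis may use a different value.
BLOCK_SIZE = 128 * 1024 * 1024


def merge_small_files(sizes: Dict[str, int],
                      block_size: int = BLOCK_SIZE) -> List[List[str]]:
    """Group files so each group's total size fits in one block.

    First-fit decreasing: take files from largest to smallest and put
    each one into the first group that still has room. A file larger
    than block_size ends up alone in its own group.
    """
    used: List[int] = []          # bytes already assigned to each group
    groups: List[List[str]] = []  # file names assigned to each group

    # Sort by size (descending), then by name so the result is deterministic.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, total in enumerate(used):
            if total + size <= block_size:
                used[i] += size
                groups[i].append(name)
                break
        else:
            # No existing group has room: start a new one.
            used.append(size)
            groups.append([name])
    return groups


if __name__ == "__main__":
    # Hypothetical file sizes in bytes, with a tiny block size for readability.
    demo = {"a.txt": 60, "b.txt": 50, "c.txt": 40, "d.txt": 30, "e.txt": 20}
    for group in merge_small_files(demo, block_size=100):
        print(group)
```

Filling each merged file close to the block size keeps the number of blocks, and therefore the amount of NameNode metadata, low, which is the usual motivation for merging small files in HDFS.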
Keywords/Search Tags: unstructured, storage, retrieval, Hadoop, ElasticSearch