Research On Key Technology Of Big Data Storage Based On Hadoop

Posted on: 2017-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Lu
GTID: 2348330488488268
Subject: Computer application technology
Abstract/Summary:
With the arrival of the big data era, traditional data warehouses can no longer meet the growing demand for data storage, and the Hadoop platform has emerged as a good solution to this problem. Hadoop can be deployed on clusters of inexpensive machines. Because it is open source, highly scalable, and fault tolerant, it has become a mainstream big data storage platform, and many well-known enterprises in China and abroad have built their own big data processing systems on it. Storage is an important step that precedes data analysis and mining, so big data storage is an active research topic in the academic community.

First, this thesis introduces the background and significance of the research and the state of development of big data and Hadoop in China and abroad, and points out some problems in existing big data technologies. Second, it studies the principles and operating mechanism of Hadoop and introduces HDFS and MapReduce. It then studies the Hadoop-based data processing architecture, focusing on technologies in the data storage layer such as big data preprocessing and big data fault tolerance. Next, it designs a Hadoop Two-level Data De-duplication Storage Architecture (HTDDSA), which deletes duplicate data at both the file level and the data block level and improves the performance of HDFS file storage. For HTDDSA, the thesis focuses on its composition, metadata definition, two-level data deletion strategy, small-file merging strategy, and file read and write processes. Finally, it describes how the Hadoop cluster was built and tests the performance of HTDDSA. The experimental results show that HTDDSA achieves a high duplicate-data deletion rate and that, compared with HDFS, its small-file write and read times are noticeably shorter.
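The two-level idea described above (a file-level duplicate check followed by a block-level one) can be illustrated with a minimal in-memory sketch. This is not the HTDDSA implementation: the class name `TwoLevelDedupStore`, the SHA-256 fingerprints, the fixed-size chunking, and the tiny block size are all assumptions made for illustration, and the sketch does not touch HDFS or cover small-file merging.

```python
import hashlib


class TwoLevelDedupStore:
    """Toy in-memory store (hypothetical, not the thesis's HTDDSA code).

    Level 1 skips files whose whole-content fingerprint is already known.
    Level 2 stores each distinct fixed-size block only once.
    """

    def __init__(self, block_size=4):
        # block_size is an assumed, deliberately tiny value for the demo.
        self.block_size = block_size
        self.files = {}    # file name -> file fingerprint
        self.recipes = {}  # file fingerprint -> ordered block fingerprints
        self.blocks = {}   # block fingerprint -> block bytes

    @staticmethod
    def _fingerprint(data):
        # SHA-256 is an assumption; the abstract does not name a hash.
        return hashlib.sha256(data).hexdigest()

    def put(self, name, data):
        """Store data under name and return the number of new bytes kept."""
        file_fp = self._fingerprint(data)
        self.files[name] = file_fp
        if file_fp in self.recipes:
            return 0  # level 1: identical file already stored
        recipe = []
        new_bytes = 0
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            block_fp = self._fingerprint(block)
            if block_fp not in self.blocks:
                # level 2: only previously unseen blocks consume space
                self.blocks[block_fp] = block
                new_bytes += len(block)
            recipe.append(block_fp)
        self.recipes[file_fp] = recipe
        return new_bytes

    def get(self, name):
        """Rebuild a file by concatenating its blocks in recipe order."""
        recipe = self.recipes[self.files[name]]
        return b"".join(self.blocks[fp] for fp in recipe)
```

In this sketch, writing a second copy of an existing file adds no new bytes, and writing a file that differs in one block adds only that block. The mapping from file fingerprint to block list plays the role of the metadata a real design would need to persist.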
Keywords: Hadoop, big data, storage technology, storage architecture