
On Jackrabbit Packing Hadoop and Its Application in Content Management System

Posted on: 2012-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2178330335960843
Subject: Computer Science and Technology
Abstract/Summary:
Against the background of widely discussed and applied work on distributed file systems and content repositories, this thesis studies how to build a Jackrabbit on Hadoop module. The module is an important part of realizing a content management system for mass data storage.

The system uses Jackrabbit as its implementation of the Java Content Repository, which provides standard interfaces to the upper layers, so users do not need to know what the storage layers are. The system automatically chooses the Hadoop Distributed File System (HDFS) for large data and HBase for small data, which addresses the inefficiency of storing small data in HDFS. HDFS is part of Hadoop and is an implementation of the Google File System design. HBase is also part of Hadoop and is an implementation of Google's BigTable design. The boundary between large and small data is determined by testing, which safeguards the performance of the system. The resulting system, Jackrabbit on Hadoop, improves the handling of small data.

The system uses MapReduce, which is also part of Hadoop, as its distributed computing framework. It is an implementation of Google's MapReduce, a simplified distributed programming model that lets a program be distributed automatically and executed concurrently across a large cluster of ordinary machines. Jackrabbit packaging MapReduce gives data preprocessing real-time access to the contents of the content repository, so the system becomes a combination of a content management system and a precision analysis platform.

This thesis focuses on the implementation of Jackrabbit on Hadoop and its applications in content management systems. Chapter 1 introduces the content management system, the Java Content Repository, and the precision analysis platform, which form the source and background of this work. Chapter 2 describes the design and implementation of Jackrabbit on Hadoop, covering Jackrabbit on HDFS and HBase and Jackrabbit on MapReduce. Chapter 3 introduces the application of this package and describes in detail the file operations, the content organization, the system performance, the interfaces provided by the storage layer, and the preparation for data analysis. The final chapter outlines follow-up work on Jackrabbit on Hadoop and the prospects for future implementation.
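
The size-based choice between HDFS and HBase described in the abstract can be illustrated with a minimal Java sketch. It is not the thesis's implementation: the `BlobBackend` interface, the `InMemoryBackend` stand-ins (used here in place of real HDFS and HBase clients), the `SizeRoutedStore` class, and the 1 MiB threshold are all hypothetical. The abstract states only that the actual boundary is decided by testing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction; NOT part of the Jackrabbit, HDFS, or HBase APIs.
interface BlobBackend {
    void put(String id, byte[] data);
    byte[] get(String id);
    void delete(String id);
}

// In-memory stand-in so the sketch runs without a Hadoop cluster.
final class InMemoryBackend implements BlobBackend {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private final String name;

    InMemoryBackend(String name) { this.name = name; }

    String name() { return name; }

    boolean contains(String id) { return blobs.containsKey(id); }

    @Override public void put(String id, byte[] data) { blobs.put(id, data.clone()); }

    @Override public byte[] get(String id) {
        byte[] data = blobs.get(id);
        return data == null ? null : data.clone();
    }

    @Override public void delete(String id) { blobs.remove(id); }
}

// Routes each item to the "small" or "large" backend by comparing its size to a threshold.
final class SizeRoutedStore {
    private final BlobBackend smallStore;   // would play the role of HBase
    private final BlobBackend largeStore;   // would play the role of HDFS
    private final long thresholdBytes;      // assumed value; the thesis picks it by testing
    private final Map<String, BlobBackend> location = new HashMap<>();

    SizeRoutedStore(BlobBackend smallStore, BlobBackend largeStore, long thresholdBytes) {
        this.smallStore = smallStore;
        this.largeStore = largeStore;
        this.thresholdBytes = thresholdBytes;
    }

    // Stores the data and returns the backend that received it.
    BlobBackend put(String id, byte[] data) {
        BlobBackend target = data.length < thresholdBytes ? smallStore : largeStore;
        BlobBackend previous = location.put(id, target);
        if (previous != null && previous != target) {
            previous.delete(id);            // avoid leaving a stale copy behind
        }
        target.put(id, data);
        return target;
    }

    // Reads from whichever backend holds the id, or returns null if unknown.
    byte[] get(String id) {
        BlobBackend source = location.get(id);
        return source == null ? null : source.get(id);
    }

    public static void main(String[] args) {
        InMemoryBackend small = new InMemoryBackend("small-data store");
        InMemoryBackend large = new InMemoryBackend("large-data store");
        SizeRoutedStore store = new SizeRoutedStore(small, large, 1024 * 1024);

        InMemoryBackend usedForNote = (InMemoryBackend) store.put("note", new byte[512]);
        InMemoryBackend usedForVideo = (InMemoryBackend) store.put("video", new byte[4 * 1024 * 1024]);
        System.out.println("note  -> " + usedForNote.name());
        System.out.println("video -> " + usedForVideo.name());
    }
}
```

Callers see a single `put`/`get` interface and never name a backend, which mirrors the abstract's point that the upper layers do not need to know what the storage layers are.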
Keywords/Search Tags: HDFS (Hadoop Distributed File System), MapReduce, Content Management System, Precision Analysis Platform