Research On The Model Of Data Sharing Based On Hadoop

Posted on: 2016-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Tu
Full Text: PDF
GTID: 2308330473461976
Subject: Information management and information systems
Abstract/Summary:
In an era of rapidly developing cloud computing and big data technology, many large enterprises face problems such as multiple targets, loose organizational structure, and a lack of cooperation between business departments, so that each operational system is developed and applied in isolation from the others. The resulting "information silo" phenomenon not only makes system maintenance harder but may also introduce hidden risks into the operational systems, because information is shared inefficiently between departments. How to improve the quality of data sharing has therefore become one of the most active research areas in computer science. Since traditional data sharing methods cannot meet the storage and computation needs of massive data, new methods are needed.

Large enterprises mainly encounter three problems in data sharing: potential security risks in the network, the lack of data standards, and the inability to store and process massive data. The paper offers solutions to the first two issues and proposes a new data sharing model. The rest of the paper is organized as follows. First, we describe the problems that arise in enterprise data processing and introduce the state-of-the-art Hadoop system, in particular HDFS and the HBase database. Second, we study three models of data integration: one based on a federated database, one based on middleware, and one based on a data warehouse. Third, we present solutions to the network security and data standard issues in data sharing. Finally, building on the Hadoop architecture and the data integration models, we construct a Hadoop-based data sharing model on the HBase database, applying the idea of the blackboard architecture and a publish-subscribe data distribution strategy. We also simulate the model and analyze its efficiency through experiments.

Hadoop-based data sharing can improve both the capacity and speed of massive data processing and the efficiency of cooperation between application systems, while reducing the damage that the original data sharing method could cause to the entire system. The study offers a reference for large enterprises that need to share massive data: the approach can meet the demands of continual, steep data growth and large-scale computation, and it also improves the reliability of the system.
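To make the blackboard and publish-subscribe idea mentioned in the abstract more concrete, the following is a minimal Python sketch. It is not the thesis's implementation: the class name `Blackboard`, the methods `subscribe`, `publish`, and `read`, and the topic and row-key names are all hypothetical, and a plain in-memory dictionary stands in for the HBase table that the thesis uses as shared storage.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

# Signature of a subscriber callback: (topic, row_key, value) -> None
Callback = Callable[[str, str, Any], None]


class Blackboard:
    """Hypothetical shared store; a dict replaces HBase for illustration only."""

    def __init__(self) -> None:
        # Shared data area, keyed by (topic, row_key).
        self._rows: Dict[Tuple[str, str], Any] = {}
        # Callbacks registered per topic by consuming systems.
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a consuming system's interest in a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, row_key: str, value: Any) -> int:
        """Write a record to the shared store, then notify subscribers.

        Returns the number of subscribers that were notified.
        """
        self._rows[(topic, row_key)] = value
        callbacks = self._subscribers[topic]
        for callback in callbacks:
            callback(topic, row_key, value)
        return len(callbacks)

    def read(self, topic: str, row_key: str) -> Optional[Any]:
        """Read a record directly from the shared store (None if absent)."""
        return self._rows.get((topic, row_key))


# Example: one department publishes a record, another receives it.
received: List[Tuple[str, str, Any]] = []
board = Blackboard()
board.subscribe("customer", lambda t, k, v: received.append((t, k, v)))
board.publish("customer", "c-001", {"name": "Example Co."})
```

In this sketch the publishing system never calls the consuming system directly; both only know the shared store and the topic name, which is the decoupling that a blackboard with publish-subscribe distribution is meant to provide.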
Keywords/Search Tags: Data Sharing, Hadoop, Data Integration, Blackboard Architecture