
Study On Some Key Techniques For Database Grid Based On P2P Framework

Posted on: 2009-11-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Q Wang
GTID: 1118360308478444
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, the amount of information is growing explosively, including publicly available database resources. Geographically distributed users want to access these data sources, for example in academic fields such as high-energy physics and computational biology, in electronic commerce, and in deep web query applications. A data grid is a WAN-based system for managing, accessing, and sharing such resources. A database grid is a type of data grid system that mainly serves databases. First, it uses the high processing capacity of the grid to integrate large amounts of data effectively, drawing on common database resources. Second, it uses data management and distributed data integration capabilities for further analysis and processing. This dissertation therefore studies several related key techniques: a grid architecture based on the P2P model, a supporting P2P framework, resource management, data integration, and data replica management.

At present, research and development on database grids is at an early stage. In recent years, the representative works have been protocols and middleware for accessing databases in grid environments, and grid systems for application-specific database processing, developed by the Database Access and Integration Services Working Group of the GGF (Global Grid Forum). Existing work is mainly oriented toward specific domains and assumes static environments. There is little discussion of how to adapt to the uncertainty of database resources, and there are few reports on database grids that support dynamic data integration. Although database processing technologies in grid environments are quite similar to those of multi-database systems, parallel database systems, and distributed database systems, existing work is insufficient to support database grid environments with uncertainties.
Those uncertainties in grid environments pose challenges for data resource management, resource query processing, data integration, transaction scheduling, and massive data analysis in database grids.

The dissertation studies key techniques of database grids for multi-domain, dynamic data integration, based on the OGSA-DAI specification, service-oriented ideas, and the P2P model. It aims to provide an enabling environment for managing distributed, heterogeneous database resources and integrating data sources, and to offer transparent services to users by exploiting the high processing capacity of computing grids.

For the framework supporting database grids, the P2P model is widely used to overcome the limitations of centralized resource management, since P2P avoids single-point failure and scales well. To support resource management across multiple domains, a P2P framework for database grids, MultiChord, is proposed on the basis of the Chord structure; it provides efficient resource management over multiple domains.

In schema integration and query processing, to resolve the heterogeneity and autonomy of resources in a database grid, a global schema ontology is defined and an ontology-based query processing strategy is proposed. It divides query processing into two parts, global and local: the former performs semantic transformation, query rewriting, and execution based on the ontology, while the latter handles data-layer conflict resolution and query extension.

In execution optimization, to reduce data transmission cost effectively, a keyword-based execution optimization method is proposed. It follows the idea that "able men are always busy" to dynamically schedule the transmission amounts of data resources.
It can effectively improve the efficiency of searching for data in a database grid.

In replica management, to handle the dynamics of peers joining or leaving a P2P system at any time, a replica management strategy based on multiple-root maintenance is proposed. It effectively ensures the robustness of the system and improves the efficiency of resource acquisition.

In data summarization, to obtain knowledge effectively from massive amounts of information, a clustering-based data profile method is proposed. The data integration level ensures that the integrated data satisfies users' schema requirements, and the clustering analysis level presents the profile data to users for intuitive analysis.

Finally, the dissertation designs and implements DS-Grid, a data grid system for dynamic database integration and multi-domain applications, based on the proposed key techniques. It wraps data resources as Grid services based on DAI, constructs the MultiChord P2P model on JXTA, adopts the XML DBMS Shunsaku as its XML repository, and is coded in Java. The system is a subsystem of the National 863 Program project "CIMS oriented service grid for flexible integration of enterprise business". After testing and evaluation, the system achieved high performance and met its expected goals.
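
To make the idea of Chord-based resource management over multiple domains more concrete, the following is a minimal Java sketch and not the dissertation's MultiChord design, whose details the abstract does not give. It assumes one Chord-style identifier ring per domain and places each resource key on the first peer whose identifier follows the key on that ring (its successor). The class name `MultiChordSketch`, its methods, the 32-bit identifier space, and the domain and peer names are all hypothetical. The lookup uses a global sorted view of each ring, so it shows only key placement, not Chord's finger-table routing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one Chord-style identifier ring per domain. */
final class MultiChordSketch {
    // domain name -> ring of (identifier -> peer name), sorted by identifier
    private final Map<String, TreeMap<Long, String>> rings = new HashMap<>();

    /** Maps a string to a 32-bit ring identifier (first 4 bytes of its SHA-1 digest). */
    static long ringId(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long id = 0;
            for (int i = 0; i < 4; i++) {
                id = (id << 8) | (digest[i] & 0xFF);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Adds a peer to the ring of the given domain (identifier collisions are ignored here). */
    public void join(String domain, String peer) {
        rings.computeIfAbsent(domain, k -> new TreeMap<>()).put(ringId(peer), peer);
    }

    /** Removes a peer from the ring of the given domain, if present. */
    public void leave(String domain, String peer) {
        TreeMap<Long, String> ring = rings.get(domain);
        if (ring != null) {
            ring.remove(ringId(peer));
        }
    }

    /** Returns the successor peer of the key within the domain ring, or null if the ring is empty. */
    public String lookup(String domain, String resourceKey) {
        TreeMap<Long, String> ring = rings.get(domain);
        if (ring == null || ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> successor = ring.ceilingEntry(ringId(resourceKey));
        // Wrap around to the smallest identifier when the key is past the last peer.
        return (successor != null ? successor : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        MultiChordSketch grid = new MultiChordSketch();
        grid.join("physics", "peerA");
        grid.join("physics", "peerB");
        grid.join("biology", "peerC");
        System.out.println(grid.lookup("physics", "table:events"));
        System.out.println(grid.lookup("biology", "table:events"));
    }
}
```

In this sketch a peer that leaves is simply dropped from its domain ring, so the next lookup for the same key falls through to the following peer. Keeping replicas available across such departures is the concern that the replica management strategy above addresses, and the sketch does not attempt it.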
Keywords/Search Tags: Database grid, resource management, dynamic data integration, query execution, replica management, data profile, P2P