Research On Multilevel Overlapping Schema Summarization Method For Databases

Posted on: 2017-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: M Yu
Full Text: PDF
GTID: 2348330503492390
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of information technology, databases are used ubiquitously. Modern databases often consist of hundreds of tables yet generally lack documentation, so querying an unfamiliar database is difficult: users must spend significant effort just to understand it. Schema summarization is a promising way to address this problem by giving users an overview of the relationships and composition of the database schema. It summarizes the schema and divides it into several clustered categories, each represented by a topic table, and thereby improves the usability of databases.
Existing summarization methods mainly focus on the non-overlapping scenario, in which each element of a database belongs to exactly one category. They ignore the fact that some elements may belong to multiple categories simultaneously, which is common in modern large-scale databases. A non-overlapping schema summary is therefore not enough for users to understand the database structure, and developing an overlapping schema summarization method is both necessary and meaningful. Furthermore, because modern databases are large, classifying the schema only once may produce so many categories that the result is still too much for users to digest. In this thesis, we therefore design an efficient method that generates multi-level overlapping schema summaries automatically.
We first introduce the research background and significance. Then, to address the shortcomings of current non-overlapping schema summarization methods, we propose a novel overlapping schema summarization method for relational databases. The design of the proposed method consists of four main parts. First, we develop a mapping scheme that maps the database schema to a multi-labeled graph, in which the labels store the category information. Second, inspired by the concept of relative entropy, we propose a new method for measuring the degree of similarity. Third, by introducing a multi-label propagation algorithm, we divide the database schema into several overlapping groups. Fourth, to refine the partition, we further divide the overlapping groups with a hierarchical clustering algorithm and thereby obtain a multi-level overlapping schema summary. Comparative experiments demonstrate that the proposed multi-level overlapping schema summarization method not only achieves higher accuracy but also finds the overlaps efficiently.
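The abstract does not give the formulas behind the first three parts, so the following Python sketch is only one possible reading, not the thesis's actual method. It assumes that each table is described by a hypothetical discrete "profile" distribution, that similarity is derived from a symmetrized relative entropy (KL divergence), and that overlapping groups come from a simple COPRA-style multi-label propagation in which every table keeps at most `max_labels` labels. All table names, profiles, and parameters are invented for illustration. The fourth part (hierarchical clustering) is not covered.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) of two discrete distributions given as dicts."""
    return sum(pv * math.log((pv + eps) / (q.get(k, 0.0) + eps))
               for k, pv in p.items() if pv > 0)


def similarity(p, q):
    """Assumed similarity: symmetrized KL mapped into (0, 1]; identical inputs give 1.0."""
    d = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return 1.0 / (1.0 + d)


def build_graph(profiles, foreign_keys):
    """Map a schema to a weighted graph: tables are nodes, foreign keys are edges."""
    weights = {table: {} for table in profiles}
    for a, b in foreign_keys:
        w = similarity(profiles[a], profiles[b])
        weights[a][b] = w
        weights[b][a] = w
    return weights


def propagate_labels(weights, max_labels=2, rounds=20):
    """COPRA-style propagation: each node ends with 1..max_labels weighted labels."""
    labels = {node: {node: 1.0} for node in weights}  # every table starts as its own group
    for _ in range(rounds):
        updated = {}
        for node, nbrs in weights.items():
            scores = {}
            for nbr, w in nbrs.items():
                for lab, coef in labels[nbr].items():
                    scores[lab] = scores.get(lab, 0.0) + w * coef
            total = sum(scores.values())
            if total <= 0:  # isolated table: keep its current labels
                updated[node] = dict(labels[node])
                continue
            scores = {lab: s / total for lab, s in scores.items()}
            # Keep labels whose belonging coefficient reaches 1 / max_labels.
            kept = {lab: s for lab, s in scores.items() if s >= 1.0 / max_labels}
            if not kept:  # none is strong enough: keep the single best one
                best = max(sorted(scores), key=scores.get)
                kept = {best: scores[best]}
            norm = sum(kept.values())
            updated[node] = {lab: s / norm for lab, s in kept.items()}
        if updated == labels:  # stop early once the labelling is stable
            break
        labels = updated
    return labels


# Hypothetical toy schema: profiles are made-up distributions over column-name tokens.
profiles = {
    "orders": {"order": 0.5, "customer": 0.25, "date": 0.25},
    "order_items": {"order": 0.5, "product": 0.5},
    "customer": {"customer": 0.5, "name": 0.25, "address": 0.25},
    "product": {"product": 0.5, "name": 0.25, "price": 0.25},
}
foreign_keys = [("orders", "customer"), ("order_items", "orders"),
                ("order_items", "product")]

graph = build_graph(profiles, foreign_keys)
memberships = propagate_labels(graph, max_labels=2)
for table, labs in memberships.items():
    print(table, sorted(labs))
```

A table whose final label set contains two entries belongs to two groups at once, which is the overlapping case the abstract describes. Synchronous updates like these can oscillate on some graphs, which is why the sketch caps the number of rounds instead of requiring convergence.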
Keywords/Search Tags:relational database, overlapping schema summarization, multi-label propagation, schema graph