Research On Database Schema Summarization Based On Label Propagation Method

Posted on: 2015-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: X K Li
GTID: 2298330467979738
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of database technology and the explosive growth of database scale, complex databases are hard to explore and query for users who are unfamiliar with their schemas. Even when extensive schema documentation is available, users must spend a great deal of time understanding it. How to comprehend a database schema quickly has therefore become an active research topic, known as schema summarization. A schema summary is a succinct overview of the entire database: it contains the important elements of the original database and covers a wide range of its information. Generating an effective schema summary is the main goal of this thesis.

This thesis tackles schema summarization with a semi-supervised learning method, the label propagation algorithm. The main contributions are as follows.

Firstly, the database is mapped to a labeled schema graph. The importance of each table is defined as its stable-state (stationary) value in a random walk over the schema graph. The importance of a table depends both on its information content and on how that content relates to the content of the other tables in the database.

Secondly, a similarity model is proposed that combines several similarity features through multiple linear regression. This similarity model improves the accuracy of the generated schema summary.

Thirdly, a schema summarization approach based on the label propagation algorithm is proposed. As a semi-supervised method, it combines elements of supervised and unsupervised learning, and clustering by label propagation achieved higher accuracy in the experiments.

The experiments are conducted on the TPC-E benchmark dataset and on a real database, CSEMIS. The results show that the proposed approach provides users with a more accurate schema summary than other solutions.
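The two graph computations named in the abstract, table importance as the stable state of a random walk and clustering by label propagation, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the edge weights, the similarity values, and the choice of seed tables are all hypothetical, and the similarity matrix is supplied by hand rather than produced by the regression-based similarity model.

```python
def table_importance(weights, iters=200, tol=1e-12):
    """Stationary distribution of a random walk over a weighted schema graph.

    weights[i][j] >= 0 is a hypothetical transition weight from table i to
    table j; self-loops (weights[i][i]) stand in for a table's own
    information content. Assumes the resulting chain is irreducible and
    aperiodic so that power iteration converges.
    """
    n = len(weights)
    # Row-normalise the weights into transition probabilities.
    P = []
    for row in weights:
        s = sum(row)
        P.append([w / s for w in row] if s > 0 else [1.0 / n] * n)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        done = max(abs(a - b) for a, b in zip(nxt, pi)) < tol
        pi = nxt
        if done:
            break
    return pi


def label_propagation(similarity, seeds, iters=100):
    """Assign each table to a cluster by propagating labels from seed tables.

    similarity[i][j] is a hypothetical symmetric table similarity;
    seeds maps a table index to its known cluster label (0..k-1).
    """
    n = len(similarity)
    k = max(seeds.values()) + 1
    # F[i][c] is the current belief that table i belongs to cluster c.
    F = [[0.0] * k for _ in range(n)]
    for i, c in seeds.items():
        F[i][c] = 1.0
    for _ in range(iters):
        new_F = []
        for i in range(n):
            total = sum(similarity[i][j] for j in range(n) if j != i)
            if i in seeds or total == 0:
                new_F.append(F[i][:])  # seed labels stay clamped
                continue
            new_F.append([
                sum(similarity[i][j] * F[j][c] for j in range(n) if j != i) / total
                for c in range(k)
            ])
        F = new_F
    return [max(range(k), key=lambda c: F[i][c]) for i in range(n)]


if __name__ == "__main__":
    # Illustrative three-table graph with self-loops.
    print(table_importance([[2, 1, 1], [1, 1, 0], [1, 0, 1]]))
    # Illustrative five-table similarity matrix; tables 0 and 3 are seeds.
    sim = [
        [1.0, 0.9, 0.2, 0.1, 0.1],
        [0.9, 1.0, 0.2, 0.1, 0.1],
        [0.2, 0.2, 1.0, 0.8, 0.7],
        [0.1, 0.1, 0.8, 1.0, 0.9],
        [0.1, 0.1, 0.7, 0.9, 1.0],
    ]
    print(label_propagation(sim, {0: 0, 3: 1}))
```

In a summarization pipeline along these lines, the most important tables would plausibly serve as the labeled seeds, and each resulting cluster would be represented by its seed table in the summary. That pairing is an assumption of this sketch rather than a detail stated in the abstract.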
Keywords/Search Tags: database, schema summarization, label propagation, semi-supervised learning