Research On Summarization Technology Of Relational Database

Posted on: 2016-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: T T Li
GTID: 2308330464972433
Subject: Computer software and theory
Abstract/Summary:
As databases grow in volume, their schemas also become more and more complicated. Because documentation is often lacking, users must spend a great deal of time understanding a database before they can use it. Existing database summarization methods operate only on the schema. They rely on the foreign key relationships between tables or on community detection algorithms: an evaluation function is formulated, and the most important tables are picked out as the summary and presented to the user. However, as databases are used more intensively, the number of instances in each table has increased greatly, so schema summarization alone can no longer meet users' needs.
To address this, this paper summarizes the database from two aspects. The first is schema summarization, which improves on existing community detection algorithms. The improved method does not need to know the number of communities in advance; it groups closely linked tables into the same community and then selects the most important table as the label of each community. The second is instance summarization of the database, which this paper proposes for the first time. Building on the schema summarization, we pick out the important instances of each table. The results consist of original instances and new instances. The summarization of original instances is based on text mining technology and the fan-out value. Using text mining, we choose the feature items with large weights as the summary. We also introduce the concept of fan-out: the greater a record's fan-out value, the stronger its capacity to instantiate linked edges, and so the more important the record is. In this way we can pick out the important tuples for the user. New instances are derived using similarity between text phrases: the results are obtained by clustering the values of a column in the table.
We also introduce a summarization method for numeric data, and our experiments verify that all of these algorithms are useful. The approach not only picks out important tables for users but also selects characteristic instances by summarizing the instances. It gives users who are unfamiliar with the subject matter an overall understanding of the database and makes later retrieval and querying more convenient.
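The fan-out idea described above can be illustrated with a small sketch. The abstract does not give the exact definition, so this example assumes that the fan-out of a tuple is the number of tuples in a referencing table whose foreign key points at it. The function names `fan_out` and `top_tuples` and the sample tables are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def fan_out(parent_keys, child_foreign_keys):
    """Count, for each parent key, how many child rows reference it.

    Assumption: fan-out = number of referencing tuples via one foreign key.
    The thesis may define the value differently.
    """
    counts = Counter(child_foreign_keys)
    return {key: counts.get(key, 0) for key in parent_keys}


def top_tuples(parent_keys, child_foreign_keys, k):
    """Return the k parent keys with the largest fan-out values."""
    scores = fan_out(parent_keys, child_foreign_keys)
    # Sort by descending fan-out; break ties by key for a stable result.
    return sorted(scores, key=lambda key: (-scores[key], key))[:k]


# Hypothetical data: customer ids, and the customer_id column of an orders table.
customers = [1, 2, 3]
orders_customer_id = [1, 1, 2, 1, 3, 3]

print(fan_out(customers, orders_customer_id))        # {1: 3, 2: 1, 3: 2}
print(top_tuples(customers, orders_customer_id, 2))  # [1, 3]
```

Under this assumption, customer 1 is referenced by three orders and would be ranked as the most important tuple, followed by customer 3.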
Keywords/Search Tags: Schema Summarization, Community Detection, Instance Summarization, Original Instance, New Instance