Iterative Algorithm For Semantic Integration Across Heterogeneous Medical Databases

Posted on: 2009-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
GTID: 2178360278464172
Subject: Computer software and theory
Abstract/Summary:
Semantic integration can eliminate data conflicts among heterogeneous databases and integrate the databases of different enterprises and organizations within the same industry. Semantic integration of heterogeneous databases plays an important role in national macro-control and in the establishment of a public information platform.

By combining attribute and tuple information, an iterative procedure can be constructed to solve the semantic integration problem. Cluster analysis techniques are used to identify semantic correspondences between attributes. Features are extracted only from attribute values, and a voting method is applied to the clustering results of the K-means Clustering Algorithm, the Fuzzy Clustering Algorithm and an improved Chameleon Hierarchical Clustering Algorithm; the semantically corresponding attribute pairs identified this way serve as the initial attribute-matching results. In Chameleon, the Ncut (normalized cut) rule is chosen instead of Min-cut as the graph partition rule.

In the iterative process, a classification method is used to detect semantic correspondences between tuple pairs, based on the current matching attribute pairs. A certain number of matching and non-matching tuple pairs are selected as training data to train a logistic regression classifier. Correlation and regression techniques are then used to analyze the matching tuple pairs and evaluate the semantic relationships between attributes, which yields new matching attribute pairs. The set of matching attribute pairs is updated and the next iteration is executed.

The drug listing tables from Sanxia Hospital and Nanzhang Hospital are used as experimental data. The two tables have 20 and 27 attributes, respectively. 603 matching tuple pairs and 603 non-matching tuple pairs are selected for experiment and analysis. The results show that new matching attribute pairs and matching tuple pairs continue to be identified as the iteration proceeds, and that the accuracy of the matching results is high.
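The voting step over the three clustering results can be illustrated with a small sketch. The abstract does not state the exact voting rule, so this assumes a simple majority vote: a cross-table attribute pair counts as an initial match when at least two of the three clusterings place both attributes in the same cluster. The function name `vote_attribute_matches`, the attribute names and the cluster labels are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import product


def vote_attribute_matches(attrs_a, attrs_b, clusterings, min_votes=2):
    """Return cross-table attribute pairs co-clustered by >= min_votes clusterings.

    clusterings: list of dicts mapping attribute name -> cluster label,
    one dict per clustering algorithm (assumed majority-vote rule).
    """
    votes = Counter()
    for labels in clusterings:
        for a, b in product(attrs_a, attrs_b):
            # One vote whenever this clustering puts both attributes together.
            if labels[a] == labels[b]:
                votes[(a, b)] += 1
    return sorted(pair for pair, n in votes.items() if n >= min_votes)


# Hypothetical attribute names and cluster labels, for illustration only.
attrs_a = ["a.drug_name", "a.price"]
attrs_b = ["b.name", "b.unit_price", "b.maker"]
kmeans = {"a.drug_name": 0, "a.price": 1,
          "b.name": 0, "b.unit_price": 1, "b.maker": 0}
fuzzy = {"a.drug_name": 0, "a.price": 1,
         "b.name": 0, "b.unit_price": 1, "b.maker": 2}
chameleon = {"a.drug_name": 2, "a.price": 1,
             "b.name": 2, "b.unit_price": 0, "b.maker": 1}

matches = vote_attribute_matches(attrs_a, attrs_b, [kmeans, fuzzy, chameleon])
print(matches)
```

In this toy input, the pair (`a.drug_name`, `b.name`) receives three votes and (`a.price`, `b.unit_price`) receives two, so both pass the threshold; pairs supported by only one clustering are dropped. Pairs selected this way would play the role of the initial attribute-matching results that seed the iterative process.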
Keywords/Search Tags: Semantic Integration, Data Integration, Medical Database, Chameleon Hierarchical Clustering Algorithm, Logistic Regression