
Study On Methods In Provenance Data Clustering

Posted on: 2019-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2428330545454099
Subject: Computer Science and Technology
Abstract/Summary:
Data provenance can trace, reproduce, and display the evolution of target data. Provenance systems automatically monitor system calls and collect the dependency relationships between files and processes. Because data provenance has unique advantages in tracing the evolution of data and establishing its trustworthiness, it has broad application prospects in data engineering and information security.

To ensure traceability, the provenance data of existing provenance systems includes large-scale, fine-grained dependency and generation associations, so the provenance data is larger in scale than the target data itself. Large-scale provenance data reduces the efficiency of provenance queries and increases the costs of storage, computation, and management. Moreover, the data associations are so complicated and detailed that it is difficult to understand provenance results and to obtain the key provenance features, which greatly reduces the quality of data provenance. To address this problem, this thesis focuses on coarse-grained provenance and the clustering of provenance data: fine-grained provenance data and correlations are combined into coarse-grained provenance data while the key provenance features are preserved.

The main work of this thesis covers three aspects:

(1) We propose a global clustering method for provenance data based on node centrality. First, the formula for computing node centrality is defined. Then, the similarity between nodes that have direct dependency relationships is measured by comparing their node centrality, so that users are presented with a summary graph formed by semantically meaningful clusters. Finally, coarse-grained provenance is conducted on the summary graph.

(2) We propose a local clustering method for provenance data based on node alienation. First, the formula for computing node alienation is defined. Then, provenance queries about key data are answered by starting from a target node and adding closely related nodes to grow a cluster around it.

(3) Since the timestamp is a critical property of provenance data that reflects its popularity, we propose a variable-granularity clustering method for provenance data based on time domains. First, a method for dividing the time domain is proposed. Then, based on the results of global clustering, the clusters that fall in the same time domain are merged.

The main innovative contributions of this thesis are as follows. First, we adopt a global clustering method based on node centrality, which obtains a semantically meaningful division result. Second, we propose a local clustering method based on node alienation, which can effectively complete coarse-grained provenance for key data by filtering out the provenance data with large alienation from the target node and keeping the closely related nodes. Third, we propose a variable-granularity clustering method based on time domains: by dividing the time domain reasonably, the clustering granularity of nodes is adjusted dynamically across time domains, which realizes variable-granularity clustering for provenance data with different levels of popularity.

The standard provenance trace collected by PASSv2 is used as the experimental data to verify the feasibility and effectiveness of the proposed clustering methods...
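The local clustering idea in (2), growing a cluster outward from a target node and discarding nodes with large alienation, can be illustrated with a minimal sketch. The abstract does not give the thesis's alienation formula, so this sketch assumes a stand-in: alienation is approximated by hop distance from the target in the dependency graph, ignoring edge direction. The function name `grow_local_cluster`, the parameter `max_alienation`, and the sample file and process names are all hypothetical.

```python
from collections import deque


def grow_local_cluster(edges, target, max_alienation):
    """Grow a cluster around `target`, keeping only nodes with small alienation.

    Assumption: alienation is approximated by hop distance from the target,
    ignoring edge direction. The thesis defines its own formula, which the
    abstract does not state.
    """
    # Build an undirected adjacency map from (dependent, dependency) pairs.
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Breadth-first expansion from the target node.
    alienation = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if alienation[node] == max_alienation:
            continue  # nodes beyond this point are too alienated; stop expanding
        for nxt in neighbors.get(node, ()):
            if nxt not in alienation:
                alienation[nxt] = alienation[node] + 1
                queue.append(nxt)
    return set(alienation)


# Hypothetical provenance edges: (dependent, dependency).
edges = [
    ("report.pdf", "latex"),
    ("latex", "report.tex"),
    ("report.tex", "vim"),
    ("vim", "notes.txt"),
]
cluster = grow_local_cluster(edges, "report.pdf", max_alienation=2)
```

With these sample edges, the cluster around `report.pdf` keeps `latex` and `report.tex` and filters out `vim` and `notes.txt`, which lie more than two hops away. Swapping in a different alienation measure would only change how the per-node score is computed, not the grow-from-target structure.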
Keywords/Search Tags: Data Provenance, Clustering, Coarse-grained Provenance, Node Centrality, Dynamic Clustering