| In the era of big data, the rapid growth of graph data has brought both opportunities and challenges. As graph data becomes widely used, its quality problems have grown more prominent along with the explosive growth in data volume. It is therefore critical to identify and fix errors and missing information in graphs. In recent years, researchers have explored methods for improving the quality of graph data and have proposed many effective approaches. However, current graph data repairing methods have the following limitations: (1) Rule-based methods rely mainly on graph quality rules to detect erroneous data and resolve these quality problems through a one-time repair according to those rules. (2) Learning-based repair methods use graph embedding models to learn the structural information in the graph; they focus mainly on predicting and filling in missing data and cannot detect or repair erroneous data. In summary, existing graph data repairing methods fall short in handling missing values and errors in graphs, and a new method is needed that captures the important features of the graph that determine the quality of the repaired data. The main contributions of this thesis are as follows. First, this thesis proposes an iterative graph repair framework to address data quality issues in graphs without attributes and the inability of learning-based methods to detect and repair errors. The framework consists of three modules: a graph embedding module, an error detection module, and an error repair module. The graph embedding module uses a predicate logic-based graph embedding model, which jointly models predicate logic and training data to learn more predictive vector representations of entities and relations. The error repair module adopts an error repair strategy based on neighborhood and predicate logic. The strategy considers the influence of neighborhood and predicate logic on the repaired
triplet. In addition, the framework iteratively updates the graph embedding model using repaired triplets and triplets derived from the predicate logic rules. Second, to address data quality issues in attribute graphs and the rule coverage problem of rule-based methods, this thesis proposes an attribute-enhanced graph embedding model. The model uses the rich semantic information inherent in entity attributes to enhance the representations of entities and relations in triplets. Furthermore, this thesis proposes an error repair strategy based on attribute neighborhoods, which takes into account the impact of attribute and neighborhood information on repairing triplets: it uses attribute information to describe entity features and draws on structural information in entity neighborhoods to help repair erroneous values in triplets. Finally, building on this research, this thesis presents a unified graph data repair prototype system that can repair different types of graph data on demand. The system uses a data cleaning module to preprocess input data, uses a third-party module to discover predicate logic rules, and then integrates the two graph data repair methods proposed in this thesis to detect and repair erroneous data. The parameters of each module are set centrally in the configuration center. |