| Large-scale semantic knowledge bases (KBs) such as YAGO, DBpedia, and Wikidata play an important role in practical applications such as question answering systems, search engines, and decision support systems. At present, common large-scale semantic KBs suffer from problems such as inconsistency, abnormal information, and wrong relations. Ensuring the quality of semantic KBs extracted from massive Web data is therefore a major challenge in current Web information research. In work on quality assessment and knowledge completion for large semantic KBs, errors are usually cleaned and corrected automatically in order to improve KB quality and complete the semantic information. Accordingly, this thesis focuses on the completion of large-scale semantic KBs and improves their quality by cleaning, analyzing, and correcting the erroneous information they contain. Moreover, we provide effective information to complete semantic KBs, thereby supplying higher-quality semantic information to application systems (e.g., question answering systems). The main research work and contributions of this thesis are as follows. A knowledge cleaning model with guiding rules is proposed to solve the problem of cleaning inconsistent information in large semantic KBs. We select semantic KBs derived from Wikipedia and propose a rule learning algorithm for inconsistent semantic knowledge, Guided Inductive Logic Programming (GILP), to clean inconsistencies in KBs. GILP learns a set of targeted consistency constraints for unstable training sets and adds guidance to traditional Inductive Logic Programming (ILP), which addresses not only inconsistency but also system extraction mistakes in KBs. GILP generates positive and negative rules to predict valid information and
clean errors in large semantic KBs, respectively. Based on the characteristics of semantic KBs, the GILP algorithm is further improved into the Guided Inductive Logic Detection (GILD) algorithm, which automatically cleans inconsistent semantic information in semantic KBs. The GILD model effectively saves labor costs and improves mining efficiency. A guided inductive logic learning model is proposed to clean and correct erroneous knowledge in semantic KBs simultaneously, solving the problem of incomplete information in large semantic KBs. To address inconsistency, we propose the Guided Inductive Logic Learning (GILLearn) model to correct errors in KBs. GILLearn obtains correction rules through query rewriting algorithms, thereby cleaning and repairing semantic knowledge in the KBs at the same time. For other errors in semantic KBs, we propose a Triple Correction Assessment (TCA) framework that uses associated KBs for knowledge correction. TCA matches against other KBs that share co-existing entities and uses a triple matching algorithm based on knowledge co-existence for correction. In this thesis, problems such as inconsistency, outlier attributes, and wrong relations in KBs are rectified by combining GILLearn and TCA. The experimental results demonstrate that the framework effectively corrects wrong information in large KBs and completes the KBs by repairing semantic knowledge. A rule learning model with abnormal features is proposed to improve the accuracy of correction rules in large semantic KBs. The outliers are acquired from predictions made by positive rules. Based on this abnormal information, we update the non-monotonic logic programming algorithm and propose the Nonmonotonic Inductive Logic Programming (NILP) algorithm. By contrast, we extract the corresponding abnormal attributes from positive and negative rules to renew the ILP algorithm, and construct a new rule learning algorithm: Inductive Logic
Programming with Exceptional Features (EILP). NILP and EILP improve traditional non-monotonic logic learning according to different abnormal information, which improves the quality of rules and mines more effective semantic knowledge. An Inductive Logical Correction Method with Exceptional Features (EILC) is proposed to further improve the accuracy of the correction rules in GILLearn. We combine EILP and EILC to construct an Inductive Logic Learning with Exceptional Features (EILLearn) framework with outliers. EILLearn improves on the GILLearn model by raising the accuracy of correction rules from the perspective of abnormal information. A knowledge base cleaning and correction system is implemented to complete semantic KBs, solving knowledge cleaning and correction in large semantic KBs simultaneously. Combining the above algorithms, we build a complete correction and completion system for large-scale semantic KBs for practical applications. This system mainly solves the problems of inconsistency, abnormal information, and wrong relations in large semantic KBs. We propose two algorithms, GILP and EILP, to learn positive and negative rules at the entity layer and concept layer of large KBs to clean wrong information. Two correction rules are also proposed at two different attribute levels to clean and correct wrong semantic information simultaneously. Moreover, a question answering system that uses semantic KBs as its back end can determine and correct the corresponding results through the associated-KB correction model, TCA. The KB correction and completion system can thus be used for knowledge pre-processing in application systems such as search engines, question answering systems, and decision-making systems. The system cleans and corrects erroneous information in the back-end KBs of these applications and improves the utilization of the KBs. |