Research On Entity Alignment Method For Linked Open Data

Posted on: 2018-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2348330518494919
Subject: Computer Science and Technology
Abstract/Summary:
Entity alignment is an important problem in both the traditional Web and the Semantic Web. The construction of large-scale knowledge graphs provides a solid foundation for it, and discovering owl:sameAs links between multiple data sources is an important part of that work and the main purpose of entity alignment. Existing approaches depend largely on the results of a preceding schema alignment step, and the correlation between Linked Open Data datasets degrades the alignment results of such models, so entity link discovery remains insufficient. A schema-independent link discovery method is therefore worth considering. Semantic label features and statistical features of the data help find missing links. By transforming the traditional schema-based alignment problem into a schema-independent binary classification problem, this paper proposes a new method based on classifying the attribute set and analyzing the extracted feature vectors. Experiments on datasets in the LOD show that the method helps discover some missing links, and the method is applied to a link discovery system to support knowledge graph construction. The main research contents are as follows:
(1) The semantic features of the Linked Open Data datasets are analyzed. Text information is extracted from the data items in the datasets, JSON is used together with the semantic label features to divide the text information into nine main kinds, and a text vector set with significant semantic features is constructed. To reduce the text-processing workload, an inverted index is also introduced to generate the candidate entity set.
(2) To ensure the reliability of the evaluation, the MapReduce framework is applied to the serialized computation over attribute text, storing information as key-value pairs. In addition, five categories of comprehensive TF-IDF statistical methods are selected to model the text characteristics and filter effective information, while keeping the candidate set as complete as possible to preserve entity integrity and reducing computational complexity.
(3) A supervised machine learning classification algorithm uses the feature vector information and the specific link information to link the Linked Open Data datasets, so that the entity relations between the datasets are effectively classified. During classifier generation, the base classifier is obtained with the effective C4.5 algorithm, and an improved AdaBoost algorithm is used to build a comprehensive classifier with good performance. The machine-learning-based entity alignment algorithm is applied to an actual entity link construction system. Diverse experiments are carried out on LOD datasets to test the effectiveness of the algorithm.
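The candidate-generation and TF-IDF steps described above can be illustrated with a minimal sketch. This is not the thesis implementation: the whitespace tokenizer, the smoothed IDF formula, the cosine-similarity feature, and the three example entities are all assumptions chosen for illustration, and the sketch runs on a single machine rather than on MapReduce.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()

def build_inverted_index(entities):
    """Map each token to the set of entity ids whose text contains it."""
    index = defaultdict(set)
    for entity_id, text in entities.items():
        for token in set(tokenize(text)):
            index[token].add(entity_id)
    return index

def candidates(query_text, index):
    """Return ids of entities that share at least one token with the query."""
    found = set()
    for token in set(tokenize(query_text)):
        found |= index.get(token, set())
    return found

def tfidf_vector(text, doc_freq, n_docs):
    """Sparse TF-IDF vector; assumed smoothed idf = log((1 + N) / (1 + df)) + 1."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {
        tok: (cnt / total) * (math.log((1 + n_docs) / (1 + doc_freq.get(tok, 0))) + 1)
        for tok, cnt in counts.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical target dataset: entity id -> concatenated attribute text.
target = {
    "t1": "Beijing capital city of China",
    "t2": "Paris capital city of France",
    "t3": "Amazon river South America",
}
index = build_inverted_index(target)
doc_freq = Counter(tok for text in target.values() for tok in set(tokenize(text)))

# Hypothetical source entity to be aligned against the target dataset.
source_text = "Beijing China capital"
cands = candidates(source_text, index)  # the inverted index prunes t3
src_vec = tfidf_vector(source_text, doc_freq, len(target))
scores = {
    eid: cosine(src_vec, tfidf_vector(target[eid], doc_freq, len(target)))
    for eid in cands
}
best = max(scores, key=scores.get)
```

In a classification setting such as the one the abstract describes, a score like this would be one entry in the feature vector for a candidate pair rather than the final decision; the inverted index only limits which pairs are scored at all.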
Keywords/Search Tags: entity alignment, AdaBoost, Linked Open Data, machine learning, MapReduce