Construction of a Large-Scale Linked Open Schema and Its Application in Software Engineering

Posted on: 2016-07-30 | Degree: Master | Type: Thesis
Country: China | Candidate: J G Zhu
GTID: 2428330590488884 | Subject: Software engineering
Abstract/Summary:
Linking Open Data (LOD) is the largest community effort for semantic data publishing; it turns the Web from a Web of documents into a Web of interlinked knowledge. Although the current LOD contains billions of triples describing millions of entities, it carries only a limited amount of schema information and lacks schema-level axioms. To close the gap between the lightweight LOD and expressive ontologies, this paper contributes to the complementary part of the LOD, namely Linking Open Schema (LOS).

This paper introduces Zhishi.schema, the first effort to publish an English linked open schema. It collects navigational categories as well as dynamic tags from more than 20 of the most popular English social Web sites, which are rich sources for extracting large-scale schema-level knowledge. It proposes a learning-based method with novel features to capture equivalence, subsumption and relatedness relationships between the collected categories and tags, resulting in an integrated concept taxonomy and a large semantic network. Moreover, to further explore the usability of the knowledge base in software engineering, this paper applies a machine-learning-based method to build a large-scale software programming knowledge base, Software.zhishi.schema, from Stack Overflow as an extension of Zhishi.schema to the software engineering domain.

The main contributions are:
1) A unified representation of category (tag) labels is designed, which allows finer-grained similarity measures on individual parts of a category label. Key observations from an empirical investigation of the Yago taxonomy are then listed, and these insights guide the choice of suitable measures.
2) An association rule mining algorithm is adopted to find general operation patterns of hypernym-hyponym transformation from both positive and negative examples, and these patterns are used to design rule-based features. For Stack Overflow, information from questions, wiki descriptions and users is also used to design features. Altogether, this yields a sophisticated feature set for measuring the semantic relatedness between categories (tags).
3) A semi-supervised learning method with novel features is used to detect multiple types of relations. A blocking mechanism reduces the number of category pairs that must be computed, so that the approach can be applied at large scale. In addition, a carefully designed post-processing step revises misclassified results in each iteration of the learning process.

Experimental results show the high quality of Zhishi.schema and Software.zhishi.schema. Compared with the category systems of DBpedia, Yago, BabelNet and Freebase, Zhishi.schema has wide category coverage and contains the largest number of subsumptions between categories. To further test the usability of Software.zhishi.schema, this paper designs a word-similarity task on software programming terms. The results show that the dataset outperforms other knowledge bases such as WordNet and Wikipedia, owing to its high coverage of finer-grained domain concepts.
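The abstract does not specify the blocking key or the exact label representation. As a rough illustration only, the Python sketch below assumes a head-modifier split of each label (the last token is treated as the head), blocks labels by head word so that only pairs inside a block are compared, and applies a simple head-matching subsumption rule. All function names (`split_label`, `block_by_head`, `candidate_pairs`, `subsumes`, `subsumption_edges`) are hypothetical, and the rule is a common baseline heuristic, not the learned, feature-based classifier described above.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

# Hypothetical sketch: head-word blocking plus a head-matching subsumption rule.
# This is NOT the thesis's learned classifier; it only illustrates the ideas.


def split_label(label: str) -> Tuple[str, Tuple[str, ...]]:
    """Split a category/tag label into (head, modifiers).

    Assumption: the head is the last token and all earlier tokens are
    modifiers, e.g. "Python web frameworks" -> ("frameworks", ("python", "web")).
    No lemmatization is done, so "framework" and "frameworks" differ.
    """
    tokens = label.lower().replace("-", " ").split()
    if not tokens:
        raise ValueError("empty label")
    return tokens[-1], tuple(tokens[:-1])


def block_by_head(labels: List[str]) -> Dict[str, List[str]]:
    """Group labels that share a head word; each group is one block."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for label in labels:
        head, _ = split_label(label)
        blocks[head].append(label)
    return blocks


def candidate_pairs(labels: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield only pairs inside the same block instead of all n*(n-1)/2 pairs."""
    for members in block_by_head(labels).values():
        yield from combinations(sorted(members), 2)


def subsumes(general: str, specific: str) -> bool:
    """True if both labels share a head and the modifiers of `general`
    form a proper subset of the modifiers of `specific`."""
    g_head, g_mods = split_label(general)
    s_head, s_mods = split_label(specific)
    return g_head == s_head and set(g_mods) < set(s_mods)


def subsumption_edges(labels: List[str]) -> List[Tuple[str, str]]:
    """Return (general, specific) edges found among the blocked candidate pairs."""
    edges: List[Tuple[str, str]] = []
    for a, b in candidate_pairs(labels):
        if subsumes(a, b):
            edges.append((a, b))
        elif subsumes(b, a):
            edges.append((b, a))
    return edges


if __name__ == "__main__":
    labels = [
        "Web frameworks",
        "Python web frameworks",
        "Frameworks",
        "Programming languages",
        "Functional programming languages",
    ]
    for general, specific in subsumption_edges(labels):
        print(f"{specific} subClassOf {general}")
```

With these five example labels, blocking yields 4 candidate pairs instead of the 10 produced by an all-pairs comparison. A realistic pipeline would at least lemmatize heads (so that "framework" and "frameworks" share a block) and would feed such rule outputs to a classifier as features rather than treating them as final decisions.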
Keywords/Search Tags: Linking Open Schema, Ontology Learning, Taxonomy Construction, Synonym Detection, Software Programming Knowledge Base