Automatic Construction Of Large-scale Software Engineering Knowledge Base

Posted on: 2019-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Dong
GTID: 2428330590492447
Subject: Software engineering
Abstract/Summary:
With the arrival of the era of big data and artificial intelligence, the knowledge base, as a hierarchically structured body of knowledge, has become a knowledge infrastructure for intelligent applications. In software engineering, knowledge bases also play an increasingly important role in tasks such as software defect prediction, semantic correlation calculation, software document correlation analysis, and developer recommendation. However, mature software engineering knowledge bases are still lacking: existing ones are mostly extracted from general knowledge bases or built manually on an ad hoc basis, and they cannot achieve large scale, rich semantics, or standardization. It is therefore necessary and urgent to construct a large-scale software engineering knowledge base.

Against this background, this thesis uses Wikipedia and Stack Overflow as data sources and applies machine learning methods to automatically mine software engineering concepts and the semantic relations between them, building a software engineering knowledge base from each data source. The resulting knowledge bases are then aligned and fused using ontology alignment, so that the final software engineering domain knowledge base is both large in scale and high in accuracy.

The main contributions and innovations of this thesis include:

1) A novel method for extracting software engineering concepts from Wikipedia and Stack Overflow is proposed. In Stack Overflow, we extract the tag set of the software engineering domain, mine question and answer texts, and find domain concepts; we then use a label propagation method to expand the domain concepts in Wikipedia. This allows the software engineering knowledge base to be built on the large scale of Wikipedia while maintaining high accuracy.

2) A novel method for automatically extracting relations between concepts is proposed, based on the structural features of Wikipedia and Stack Overflow and on semantic features of the software engineering domain. According to the structure of each data source, this thesis designs different structural features for the subsumption and apposition relations in the software engineering field and combines them with a word similarity calculation method; relation extraction is then performed with a machine learning method.

3) An iterative semi-supervised learning method is proposed. To address the lack of training data and to improve data accuracy, this thesis proposes a rule-based filtering method that handles errors and redundancies in the data sets, and improves the accuracy of the relation extraction results through iterative semi-supervised learning.

As its research result, this thesis builds and releases a large-scale software engineering domain knowledge base called SETaxonomy. It is a large-scale, standardized domain knowledge base for software engineering, containing 247,638 software engineering concepts, 429,445 subsumption relations, 26,443 apposition relations, and 36,037 related relations. Compared with existing knowledge bases such as DBpedia, Yago, and BabelNet, SETaxonomy has more and richer semantic relations and higher accuracy of domain concepts in the field of software engineering.
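The abstract does not say which label propagation variant is used to expand domain concepts in Wikipedia. As a rough illustration only, the following Python sketch shows one common form of the idea: seed concepts (for example, ones matched from Stack Overflow tags) are clamped to a known label, and each unlabeled page repeatedly takes the average score of its linked neighbours. The graph, the seed sets, the function names, and the 0.5 threshold are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, Iterable, List, Set


def propagate_labels(
    graph: Dict[str, List[str]],
    positive_seeds: Iterable[str],
    negative_seeds: Iterable[str],
    iterations: int = 20,
) -> Dict[str, float]:
    """Return a domain score in [0, 1] for every node in the graph.

    Assumes every neighbour listed in `graph` is itself a key of `graph`.
    """
    positive = set(positive_seeds)
    negative = set(negative_seeds)
    # Seeds start at their known label; every other node starts at 0.0.
    scores = {node: (1.0 if node in positive else 0.0) for node in graph}

    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            if node in positive:
                updated[node] = 1.0  # clamp positive seeds
            elif node in negative:
                updated[node] = 0.0  # clamp negative seeds
            elif neighbours:
                # An unlabeled node takes the mean score of its neighbours.
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
            else:
                updated[node] = scores[node]  # isolated node keeps its score
        scores = updated
    return scores


def expand_concepts(
    graph: Dict[str, List[str]],
    positive_seeds: Iterable[str],
    negative_seeds: Iterable[str],
    threshold: float = 0.5,
) -> Set[str]:
    """Keep every node whose propagated score reaches the threshold."""
    scores = propagate_labels(graph, positive_seeds, negative_seeds)
    return {node for node, score in scores.items() if score >= threshold}


# Hypothetical toy link graph (undirected, stored as adjacency lists).
toy_graph = {
    "Java": ["Java virtual machine", "Garbage collection"],
    "Python": ["Garbage collection"],
    "Java virtual machine": ["Java", "Garbage collection"],
    "Garbage collection": ["Java", "Python", "Java virtual machine"],
    "Cooking": ["Recipe"],
    "Recipe": ["Cooking"],
}

expanded = expand_concepts(
    toy_graph,
    positive_seeds=["Java", "Python"],  # e.g. concepts matched from tags
    negative_seeds=["Cooking"],         # a known out-of-domain page
)
print(sorted(expanded))
```

In this toy graph, pages linked only to in-domain seeds drift toward a score of 1, while pages linked only to the out-of-domain seed stay at 0. A real pipeline would need a much larger graph and a principled choice of seeds and threshold.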
Keywords/Search Tags: Software Engineering Knowledge Base, Concepts Extraction, Relations Discovery, Ontology Alignment, Semi Supervised Learning