Automatic Construction Of Large-scale Software Engineering Knowledge Base

Posted on:2019-12-07

Degree:Master

Type:Thesis

Country:China

Candidate:X Dong

Full Text:PDF

GTID:2428330590492447

Subject:Software engineering

Abstract/Summary:

With the arrival of the era of big data and artificial intelligence, knowledge bases, as hierarchically structured collections of knowledge, have become a knowledge infrastructure for intelligent applications. In software engineering, knowledge bases play an increasingly important role in tasks such as software defect prediction, semantic relatedness calculation, software document correlation analysis, and developer recommendation. However, mature software engineering knowledge bases are currently lacking: existing ones are mostly extracted from general-purpose knowledge bases or built manually for a specific purpose, and therefore fall short in scale, semantic richness, and standardization. Constructing a large-scale software engineering knowledge base is therefore both necessary and urgent.

Against this background, this thesis uses Wikipedia and Stack Overflow as data sources and applies machine learning methods to automatically mine software engineering concepts and the semantic relations between them, building a software engineering knowledge base from each data source. The resulting knowledge bases are then aligned and fused using ontology alignment, so that the final software engineering domain knowledge base is both large in scale and high in accuracy.

The main contributions and innovations of this thesis are:

1) A novel method for extracting software engineering concepts from Wikipedia and Stack Overflow. From Stack Overflow, we extract the set of tags in the software engineering domain and mine question and answer texts to find domain concepts; we then use label propagation to expand the domain concepts in Wikipedia. This allows the knowledge base to draw on the large scale of Wikipedia while maintaining high accuracy.

2) A novel method for automatically extracting relations between concepts, based on structural features of Wikipedia and Stack Overflow and semantic features of the software engineering domain. For each data source, the thesis designs structural features that distinguish subsumption from apposition relations according to that source's structure, combines them with word similarity measures, and performs relation extraction with machine learning.

3) An iterative semi-supervised learning method. To address the lack of training data and to improve data accuracy, the thesis proposes a rule-based filtering method that handles errors and redundancies in the data sets, and improves the accuracy of the relation extraction results through iterative semi-supervised learning.

As its research result, this thesis builds and releases SETaxonomy, a large-scale, standardized domain knowledge base for software engineering. It contains 247,638 software engineering concepts, 429,445 subsumption relations, 26,443 apposition relations, and 36,037 related relations. Compared with existing knowledge bases such as DBpedia, YAGO, and BabelNet, SETaxonomy offers more numerous and richer semantic relations and higher accuracy of domain concepts in the software engineering field.
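
The label propagation step named in contribution 1 can be illustrated with a minimal sketch. The abstract does not say how the thesis builds its graph or which propagation variant it uses, so everything below is an assumption made for illustration: the toy link graph, the use of both positive and negative seeds, the neighbor-averaging update, and the function name `propagate_labels` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def propagate_labels(edges, positive, negative, iterations=20, threshold=0.5):
    """Spread a 'software engineering concept' score over an undirected graph.

    Hypothetical helper: seeds stay clamped (1.0 for positive, 0.0 for
    negative) and every other node repeatedly takes the mean score of its
    neighbors. Nodes whose final score exceeds `threshold` are returned.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Positive seeds start (and stay) at 1.0; everything else starts at 0.0.
    scores = {node: (1.0 if node in positive else 0.0) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in positive:
                updated[node] = 1.0
            elif node in negative:
                updated[node] = 0.0
            else:
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated

    return {node for node, score in scores.items() if score > threshold}


# Toy graph (assumed data): seeds stand in for Stack Overflow tags,
# edges stand in for links between Wikipedia articles.
edges = [
    ("Java (programming language)", "Java virtual machine"),
    ("Java virtual machine", "Bytecode"),
    ("Git", "Version control"),
    ("Coffee", "Espresso"),
    ("Coffee", "Java coffee"),
    ("Java coffee", "Java (programming language)"),
]
concepts = propagate_labels(
    edges,
    positive={"Java (programming language)", "Git"},
    negative={"Coffee"},
)
print(sorted(concepts))
```

In this toy run, pages reachable only through software-related seeds are kept, while "Java coffee", which sits halfway between a positive and a negative seed, stays at exactly the threshold and is left out. A real pipeline would need weighted edges and a principled choice of negative seeds, which the abstract does not describe.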

Keywords/Search Tags:

Software Engineering Knowledge Base, Concepts Extraction, Relations Discovery, Ontology Alignment, Semi Supervised Learning

Related items

1	The Study Of The Construction Method Of Software Engineering Knowledge Base Based On Ontology
2	Research On The Application Of Geometric Information In The Semi-supervised Learning
3	Research On Knowledge Graph Construction Technology Based On Semi-Supervised Learning
4	Research On Domain Ontology Concepts And Relations Learning Algorithm
5	Knowledge Discovery From Biomedical Literature Based On Semantic Resources And Semi-supervised Learning
6	Automatic Extraction Of Conceptual Relations For Constructing Domain-Specific Ontology
7	Research Of Semi-supervised Knowledge Graph Entity Alignment Algorithm Based On Transfer Learning
8	The Research On Theory And Application Of Ontology
9	The Research Of Ontology Engineering Method And Application Based On Role Concepts
10	Biomedical Entity Relation Extraction Based On Semi-supervised Learning And Deep Learning