Research on Entity Recognition Methods for the Construction of a Software Engineering Knowledge Graph

Posted on: 2022-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Xu
GTID: 2518306740994489
Subject: Cyberspace security
Abstract/Summary:
Named entity recognition (NER) aims to identify phrases in natural language text that refer to specific entities. It is a basic task in natural language processing and in the automated construction of knowledge graphs. Building a high-quality software engineering knowledge graph not only helps accumulate and reuse valuable software engineering experience, but also effectively improves search and recommendation performance in intelligent software testing and development. Entity recognition in the field of software engineering is an important task in software engineering knowledge extraction.

Pre-trained language models have been widely used for NER in general domains, but software engineering researchers have rarely applied them to NER in the software engineering domain. As a result, the recognition performance of existing software engineering NER methods lags behind that of general-domain entity recognition. In addition, because corpus resources are scarce and labeling is difficult, NER in software engineering remains a typical low-resource NER problem.

To this end, this thesis studies entity recognition methods for software engineering. First, an entity recognition model based on a pre-trained neural network is applied to the software engineering entity recognition task. Then, several entity recognition methods are designed to address the low-resource problem in software engineering NER. Finally, experiments are designed to verify the rationality and effectiveness of the proposed model and methods. Specifically, the main work of this thesis includes:

1) A software engineering NER model based on a pre-trained neural network is proposed. Pre-trained language models can be trained in a self-supervised manner on large-scale corpora, capturing deep semantic features of text and producing high-quality word embedding representations. This thesis therefore designs a software engineering NER model based on the pre-trained language model BERT. First, the model obtains basic word embeddings from BERT, making full use of the deep semantic information it provides. Then, a character-level convolutional neural network is used to enhance the representation of unseen words, and a bidirectional recurrent neural network learns the contextual features of the text to form the final sequence feature matrix. Finally, a conditional random field decodes the sequence feature matrix to obtain the corresponding output sequence. Experimental results show that the model structure is reasonable and that the model achieves excellent performance on the software engineering NER task.

2) Methods for low-resource entity recognition in the field of software engineering are proposed. To address the low-resource nature of software engineering NER, this thesis designs corresponding low-resource entity recognition strategies. First, a software engineering domain knowledge base is built by incorporating external knowledge, and this external knowledge is used to enhance the performance of the entity recognition model. Then, for the case where a large number of entities in the software engineering NER dataset are unlabeled, a conditional random field that works under incomplete annotation is designed. Finally, several loss functions are proposed to address the imbalanced distribution of entities in the dataset, improving software entity recognition when the data is imbalanced. Experiments on real software engineering text datasets are presented in the experimental evaluation chapter of this thesis. The results show that the proposed methods for low-resource entity recognition help improve entity recognition performance in the field of software engineering.
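
The conditional random field under incomplete annotation mentioned in item 2) can be illustrated with a minimal sketch. The abstract does not give the thesis's exact formulation, so the code below assumes one common approach: the training objective sums over every label sequence that agrees with the labels that are present, instead of scoring a single gold sequence. The function names, the score matrices, and the use of `None` for an unlabeled token are all hypothetical choices made for this illustration.

```python
import math
from typing import List, Optional, Sequence, Set


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_partition(
    emissions: Sequence[Sequence[float]],    # emissions[t][j]: score of label j at token t
    transitions: Sequence[Sequence[float]],  # transitions[i][j]: score of moving from label i to j
    allowed: Sequence[Set[int]],             # allowed[t]: labels permitted at token t
) -> float:
    """Forward algorithm restricted to sequences whose label at t is in allowed[t]."""
    n_labels = len(transitions)
    neg_inf = float("-inf")
    alpha: List[float] = [
        emissions[0][j] if j in allowed[0] else neg_inf for j in range(n_labels)
    ]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(n_labels)])
            + emissions[t][j]
            if j in allowed[t]
            else neg_inf
            for j in range(n_labels)
        ]
    return logsumexp(alpha)


def partial_crf_log_likelihood(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    partial_labels: Sequence[Optional[int]],  # None marks an unannotated token (assumption)
) -> float:
    """Log-probability of all label sequences consistent with the partial labels."""
    every_label = set(range(len(transitions)))
    allowed = [every_label if y is None else {y} for y in partial_labels]
    unconstrained = [every_label] * len(emissions)
    return constrained_log_partition(
        emissions, transitions, allowed
    ) - constrained_log_partition(emissions, transitions, unconstrained)


# Toy example: three tokens, two labels, middle token left unannotated.
toy_emissions = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.4]]
toy_transitions = [[0.1, -0.2], [0.0, 0.3]]
print(partial_crf_log_likelihood(toy_emissions, toy_transitions, [0, None, 0]))
```

With every token labeled, this objective reduces to the usual CRF log-likelihood; with no token labeled, it is zero and contributes no training signal. In a real model the emission scores would come from the sequence feature matrix and the negative of this value would be minimized.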
Keywords/Search Tags:Named entity recognition, Pre-trained language model, Low-resource entity recognition, Software engineering, Knowledge Graph