
Research On Domain Ontology Concept Extraction Based On E-CNN

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tao
GTID: 2428330629951039
Subject: Communication and Information System
Abstract/Summary:
Ontology, as a set of terms describing a domain, has clear advantages for building domain knowledge graphs, so the extraction of domain ontology concepts is an important research problem. From the standpoint of information extraction, ontology concept extraction can be carried out with named entity recognition (NER). NER, one of the subtasks of information extraction, is widely used across natural language processing tasks, and with the development of deep learning, neural networks have been applied to NER models with good results. However, for entity recognition in Chinese domain text, and especially for compound entities in specialized domains, existing NER methods still suffer from limited accuracy and low efficiency. To address these problems, this thesis makes the following contributions:

(1) A convolutional neural network (CNN) with a gating mechanism, combined with a conditional random field (CRF), is proposed for NER. The model consists of a word-vector training and embedding module, a gated CNN module, and a CRF module. In the embedding module, the text dataset is segmented and annotated, and Word2vec converts the text into word vectors that are fed into the convolutional network. In the gated CNN module, a CNN optimized with the gating mechanism is used for text classification and context representation. Finally, the CRF module decodes the network output into the final label sequence. Compared with traditional models on an NER dataset, this model achieves a micro-averaged precision of 91.05%, recall of 89.93%, and F1 score of 90.49%, which verifies its effectiveness for Chinese NER.

(2) Building on this model and on the idea of an ensemble multi-convolution-kernel convolutional neural network (E-CNN), this thesis proposes a gated ensemble CNN-CRF model aimed at the difficulty of recognizing complex entities in Chinese domain text. Starting from the gated CNN model, convolution windows of different sizes are set in the convolutional layer, and the resulting features are integrated into a richer representation. This effectively addresses the difficulty of recognizing complex entities whose boundaries are hard to delimit accurately in Chinese domain text. Tested on a Chinese medical-domain text dataset, the hybrid model achieves a micro-averaged precision of 87.90%, recall of 89.10%, and F1 score of 88.50%. Its F1 score is 2.11% higher than that of the LSTM-based model commonly used on the same dataset, which demonstrates the model's advantage for compound entity recognition in Chinese domain text.
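The gated CNN module in (1) and the multi-window ensemble in (2) can be illustrated with a small, untrained sketch. The abstract does not give layer equations, so this assumes the gated linear unit (GLU) form suggested by the keywords, h = (X*W + b) ⊗ σ(X*V + c), with zero padding so every character keeps one output vector. It further assumes that the per-window features are "integrated" by concatenation. The window sizes (3, 5, 7), dimensions, and all class and function names are hypothetical choices for illustration, not the thesis's implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_kernel(k, d_in, d_out, rng):
    # Weights indexed as [out_channel][offset][in_dim]; random, untrained.
    return [[[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(k)]
            for _ in range(d_out)]

def conv1d_same(x, kernel, bias):
    """1-D convolution over a sequence of vectors; zero padding keeps length (odd k)."""
    n, k, d_in = len(x), len(kernel[0]), len(x[0])
    half = k // 2
    out = []
    for t in range(n):
        row = []
        for j, w in enumerate(kernel):
            s = bias[j]
            for o in range(k):
                pos = t + o - half
                if 0 <= pos < n:  # positions outside the sentence act as zeros
                    s += sum(w[o][i] * x[pos][i] for i in range(d_in))
            row.append(s)
        out.append(row)
    return out

class GatedConv1d:
    """Hypothetical GLU convolution: linear path times a sigmoid gate."""
    def __init__(self, k, d_in, d_out, seed=0):
        rng = random.Random(seed)
        self.W, self.V = init_kernel(k, d_in, d_out, rng), init_kernel(k, d_in, d_out, rng)
        self.b, self.c = [0.0] * d_out, [0.0] * d_out

    def __call__(self, x):
        a = conv1d_same(x, self.W, self.b)
        g = conv1d_same(x, self.V, self.c)
        return [[ai * sigmoid(gi) for ai, gi in zip(ra, rg)] for ra, rg in zip(a, g)]

class EnsembleGatedConv:
    """Hypothetical ensemble: one gated branch per window size, features concatenated."""
    def __init__(self, widths, d_in, d_out, seed=0):
        self.branches = [GatedConv1d(k, d_in, d_out, seed + i) for i, k in enumerate(widths)]

    def __call__(self, x):
        outs = [branch(x) for branch in self.branches]
        return [[f for o in outs for f in o[t]] for t in range(len(x))]

rng = random.Random(42)
sentence = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(6)]  # 6 chars, 8-dim vectors
layer = EnsembleGatedConv(widths=(3, 5, 7), d_in=8, d_out=4)
features = layer(sentence)
print(len(features), len(features[0]))  # 6 12
```

Each character ends up with 3 × 4 = 12 features, one slice per window size. Short windows see tight local context while wider ones span more of a compound term, which is the intuition behind combining several kernel widths for entity boundaries.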
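The CRF decoding step described in (1) can be sketched as Viterbi decoding over a linear-chain CRF. This assumes the CNN emits one score per tag per character and that the CRF contributes a learned tag-to-tag transition matrix. The BIO tag set and all scores below are made up to show why decoding jointly beats picking each character's best tag independently; `viterbi_decode` is a hypothetical helper, not code from the thesis.

```python
def viterbi_decode(emissions, transitions, start=None):
    """Best tag path for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. from the CNN).
    transitions[i][j]: score of moving from tag i to tag j.
    start[j]: score of starting the sequence with tag j.
    """
    n_tags = len(transitions)
    start = start or [0.0] * n_tags
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # best previous tag i for ending at tag j at position t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backptrs.append(bp)
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for bp in reversed(backptrs):  # follow back-pointers from the end
        path.append(bp[path[-1]])
    path.reverse()
    return path, score[best_last]

tags = ["O", "B", "I"]
# Hypothetical scores: penalize O -> I and starting with I (invalid in BIO).
transitions = [[0.0, 0.0, -10.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, -10.0]
emissions = [[1.0, 0.9, 0.0],
             [0.0, 1.0, 1.5],
             [1.0, 0.0, 0.0]]

greedy = [tags[max(range(len(e)), key=e.__getitem__)] for e in emissions]
path, score = viterbi_decode(emissions, transitions, start)
print(greedy)                    # ['O', 'I', 'O']
print([tags[i] for i in path])   # ['B', 'I', 'O']
```

Per-character argmax yields the invalid sequence O, I, O, whereas Viterbi uses the transition scores to recover the well-formed entity B, I, O.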
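The reported metrics follow the usual micro-averaged definitions: true positives, false positives, and false negatives are pooled over all entity types before computing P = TP/(TP+FP) and R = TP/(TP+FN), and F1 is their harmonic mean. The sketch below uses hypothetical per-type counts for the micro-average, then checks that the abstract's own precision and recall pairs reproduce the stated F1 values. The entity type names and counts are illustrative only.

```python
def micro_prf(counts):
    """counts: mapping entity type -> (tp, fp, fn). Returns micro-averaged P, R, F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def f1_from(p, r):
    # harmonic mean of precision and recall
    return 2 * p * r / (p + r)

# Hypothetical counts, not from the thesis.
counts = {"DISEASE": (80, 10, 12), "DRUG": (45, 5, 8)}
p, r, f1 = micro_prf(counts)
print(f"{p:.4f} {r:.4f} {f1:.4f}")        # 0.8929 0.8621 0.8772

# Reported (precision, recall) pairs from the abstract.
print(round(f1_from(91.05, 89.93), 2))    # 90.49
print(round(f1_from(87.90, 89.10), 2))    # 88.5
```

Both reported pairs give the stated F1 scores (90.49% and 88.50%), consistent with the first figure in each triple being precision.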
Keywords: Domain ontology, Named entity recognition, Gated linear unit, Ensemble convolutional neural network