
Research On Named Entity Recognition For Judgment Documents

Posted on: 2022-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Deng
GTID: 2506306545455364
Subject: Computer software and theory
Abstract/Summary:
As the final product of trial activities, judgment documents contain abundant information. Named entity recognition over these documents lays a foundation for constructing a judgment documents knowledge graph. Some corpora have been developed for the study of judgment documents, but the entities they label are not comprehensive, and no corpus is publicly available for the industry and subject entities that this thesis is concerned with. In addition, there is a lack of word segmentation tools for judgment documents, so segmentation quality is low, which degrades named entity recognition. To avoid the impact of word segmentation errors, this thesis therefore focuses on character-based named entity recognition for judgment documents. Because word information is still useful, it proposes two methods for integrating word information into character-based models. Specifically, the following three pieces of work have been carried out:

(1) A named entity corpus built from civil judgment documents, hereafter called the judgment documents corpus. The main steps are analyzing the structure of the judgment documents, preprocessing them, and formulating the corresponding annotation specifications, which together yield a usable experimental corpus.

(2) A model based on the direct integration of character and word information. After acquiring character information, the model simply concatenates pre-trained word vector information. Because judgment documents are long sequences, the model takes single characters as input, uses a BiLSTM as the encoder, and adds an attention layer to compute the representation of each input character in context. To make use of lexical information, a CBOW model is trained on a large unlabeled judgment documents corpus to obtain pre-trained word vectors. Finally, the word vector and the contextual character representation are concatenated and fed to a CRF layer for label prediction.

(3) A model based on multi-level feature fusion of character and word information. The direct-integration model above does not fully exploit the information carried by words. Moreover, compared with a single embedding representation, a multi-level fusion of character and word features can often capture more useful information. A model based on multi-level feature fusion of characters and words is therefore proposed to make fuller use of word information in character-based models. The model takes characters as input, first uses a BiLSTM and a CNN to extract character-level features at multiple levels, and then obtains word-level features through word encoding. Finally, the two are fused to form the final representation of the input sequence, which is used to train the model for entity recognition and labeling.

The experimental results show that the model based on the direct integration of character and word information effectively improves named entity recognition performance on judgment documents. The method based on multi-level feature fusion outperforms both the baseline method and the direct-integration model.
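As a rough illustration of the splicing step in model (2), the sketch below concatenates each character's contextual vector with a pre-trained vector for the word that contains it. The abstract does not say how word vectors are aligned to characters, so the alignment rule, the zero vector for out-of-vocabulary words, the function name `splice_char_word_features`, and the toy values are all assumptions made for illustration, not the thesis's implementation.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def splice_char_word_features(
    char_reprs: Sequence[Vector],
    words: Sequence[str],
    word_vectors: Dict[str, Vector],
    word_dim: int,
) -> List[Vector]:
    """Concatenate each character's contextual vector with the vector of
    the word containing it (assumed alignment; zeros for unknown words)."""
    if sum(len(w) for w in words) != len(char_reprs):
        raise ValueError("segmentation must cover every character exactly once")

    fused: List[Vector] = []
    position = 0  # index of the current character in the sentence
    for word in words:
        # Hypothetical fallback: out-of-vocabulary words get a zero vector.
        word_vec = word_vectors.get(word, [0.0] * word_dim)
        if len(word_vec) != word_dim:
            raise ValueError(f"word vector for {word!r} has the wrong dimension")
        for _ in word:
            # Character features first, then word features.
            fused.append(list(char_reprs[position]) + list(word_vec))
            position += 1
    return fused


# Toy input: three characters, segmented into one two-character word
# and one single-character word that is missing from the lookup table.
char_reprs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
words = ["原告", "称"]
word_vectors = {"原告": [1.0, 2.0, 3.0]}

fused = splice_char_word_features(char_reprs, words, word_vectors, word_dim=3)
for vector in fused:
    print(vector)
```

In a full model, the character vectors would come from the BiLSTM and attention layers, the lookup table from CBOW training, and the fused vectors would be passed to the CRF layer; none of those components are reproduced here.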
Keywords/Search Tags:Judgment Documents, Named Entity Recognition, BiLSTM-CRF Model, Industry and Subject, Characters and Words Feature