Research On Chinese Named Entity Recognition In The Judicial Field Based On Deep Learning

Posted on: 2022-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2518306554971279
Subject: Master of Engineering
Abstract/Summary:
Chinese named entity recognition in the judicial field aims to accurately identify the various kinds of entities in legal documents, and it is foundational for downstream artificial intelligence applications in this field. Because annotated judicial corpora are severely lacking and Chinese text has its own unique characteristics, there has been relatively little domestic research in this area. This thesis analyzes the writing characteristics of legal documents and applies deep learning to the task of Chinese named entity recognition in the judicial field. The main work is as follows:

(1) To address the lack of a public annotated corpus in the judicial field, this thesis manually constructs a judicial annotated corpus. First, data are obtained from the corpora published by China Judgments Online and the "Law Research Cup" challenge competition, and are then cleaned and desensitized using automated techniques. Next, based on the processed case documents, the writing characteristics are analyzed and reasonable legal annotation guidelines are designed. Finally, data augmentation is used to expand the training corpus, producing the Chinese entity annotation corpus used in the subsequent studies of this thesis.

(2) Entity types in the judicial field are rich and complex, and common named entity recognition methods cannot identify domain-specific entities well. This thesis therefore proposes an IDCNN model based on the self-attention mechanism to identify named entities in legal documents. First, a BiGRU network automatically learns the semantic information of the text, addressing the entity ambiguity caused by long-distance dependencies in long sequences. Then, an IDCNN network is introduced to extract key features and capture finer-grained entity semantic information in the underlying sequence. Finally, a self-attention mechanism is added to analyze the relationships between characters, and a CRF model is combined to compute the optimal tag sequence. The experiments indicate that the proposed method can effectively recognize fine-grained entities in legal documents and improves named entity recognition performance in the judicial field.

(3) Most ethnic-minority names in legal documents differ greatly from conventional Chinese names in structure and number of characters. To address the resulting inaccurate recognition of person names, this thesis proposes a person-name recognition method for legal documents based on a bidirectional Transformer model. Introducing the BERT pre-trained language model to recognize person-name entities removes the reliance on domain knowledge and hand-crafted features and enhances the model's ability to extract contextual semantic features. The experimental results indicate that the method improves performance on the task of recognizing transliterated minority names and achieves end-to-end named entity recognition.

The above research provides a new approach to Chinese named entity recognition in the judicial field and has important reference value for subsequent natural language processing applications. It is also of practical value in promoting the construction of smart courts.
Keywords/Search Tags: named entity recognition, judicial field, IDCNN, BERT, self-attention mechanism
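
The abstract states that a CRF model is combined with the network to calculate the optimal tag sequence, but it does not give implementation details. As an illustration only, the following minimal sketch shows how such a decoding step is commonly done with the Viterbi algorithm. The function name `viterbi_decode`, the tag set, and all scores are hypothetical and are not taken from the thesis; in a real model the emission scores would come from the network and the transition scores would be learned.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][tag] is the per-character score for `tag` at position t.
    transitions[prev][cur] is the score for moving from tag `prev` to `cur`.
    """
    if not emissions:
        return [], 0.0

    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    # back[t-1][tag]: previous tag on that best path.
    back = []

    for t in range(1, len(emissions)):
        scores = {}
        pointers = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical example: three characters, a person-name tag set.
TAGS = ["B-PER", "I-PER", "O"]
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I-PER"] = -100.0  # an I- tag should not follow O

EMISSIONS = [
    {"B-PER": 2.0, "I-PER": 0.0, "O": 3.0},
    {"B-PER": 0.0, "I-PER": 4.0, "O": 1.0},
    {"B-PER": 0.0, "I-PER": 0.0, "O": 5.0},
]

path, score = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(path, score)
```

In this toy input, picking the best tag for each character independently would give O, I-PER, O, which is not a valid sequence. The transition penalty makes the decoder prefer B-PER, I-PER, O instead, which is the kind of sequence-level constraint a CRF layer contributes on top of per-character scores.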