
Research On Named Entity Recognition In Judicial Field

Posted on: 2020-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Lin
Full Text: PDF
GTID: 2416330572480280
Subject: statistics
Abstract/Summary:
In recent years, artificial intelligence (AI) technologies such as deep learning and natural language processing have developed rapidly, and the construction of smart courts has received extensive attention from the state, academia, and industry. Named entity recognition (NER), the subject of this paper, is one of the basic tasks of natural language processing. Carrying out NER research on judicial case collections is important for putting AI technology into practice in electronic courts, in applications such as electronic evidence collection, case analysis, and legal document reading. To this end, this paper studies named entity recognition for texts in the judicial field and completes the following work:

(1) The foundation of a natural language processing task is the construction of a training corpus. At present there is no large-scale named entity corpus for the judicial field. To address this lack, a corpus for named entity recognition in the judicial field was constructed. Using Internet information collection techniques, the complete judgments of public criminal cases from courts at all levels were obtained from the China Judgment Document Network. The descriptive texts of more than 12,000 judgments, totaling 3.104 million words, were annotated according to the corresponding entity labeling specifications, using the OSBIE annotation scheme. More than 212,000 entities were annotated, forming CJNER_Fact, a named entity corpus for the Chinese judicial field.

(2) Whether a weapon was carried or used in a criminal case affects the penalty (sentencing) and can even affect the determination of the crime (conviction). Weapon information is therefore crucial in handling judicial and criminal investigation cases. Four entity categories are defined for the judicial field. In addition to the traditional person names, place names, and institution names, the task of "weapon" entity recognition is proposed for the first time, based on the practical needs of criminal sentencing in judicial applications. This extends the existing entity recognition system and combines natural language technology with industry knowledge.

(3) To better solve the problem of named entity recognition in the judicial field, three types of vectors are first trained: Word2Vec character vectors, Word2Vec word vectors, and LDA topic vectors. Next, deep learning training schemes based on the different vectors are developed. Three models are proposed: a Bi-LSTM-CRF model, a Bi-LSTM-CRF model based on word vectors and topic vectors (WL-BiLSTM-CRF), and a stacked Bi-LSTM-CRF model that combines character-based and word-based segmentation with LDA (WL-bi-BiLSTM-CRF). The self-built legal named entity dataset CJNER_Fact is used to analyze different training objectives and different feature representations.

Training a character-based Bi-LSTM-CRF model and a word-based Bi-LSTM-CRF model shows that entities with fewer characters are recognized better by the character-level model, whereas institution name entities, which have more characters, are recognized better by the word-level model. This finding is applied in the WL-bi-BiLSTM-CRF model. The WL-BiLSTM-CRF model uses the global characteristics of the topic vector and the semantic properties of the word vector. Through Bi-LSTM learning over the sequence, the model effectively predicts the labels in the data, which improves the accuracy and recall of the model and alleviates the problem of imbalanced sample labels.

The WL-bi-BiLSTM-CRF cascaded model recognizes judicial-domain entities at two levels. First, the low-level model works on character-level segmentation of the text, recognizes person name entities, and builds features from the recognition results, which are passed into the high-level model. The high-level model then works on word-level segmentation to recognize weapons, institution names, place names, and so on.

The experimental results show that the model performs well on the "weapon" entity category proposed in this paper, and that the results for the "person name" and "institution name" categories are better than those reported in the current literature for the judicial field. The overall micro-averaged F1 value reaches 89.86%, and the F1 value for weapon recognition reaches 90.76%.
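
The abstract reports entity-level results under an OSBIE annotation scheme and a micro-averaged F1 value. As a rough illustration only, the sketch below shows how such a score is commonly computed. It assumes that OSBIE means O (outside), S (single-token entity), B (begin), I (inside), and E (end), and that an entity counts as correct only when its span and type both match exactly. The label names PER, WEA, and ORG and the functions extract_entities and micro_f1 are hypothetical and are not taken from the thesis.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from one OSBIE-tagged sequence.

    Assumed tag format: "O", or "<prefix>-<type>" with prefix in S/B/I/E.
    Malformed sequences (e.g. an E tag with no open B) yield no span.
    """
    spans = set()
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            # Single-token entity.
            spans.add((i, i, etype))
            start, current = None, None
        elif prefix == "B":
            # Open a new multi-token entity.
            start, current = i, etype
        elif prefix == "I" and current == etype:
            # Still inside the currently open entity.
            continue
        elif prefix == "E" and current == etype:
            # Close the currently open entity.
            spans.add((start, i, etype))
            start, current = None, None
        else:
            # "O" or an inconsistent tag: drop any open entity.
            start, current = None, None
    return spans


def micro_f1(gold_seqs, pred_seqs):
    """Micro-averaged precision, recall, and F1 over all sentences.

    Counts are pooled across every sentence and entity type before the
    ratios are taken, which is what "micro-averaged" is assumed to mean.
    """
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_spans = extract_entities(gold)
        pred_spans = extract_entities(pred)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels: PER (person), WEA (weapon), ORG (institution).
    gold = [["B-PER", "E-PER", "O", "S-WEA", "O", "B-ORG", "I-ORG", "E-ORG"]]
    pred = [["B-PER", "E-PER", "O", "O", "O", "B-ORG", "I-ORG", "E-ORG"]]
    print(micro_f1(gold, pred))
```

In this toy example the prediction misses the single-token weapon entity, so two of the three gold entities are found and both predicted entities are correct. Per-category scores, such as the weapon F1 value quoted above, could be obtained the same way by filtering the spans on their type before counting.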
Keywords/Search Tags: Named entity recognition, Judicial field, Legal entity annotation, Word2vec, Topic model, Deep learning