As the number of case texts in the judicial field grows year by year, the burden of manual processing has increased sharply. Intelligently analyzing large volumes of legal documents and extracting their key information elements, so that investigators can understand a case more easily, has become an important research topic in intelligent justice. Within this area, accurately recognizing named entities in legal texts and extracting the relations between them are the basic tasks of element extraction, and they form the foundation for further syntactic and semantic analysis of the text. Using deep learning methods, this paper presents a pilot study of element extraction for Chinese legal texts. The main work is as follows:
(1) To address the lack of training data, a judicial named entity annotation corpus was constructed, consisting mainly of legal documents from drug-related criminal cases. The paper analyzes the standard writing conventions of drug-related criminal case documents, designs a suitable entity annotation standard, and builds a corresponding corpus annotation tool.
(2) Because the specific drugs involved in a drug-related criminal case and their weights affect conviction and sentencing, drug information is vital to understanding the case. Five entity categories are therefore defined for these cases: time, place, person, drug, and drug weight. Taking into account what judicial investigators actually need in order to understand the circumstances of a criminal case, a customized deep learning training scheme is designed, and a BiLSTM+Attention model is used to recognize named entities in legal documents. Building on the named entity recognition results, the correspondence between the weight entities and the drug entities extracted from a legal document is refined through an added relation extraction study: a relation label is added between the two entity types, drug and weight, in the annotated data set. A BERT model is trained to predict the probability that a relation holds between the two entities, and this probability is used to decide whether a given weight corresponds to a given drug. Experimental results show that named entity recognition based on BiLSTM+Attention reaches an F1 score of 88.34%, and relation extraction based on BERT reaches an F1 score of 82.39%.
(3) To meet the case retrieval needs raised by judicial investigators when reviewing cases, a case information retrieval system was built on the basis of the above experiments. The retrieval system for drug-related cases not only extracts information from a single case but also searches the existing files in the database. For example, to see how similar cases were handled at the sentencing stage, a judicial investigator can enter a query by drug and weight in the system's search bar. The system reduces the burden on judicial investigators of reviewing large numbers of text-based case files and thereby helps them understand cases more effectively.
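The drug and weight linking step described in (2) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Entity` type, the `DRUG` and `WEIGHT` label names, the 0.5 threshold, and the `toy_score` function are all assumptions made for illustration. In particular, `toy_score` is only a distance-based stand-in for the probability that a trained BERT relation classifier would return.

```python
from itertools import product
from typing import Callable, List, NamedTuple, Tuple


class Entity(NamedTuple):
    text: str
    label: str  # hypothetical label names: "DRUG" or "WEIGHT"
    start: int  # character offset of the entity in the document


Pair = Tuple[Entity, Entity]


def candidate_pairs(entities: List[Entity]) -> List[Pair]:
    """Enumerate every (drug, weight) combination found by the NER step."""
    drugs = [e for e in entities if e.label == "DRUG"]
    weights = [e for e in entities if e.label == "WEIGHT"]
    return list(product(drugs, weights))


def link_weights(
    entities: List[Entity],
    score: Callable[[Entity, Entity], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Pair]:
    """Keep the pairs whose predicted relation probability passes the threshold."""
    return [(d, w) for d, w in candidate_pairs(entities) if score(d, w) >= threshold]


def toy_score(drug: Entity, weight: Entity) -> float:
    """Stand-in scorer: nearby mentions are treated as related.

    A real system would return the relation probability predicted by the
    trained classifier instead of this distance heuristic.
    """
    return 0.9 if abs(drug.start - weight.start) < 20 else 0.1


if __name__ == "__main__":
    entities = [
        Entity("heroin", "DRUG", 10),
        Entity("5 grams", "WEIGHT", 18),
        Entity("methamphetamine", "DRUG", 60),
        Entity("2 grams", "WEIGHT", 78),
    ]
    for drug, weight in link_weights(entities, toy_score):
        print(drug.text, "<->", weight.text)
```

Passing the scorer in as a callable keeps the pairing logic separate from the model, so the heuristic can be replaced by a trained classifier without changing the rest of the sketch.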