Research On Chinese Named Entity Recognition Algorithm In Legal Field Based On Deep Learning

Posted on: 2021-02-13 | Degree: Master | Type: Thesis
Country: China | Candidate: D R Hu | Full Text: PDF
GTID: 2416330647461959 | Subject: Engineering
Abstract/Summary:
English named entity recognition (NER) is by now a mature research area. Chinese NER in the judicial field has developed more slowly, partly because of the differences between Chinese and English and partly because public datasets are limited. This thesis builds on an analysis of the content and writing characteristics of judicial documents, combined with the author's own understanding of Chinese NER. The main research work is as follows:

1. Because the judicial field lacks a public annotated dataset, a labelled dataset for judicial NER was built manually. Judgment documents for various kinds of cases were obtained online from China Judgments Online (http://wenshu.court.gov.cn/) and the "Law Research Cup" Judicial Artificial Intelligence Challenge of China. After a series of processing and labelling steps, they form the labelled corpus used in this study.

2. Using character vectors alone loses some of the internal information of a sentence. To address this, the thesis trains a distributed representation model of sentences to obtain sentence vectors, fuses them with the character vectors, and normalizes the result to get the final input vector. The fused vector is then used as the input of a character-level BiLSTM-CRF model, and experiments are carried out on the judicial corpus constructed in this thesis. The model reaches an overall precision of 77.08%, a recall of 73.69%, and an F1 score of 75.35%, which supports the effectiveness of the method.

3. With practical applications in mind, an improved Viterbi algorithm is proposed to make the NER system more efficient. While solving for the Viterbi path, the lowest-scoring "impossible path" is pruned after each calculation step, which reduces computation and improves the efficiency of the model. Experiments show that this method improves the running speed of the model to some extent.

4. Names of ethnic minorities, when translated into Chinese, differ in structure and length from conventional Chinese names, so they are often not recognized accurately. The thesis adopts an NER model with a self-attention mechanism to avoid the long-term dependency problem that LSTM can exhibit when the number of time steps is large. In addition, IDCNN is used to extract local features from the text characters. These character features are then fused with the contextual features learned by BiLSTM to make fuller use of the text information and improve recognition of the translated names of ethnic minorities.

This work offers new research ideas for named entity recognition in the Chinese judicial field and improves the evaluation metrics, which helps advance research on judicial NER and makes it more practical.
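
The pruning idea in item 3 can be illustrated with a small sketch. The abstract does not give the exact pruning rule, so the version below is an assumption: after each time step, the `n_prune` lowest-scoring tags are dropped and not extended further. The function name `viterbi_pruned`, the parameter `n_prune`, and the toy scores are all hypothetical and are not taken from the thesis. Setting `n_prune=0` gives ordinary exact Viterbi decoding, while larger values trade exactness for fewer comparisons per step.

```python
def viterbi_pruned(emissions, transitions, n_prune=1):
    """Viterbi decoding that drops the n_prune lowest-scoring tags per step.

    emissions[t][k]   : score of tag k at position t (e.g. BiLSTM output)
    transitions[j][k] : score of moving from tag j to tag k (e.g. CRF matrix)
    n_prune           : hypothetical knob; 0 reproduces exact Viterbi
    Returns (best tag path, score of that path).
    """
    if not emissions:
        return [], 0.0
    n_tags = len(transitions)
    # Never prune every tag: at least one partial path must survive.
    n_prune = max(0, min(n_prune, n_tags - 1))

    scores = list(emissions[0])   # best score of a path ending in each tag
    backptrs = []                 # backptrs[t-1][k] = best previous tag for k

    for t in range(1, len(emissions)):
        # Keep only the highest-scoring tags from the previous step.
        ranked = sorted(range(n_tags), key=lambda k: scores[k], reverse=True)
        alive = ranked[: n_tags - n_prune]

        new_scores, ptrs = [], []
        for k in range(n_tags):
            # Only surviving tags are considered as predecessors.
            best_j = max(alive, key=lambda j: scores[j] + transitions[j][k])
            new_scores.append(
                scores[best_j] + transitions[best_j][k] + emissions[t][k]
            )
            ptrs.append(best_j)
        scores = new_scores
        backptrs.append(ptrs)

    # Backtrack from the best final tag.
    last = max(range(n_tags), key=lambda k: scores[k])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, scores[last]


# Toy example with three tags: 0 = O, 1 = B-PER, 2 = I-PER (made-up scores).
emissions = [[2.0, 1.0, 0.0], [0.5, 2.0, 0.1], [0.2, 0.3, 2.5]]
transitions = [[0.5, 0.2, -5.0], [0.0, -2.0, 1.0], [0.1, -0.5, 0.8]]
path, score = viterbi_pruned(emissions, transitions, n_prune=1)
print(path)  # [0, 1, 2]
```

On this toy input the pruned and exact decoders agree. In general, pruning can discard the path that would have been optimal, so the pruned score is a lower bound on the exact one.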
Keywords/Search Tags: named entity recognition, judicial field, vector fusion, conditional random field, deep neural network