| Chinese named entity recognition (NER) in the judicial field aims to accurately identify the various entities in adjudication documents and is a foundation for downstream applications in judicial artificial intelligence. Because annotated judicial corpora are severely lacking and legal texts have distinctive textual characteristics, there has been relatively little research in this area in China. In this paper, we analyze the textual characteristics of judicial documents and apply deep learning methods to Chinese named entity recognition in the judicial domain. The main work is as follows. (1) To address the lack of a public annotated corpus in the judicial domain, we construct Legal-NER, a named entity corpus based on adjudication documents. We pre-process raw text collected from the Chinese adjudication documents website, analyze the textual characteristics of the corpus, and design a reasonable entity annotation specification. The YEDDA tool was used for multiple iterations of annotation, and the quality and size of the Legal-NER corpus were then analyzed. (2) A lexicon-enhanced Chinese named entity recognition model is proposed. Considering the semantic and word-boundary information contained in the lexicon, this paper proposes a lexical enhancement method based on an adaptive embedding paradigm on top of a character-level named entity recognition model. The method matches the input against domain dictionaries to construct, for each character, multiple word sets grouped by the character's position within the matched words, then compresses and vectorizes the word sets based on word frequency statistics, and finally uses an attention mechanism to integrate the lexical information into the character representations. The model uses BiLSTM as the encoder, whose output is fed into a CRF layer for label prediction. Experiments show that the model improves precision, recall, and F1 score compared with the character-level
BiLSTM-CRF baseline model. (3) A multi-feature fusion model for Chinese named entity recognition is proposed. To address the problem that character representations in Chinese text rely on a single feature and lack semantic information, this paper proposes a character embedding method that fuses multiple features. Because pre-trained language models learn richer semantic information and generalize well, this paper uses BERT instead of Word2vec to generate character vectors. The pinyin of a Chinese character is highly correlated with its semantics and can provide additional phonetic and semantic information, so this paper proposes a compressed alphabetic representation of the pronunciation, which is fused with the character vector to further enhance it with pronunciation information. Because Wubi (five-stroke) codes reflect the structural features of Chinese characters, this paper uses a CNN to process the Wubi code sequence of each character and extract glyph features, which enriches the representation of Chinese characters. The BERT character vector, pronunciation-enhanced character vector, and Wubi glyph vector are concatenated to form the vector representation layer, and the subsequent BiLSTM+CRF structure performs contextual feature extraction and label decoding. In the experiments, ablation studies verify the effectiveness of each module; the F1 score improves by 6.67% over the BiLSTM+CRF baseline model, and precision, recall, and F1 score reach 90.60%, 91.77%, and 91.18%, respectively. |
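
The word-set construction described in contribution (2) can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the split into four sets (B, M, E, S for begin, middle, end, single), the toy dictionary and its frequencies, and the function names `build_word_sets` and `frequency_weights` are all hypothetical. The attention-based fusion step is omitted.

```python
from collections import defaultdict

def build_word_sets(sentence, lexicon, max_len=4):
    """For each character, group matched lexicon words by the
    character's position inside the word (assumed B/M/E/S split)."""
    sets = [defaultdict(set) for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)        # single-character word
            else:
                sets[start]["B"].add(word)        # character begins the word
                sets[end - 1]["E"].add(word)      # character ends the word
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].add(word)      # character is inside the word
    return sets

def frequency_weights(word_set, freq):
    """Normalize word frequencies so one set can be pooled into one vector."""
    total = sum(freq[w] for w in word_set)
    return {w: freq[w] / total for w in word_set} if total else {}

# Toy domain dictionary with made-up frequencies (illustration only).
freq = {"北京": 5, "北京市": 3, "人民": 4, "人民法院": 6, "法院": 8}
sentence = "北京市人民法院"

sets = build_word_sets(sentence, freq)
weights = frequency_weights(sets[6]["E"], freq)  # words ending at "院"
```

In this sketch, each character's sets would be turned into fixed-size vectors by taking the frequency-weighted average of the word embeddings in each set, which is one plausible reading of "compresses and vectorizes the word sets based on word frequency statistics".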