| In recent years, as police work has become increasingly digitized, the volume of case text has grown geometrically, and mining important information from massive text collections has become an urgent problem. Named entity recognition for case text aims to extract structured data from unstructured text. It supports the construction of police knowledge graphs and police question answering systems, and it is of great significance for the comprehensive modernization of national governance capabilities. Named entity recognition also offers clear advantages for the downstream tasks of police case processing. However, named entity recognition in the government domain has received little attention and remains at a low level of development. The main contents of this paper are as follows: (1) Because no standard labeled dataset exists for police cases, this paper performs data cleaning and BIO labeling on 2576 cases provided by the Hunan Provincial Public Security Bureau and constructs a standardized case named entity recognition dataset. (2) The character vectors produced by the pre-trained model RoBERTa contain redundant information, which slows the convergence of model parameters and hinders the baseline BiLSTM-CRF model from extracting fine-grained features from the character vectors. To address this, a named entity recognition model for case text based on a convolutional neural network is proposed. The model represents the characters of police data effectively through an improved character vector generation method and extracts local fine-grained features from the character vectors. A suitably designed convolutional layer not only reduces the dimensionality of the character vectors but also addresses the problem of overly long character vectors. The resulting reduction in model parameters significantly speeds up the convergence of the model as a whole. To compensate for the weakness of the one-dimensional convolution layer in extracting contextual features and dependencies from character sequences, a BiLSTM (Bidirectional Long Short-Term Memory) layer is introduced into the model, and a CRF (Conditional Random Field) layer is finally used to constrain the output label sequence. (3) To address the dense distribution of entities, nested entities, and the weak recognition ability of models on case text, a named entity recognition method based on the multi-head self-attention mechanism is proposed, namely BM-BiLSTM-CRF. This method uses BERT (Bidirectional Encoder Representations from Transformers) to enhance the semantic representation of the training data and dynamically generates word vectors according to contextual features. Multi-head self-attention and BiLSTM are then used to capture the contextual features, dependencies, and semantic associations of character sequences, and the CRF module constrains the label layer to produce the label sequence of the input text. (4) Using the case named entity recognition models proposed in this paper, we develop a web application based on the Flask framework. The application adopts a B/S (Browser/Server) architecture and supports online extraction of entities from police data, which can assist police officers in analyzing police information and provides technical support for building downstream tasks. |
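
The BIO labeling scheme mentioned in contribution (1) can be illustrated with a minimal sketch. The example sentence, the entity types `PER` and `LOC`, and the helper names `spans_to_bio` and `bio_to_spans` are assumptions made for illustration only; they are not taken from the paper's dataset or code. The sketch also uses word tokens for readability, whereas the paper works with character vectors.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_type); hypothetical span format for this sketch
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Turn non-overlapping entity spans into one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from a BIO tag sequence."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" acts as a sentinel that closes an entity at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical example sentence and annotations.
tokens = ["Zhang", "San", "was", "seen", "in", "Changsha"]
entities = [(0, 2, "PER"), (5, 6, "LOC")]
tags = spans_to_bio(len(tokens), entities)
recovered = bio_to_spans(tags)
```

In this scheme, `B-` marks the first token of an entity, `I-` marks its continuation, and `O` marks tokens outside any entity. A CRF output layer such as the one described in (2) and (3) is commonly used to discourage invalid transitions, for example an `I-LOC` tag that directly follows `B-PER`.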