| With the rapid development of computer networks, large volumes of complex data are produced in all walks of life, and the data generated in the commercial field, given their timeliness, particularity, sensitivity, and other characteristics, cannot be underestimated. As these data are produced, the lack of privacy protection laws and awareness, together with illegal theft and related problems, has made incidents involving sensitive information in the business field frequent. Because information in the commercial field is complex, it is difficult to identify and protect sensitive information manually. The BERT model, however, can draw on the context of a text and resolve the polysemy of Chinese expressions, which both saves labor cost and completes the recognition task in less time and with better results. Research on sensitive information identification based on the BERT model therefore has both research value and practical significance. First, with respect to the background and significance of the topic, named entity recognition can extract entities of specific types from information carriers that may contain keywords. Because Chinese differs greatly from English in discontinuity and grammatical features, this paper reviews the current state of Chinese named entity recognition, the development trends of the field, and the mainstream models used to address such problems. The bottlenecks and difficulties of traditional methods are also discussed, and improved methods and ideas are put forward. Second, this paper defines, from a legal perspective, the sensitive information commonly seen in the business field, studies its categories and forms, discusses in detail the necessity of sensitive information perception, and analyzes how sensitive information influences the text as a whole. Due to the particularity of Chinese
grammar, polysemy, difficult semantic extraction, and unstructured text are among the problems that arise during recognition. To address them, this paper constructs an overall sensitive information recognition framework, starting from the underlying data types, proceeding through data collection and data preprocessing, and applying named entity recognition. Finally, the BERT pretrained model is introduced after weighing the advantages and disadvantages of traditional models. It effectively solves the problem that traditional word embedding models cannot express polysemy. BiLSTM, CRF, and SPAN decoding models are used to compensate for the limitation that the BERT model captures only context features, and to correct the BERT output so as to improve overall recognition accuracy. This yields the Bert-BiLSTM+CRF, Bert-CRF, and Bert-SPAN models. On this basis, the Bert-CRF and Bert-SPAN models are fused to form the Bert-CRF + Bert-SPAN model. According to the experimental results, the models designed in this paper achieve strong overall recognition performance, which demonstrates the value of the proposed BERT-based sensitive information recognition model. |
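
As a rough illustration of how the outputs of a CRF decoder and a SPAN decoder might be combined, the following Python sketch converts each output into entity spans and merges them. The fusion rule (a plain set union), the label names `NAME` and `ORG`, and all function names are assumptions made for illustration only; the abstract does not state how the Bert-CRF and Bert-SPAN models are fused, and the BERT encoder itself is omitted here.

```python
from typing import List, Set, Tuple

# (start index, end index inclusive, entity label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (the kind a CRF layer emits) into spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that flushes an entity ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i - 1, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def pointers_to_spans(starts: List[str], ends: List[str]) -> Set[Span]:
    """Convert per-token start/end labels (SPAN-style output) into spans.

    Each start position is paired with the nearest end position at or
    after it that carries the same label.
    """
    spans: Set[Span] = set()
    for i, label in enumerate(starts):
        if label == "O":
            continue
        for j in range(i, len(ends)):
            if ends[j] == label:
                spans.add((i, j, label))
                break
    return spans


def fuse_spans(crf_spans: Set[Span], span_spans: Set[Span]) -> List[Span]:
    """Hypothetical fusion rule: keep every span found by either decoder."""
    return sorted(crf_spans | span_spans)


# Toy example over five tokens; the labels are illustrative only.
crf_tags = ["B-NAME", "I-NAME", "O", "O", "O"]
start_labels = ["O", "O", "O", "ORG", "O"]
end_labels = ["O", "O", "O", "O", "ORG"]

fused = fuse_spans(bio_to_spans(crf_tags),
                   pointers_to_spans(start_labels, end_labels))
print(fused)  # [(0, 1, 'NAME'), (3, 4, 'ORG')]
```

A union favors recall, since an entity missed by one decoder can still be recovered by the other; an intersection or a score-weighted vote would favor precision instead. Which rule the paper actually uses is not stated in the abstract.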