| Textual information on the Internet is becoming increasingly rich and diverse, but extracting valuable information from massive amounts of unstructured text remains difficult, so methods and tools for this task are urgently needed. One important information extraction method is named entity recognition, which identifies meaningful entities such as persons, locations, and organizations. In recent years, named entity recognition has been widely investigated and developed, but current Chinese named entity recognition methods based on the Transformer encoder have the following problems. First, in character feature extraction, the multiple semantic features associated with each character are not fully utilized, which makes it difficult for the model to obtain sufficient character-level semantic information. Second, because of the complexity of Chinese text, word segmentation errors and short or irregular sparse texts make it difficult for the model to recognize entities. To address these problems, this thesis studies named entity recognition based on the fusion of multiple semantic features of characters. The main research contributions of this thesis are as follows: (1) We propose a dual-stream feature fusion encoding method for Chinese named entity recognition. This method fuses the local and global semantic features of characters and improves on the recognition performance of a Transformer encoder that uses only a single semantic feature per character. First, we design a dual-stream temporal network that captures local and global feature information by encoding the characters in the sentence. Meanwhile, we dynamically introduce word information to avoid word segmentation errors and to sharpen entity boundary detection. Second, to reduce noise, a gating module controls the information flow, and the dual-stream features are weighted and fused to obtain the contextual semantic 
information of the sentence. In addition, we propose a loss calculation method, Multi-loss, that combines multiple loss functions. Multi-loss prevents the model from over-fitting by reducing the freedom of the model parameters. Experiments on three Chinese datasets show that our method achieves better performance and efficiency than the prevailing methods. (2) We propose a named entity recognition method based on character semantic information enhancement. The method enhances the semantic information of each character by fusing the features of semantically similar characters, thereby easing recognition on corpora that contain short or irregular sparse text and lack labeled data. First, we propose a character representation method based on the BERT model that fuses multiple character vectors to strengthen the representational capacity of the character vectors. Second, we propose a context-based clustering and matching method for similar characters, which efficiently and accurately obtains the vectors of the semantically similar characters for each character. Finally, we construct a self-attention semantic enhancement module that filters and weights the semantically similar characters matched to each character according to the sentence context, thereby enhancing the character's semantic information. Experimental results on two representative Chinese datasets show that our method improves the recognition performance of the model. (3) We design and implement an open system for Chinese named entity recognition, which verifies and applies the two improved methods proposed in this thesis. The system includes three modules: a Web interaction module, an entity recognition module, and a similar word matching module. The entity recognition module can recognize three forms of text. The similar word matching module can generate entity-related words through a similarity matching algorithm. Finally, system testing shows that the system has sound functions and good 
compatibility. In summary, we propose two novel named entity recognition methods that modify the model structure, introduce external information, and optimize the loss function in order to overcome existing problems in named entity recognition. The effectiveness and applicability of the proposed methods are verified through extensive experiments and a practical application. |
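
The gated, weighted fusion of the two feature streams described in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the gating equations, so the form used here (an element-wise sigmoid gate that interpolates between the local and global features) is an assumption, and the names `gated_fusion`, `w_local`, `w_global`, and `bias` are hypothetical rather than taken from the thesis.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local_feat, global_feat, w_local, w_global, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed form (not taken from the thesis):
        gate_i  = sigmoid(w_local_i * local_i + w_global_i * global_i + bias_i)
        fused_i = gate_i * local_i + (1 - gate_i) * global_i
    """
    vectors = (local_feat, global_feat, w_local, w_global, bias)
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for loc, glo, wl, wg, b in zip(*vectors):
        gate = sigmoid(wl * loc + wg * glo + b)
        # The gate decides how much of each stream passes through.
        fused.append(gate * loc + (1.0 - gate) * glo)
    return fused


# With zero weights and bias the gate is 0.5, so the result is the plain average.
print(gated_fusion([1.0, 2.0], [3.0, 6.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

In a trained model the gate parameters would be learned, so the gate could suppress a noisy stream for a given character instead of averaging the two streams uniformly.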