Research On Named Entity Recognition And Normalization For Chinese Biomedical Texts

Posted on: 2022-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Peng
GTID: 2514306752997499
Subject: Software engineering

Abstract/Summary:
Named entity recognition and normalization for Chinese biomedical texts play an important role in downstream information extraction tasks and in the construction of Chinese medical knowledge graphs. With the rapid accumulation of medical literature and the widespread use of electronic medical records, mining useful information from the vast amount of medical data and analyzing it further relies on entity recognition and normalization techniques. However, the structure of Chinese text is often more complex than that of English text, and word segmentation is more complicated. Semantic errors caused by incorrect word segmentation propagate to later processing steps and are difficult to eliminate. In addition, information extraction techniques perform worse on medical text than on non-medical text, which makes extracting information from Chinese biomedical texts even more difficult. Against this background, this thesis investigates named entity recognition and normalization techniques for Chinese biomedical texts and provides technical support for downstream information extraction tasks. The main work is as follows.

1. A named entity recognition model for Chinese biomedical text, called BertBAC, is proposed. The model uses Bert with character embeddings to obtain the feature vectors of the text, and adds an attention mechanism before the output layer of a typical neural network structure to capture the long-range dependencies between a tag and the other words in the sentence. Experiments show that BertBAC achieves better recognition accuracy than the existing baseline method that uses conditional random fields, perceives semantic relationships at multiple granularities in the text, and recognizes five types of Chinese medical entities.

2. An entity normalization method that combines a Siamese network with the Bert model, called Siamese-Bert, is proposed. The method addresses the two sub-problems of the normalization task, candidate entity generation and candidate entity ranking: it constructs a Siamese text similarity network to generate the set of candidate entities, and uses the Bert model to score and rank that set. The standard entity corresponding to the current entity mention is then output. The effectiveness of Siamese-Bert on Chinese medical entity normalization is demonstrated experimentally.

3. The BertBAC entity recognition model and the Siamese-Bert entity normalization model are combined to design and implement a deep learning-based named entity recognition and normalization system. The system requirements are analyzed, the overall architecture is designed, the implementation and working principle of each module are described in detail, and the main functions of the system are demonstrated. Finally, system testing confirms the feasibility of the system.
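
The two-stage normalization flow described above (candidate entity generation followed by candidate entity ranking) can be illustrated with a minimal Python sketch. This is not the Siamese-Bert model itself: character-bigram cosine similarity stands in for the Siamese similarity network, and a `difflib` string-matching ratio stands in for the Bert scorer. The function names, the `top_k` parameter, and the example vocabulary are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def char_ngrams(text, n=2):
    """Count character n-grams. Working on characters avoids word segmentation."""
    if len(text) < n:
        return Counter([text])
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_candidates(mention, vocabulary, top_k=3):
    """Stage 1: keep the top_k standard entities most similar to the mention.

    ASSUMPTION: bigram cosine similarity is a stand-in for the Siamese network.
    """
    m = char_ngrams(mention)
    ordered = sorted(vocabulary, key=lambda e: cosine(m, char_ngrams(e)), reverse=True)
    return ordered[:top_k]


def rank_candidates(mention, candidates):
    """Stage 2: re-score each (mention, candidate) pair and sort by score.

    ASSUMPTION: SequenceMatcher.ratio() is a stand-in for the Bert scorer.
    """
    scored = [(SequenceMatcher(None, mention, c).ratio(), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda p: p[0], reverse=True)]


def normalize(mention, vocabulary, top_k=3):
    """Map an entity mention to the best-ranked standard entity, or None."""
    ranked = rank_candidates(mention, generate_candidates(mention, vocabulary, top_k))
    return ranked[0] if ranked else None


# Hypothetical standard-entity vocabulary, for illustration only.
STANDARD_ENTITIES = ["糖尿病", "高血压", "冠心病", "肺炎"]

if __name__ == "__main__":
    print(normalize("高血压病", STANDARD_ENTITIES))
```

Splitting the task this way lets a cheap similarity function narrow a large vocabulary down to a few candidates, so that a more expensive pairwise scorer only has to run on that short list.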
Keywords/Search Tags: Named Entity Recognition, Named Entity Normalization, Candidate Entity Generation, Candidate Entity Ranking