
Research On Algorithm And System Implementation On Named Entity Recognition For Chinese Electronic Medical Records

Posted on: 2019-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: S S Sun
GTID: 2348330542991642
Subject: Information management
Abstract/Summary:
In recent years, medical research has continued to deepen, and the growing number of hospitals has produced large quantities of medical information. Using natural language processing (NLP) technology to process electronic medical record (EMR) data is an important trend in medical research. Information extraction, information retrieval, and question answering systems all depend on named entity recognition (NER). To address the low efficiency and poor performance of NER on medical record text, this thesis studies a recognition algorithm with better performance and, based on that algorithm, develops an information system that recognizes named entities in medical text, such as disease names, treatment methods, and clinical symptoms.

The thesis first surveys the research background and development of NER, then studies the three commonly used kinds of methods and analyzes their advantages and disadvantages. Rule-based methods recognize entities using rules that domain experts write according to the characteristics of the text. They place high demands on the experts involved, cost considerable labor and time, and have poor portability and flexibility. Dictionary-based methods match word sequences against a dictionary and achieve very high recognition accuracy, but they depend heavily on the quality of the dictionary and have difficulty identifying unknown words that do not exist in it. The conditional random field (CRF) model combines the relaxed independence assumptions of the maximum entropy model with the strong recognition performance of the hidden Markov model. It avoids the label bias of the maximum entropy model and the hidden Markov model's difficulty in recognizing complex named entities, so it recognizes entities well, but it is limited by the size of the training set and by the choice of features.

Based on the characteristics of the CRF model, this thesis proposes a hybrid model that combines a dictionary with the CRF model. On the one hand, the dictionary-based method is used to annotate the training corpus, and the result serves as training data for the CRF model, so that the model can still be trained adequately when little manually annotated data is available. On the other hand, the dictionary is introduced into CRF learning in the form of features. Four groups of experiments were carried out. Comparison and analysis of the results show that adding an entity dictionary to the CRF model effectively improves recognition efficiency and the performance of the NER system. The experiments also show that the proposed hybrid model has good recognition efficiency.

In addition, a survey shows that research on NER for EMRs focuses on algorithms. Very few information systems are dedicated to recognizing entities in electronic medical records, and those that exist are mainly software packages that are difficult to use directly. For this reason, we design and implement a named entity recognition system for Chinese electronic medical records that has a friendly interface and supports the Java language. The system uses the hybrid statistical model proposed in this thesis as its core recognition algorithm. Unit and integration testing show that the dictionary management function and the named entity recognition function both behave as expected and that the system is well implemented.
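
The two uses of the dictionary described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the dictionary entries, the entity type names, the BIO tagging scheme, the forward maximum matching strategy, and the function names `dict_tag`, `char_features`, and `sentence_features` are all assumptions made for illustration. The first function pre-annotates text from a dictionary, and the second exposes the dictionary match as one feature per character, in the kind of per-token feature mapping that a CRF trainer could consume.

```python
from typing import Dict, List

# Hypothetical entity dictionary: surface form -> entity type (illustrative entries only).
ENTITY_DICT: Dict[str, str] = {
    "糖尿病": "DISEASE",
    "头痛": "SYMPTOM",
    "胰岛素": "TREATMENT",
}


def dict_tag(text: str, entity_dict: Dict[str, str]) -> List[str]:
    """Tag each character with a BIO label using forward maximum matching."""
    max_len = max((len(w) for w in entity_dict), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first so longer entries win over their prefixes.
        for n in range(min(max_len, len(text) - i), 0, -1):
            etype = entity_dict.get(text[i:i + n])
            if etype is not None:
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


def char_features(text: str, dict_tags: List[str], i: int) -> Dict[str, str]:
    """Build features for character i, including the dictionary tag as a feature."""
    return {
        "char": text[i],
        "prev_char": text[i - 1] if i > 0 else "<BOS>",
        "next_char": text[i + 1] if i < len(text) - 1 else "<EOS>",
        "dict_tag": dict_tags[i],
    }


def sentence_features(text: str, entity_dict: Dict[str, str]) -> List[Dict[str, str]]:
    """Return one feature dict per character of the sentence."""
    tags = dict_tag(text, entity_dict)
    return [char_features(text, tags, i) for i in range(len(text))]
```

In this sketch, the output of `dict_tag` could serve as automatically generated training labels when manual annotation is scarce, while `sentence_features` shows how the same dictionary match might enter the model as an input feature alongside the surrounding characters.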
Keywords/Search Tags:Entity dictionary, Machine learning, Named entity recognition, Conditional Random Fields