| The electronic medical record (EMR) covers a patient's entire course of medical care, contains the patient's diagnosis and treatment information, and plays a key role in physicians' diagnosis and decision-making. However, the large number of unstructured Chinese EMRs (in XML format) accumulated in earlier years hinders physicians' record retrieval and scientific research, and the need to retrieve useful information from EMRs quickly and accurately has become increasingly urgent. This paper therefore uses the Elasticsearch (ES) real-time search engine to build a full-text EMR retrieval system that improves the efficiency of EMR retrieval. The research covers three aspects:
(1) This paper proposes an improved Chinese new-word discovery algorithm for the EMR search engine, based on mutual information and left-right information entropy. Guided by the textual characteristics of Chinese EMRs and the morphology of medical terms, the improvements target the preprocessing stage and the algorithm structure. For preprocessing, a medical dictionary is built from Chinese medical thesauri and the ICD-10 disease codes, and the stop-word list used in the segmentation front end is updated with stop words selected according to the medical characteristics of words in EMR text; together these improve the pre-segmentation step of the new-word discovery algorithm so that more new words are found. Structurally, pointwise mutual information is replaced by average pointwise mutual information, the left and right information entropy are computed in two separate branches, and the results of the two branches are finally combined by taking their intersection. Experimental results show that the improved algorithm discovers new words more effectively than the original algorithm.
(2) This paper proposes a search-result ranking algorithm for the EMR search engine based on AdaRank. Traditional retrieval models rely on a manually specified ranking formula whose parameters must be tuned repeatedly, which makes manual tuning laborious. In recent years, learning to rank, which uses machine-learned ranking models, has been widely applied in many fields, yet very few studies apply it to EMRs. This paper therefore applies the AdaRank learning-to-rank algorithm to EMRs to optimize the ranking of search results. Cardiovascular disease EMR documents were manually annotated: keywords were selected as queries, and each document was labeled with its document-query relevance. Comparative experiments were then run against the traditional retrieval model BM25 and four learning-to-rank algorithms: RankNet, LambdaRank, ListNet, and LambdaMART. The experiments show that, compared with BM25 and these four algorithms, the proposed algorithm better optimizes the ranking of EMR search results.
(3) The design and implementation of an EMR search engine system. Building on (1) and (2), the system provides not only full-text EMR search but also user management, new-word discovery, and EMR viewing. The full-text retrieval function makes searching EMRs more convenient and faster. |
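The structural changes described in aspect (1) can be illustrated with a minimal Python sketch. It is not the author's implementation, and several pieces are assumptions: average pointwise mutual information is taken here to be the mean PMI over every split of a candidate string into a prefix and a suffix, probabilities are estimated as n-gram count divided by corpus length, the thresholds are arbitrary, the function names `ngram_stats`, `entropy`, and `discover_new_words` are hypothetical, and the dictionary and stop-word pre-segmentation step is omitted. Candidates that pass the frequency and average-PMI filters are checked for left-neighbour entropy and right-neighbour entropy in two separate branches, and only the intersection of the two branches is kept.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Set, Tuple


def ngram_stats(text: str, max_n: int = 4) -> Tuple[Counter, Dict[str, Counter], Dict[str, Counter]]:
    """Count character n-grams (length 1..max_n) and the characters adjacent to them."""
    counts: Counter = Counter()
    left: Dict[str, Counter] = defaultdict(Counter)
    right: Dict[str, Counter] = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            w = text[i:i + n]
            counts[w] += 1
            if i > 0:
                left[w][text[i - 1]] += 1
            if i + n < len(text):
                right[w][text[i + n]] += 1
    return counts, left, right


def entropy(neighbours: Counter) -> float:
    """Shannon entropy (natural log) of a neighbour-character distribution."""
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in neighbours.values())


def discover_new_words(text: str, max_n: int = 4, min_count: int = 2,
                       min_avg_pmi: float = 1.0, min_entropy: float = 0.5) -> Set[str]:
    counts, left, right = ngram_stats(text, max_n)
    N = len(text)

    def avg_pmi(w: str) -> float:
        # Assumed definition: mean PMI over every prefix/suffix split of w,
        # with p(s) estimated as count(s) / N, so PMI = log(c_w * N / (c_a * c_b)).
        pmis = [math.log(counts[w] * N / (counts[w[:k]] * counts[w[k:]]))
                for k in range(1, len(w))]
        return sum(pmis) / len(pmis)

    candidates = [w for w, c in counts.items()
                  if len(w) >= 2 and c >= min_count and avg_pmi(w) >= min_avg_pmi]
    # Branch 1: left-neighbour entropy filter.
    left_ok = {w for w in candidates if entropy(left[w]) >= min_entropy}
    # Branch 2: right-neighbour entropy filter.
    right_ok = {w for w in candidates if entropy(right[w]) >= min_entropy}
    # Merge the two branches by intersection.
    return left_ok & right_ok
```

On a toy corpus such as `"甲心梗乙丙心梗丁戊心梗己"`, the string `心梗` occurs three times with varied neighbours on both sides and is returned. If it were always followed by the same character, its right entropy would be zero and the right-entropy branch would reject it, even though its internal cohesion (average PMI) is high.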
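The BM25 baseline in aspect (2) is also the default similarity in Elasticsearch, so it is the ranking the system would produce before any learning to rank is applied. The following is a minimal sketch of the classic BM25 formula with a Lucene-style IDF, assuming documents have already been segmented into term lists; `k1 = 1.2` and `b = 0.75` match the defaults of Elasticsearch's BM25 similarity, while the function name `bm25_scores` is hypothetical. Exact scores may differ slightly from Elasticsearch's, but constant factors do not change the ranking.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query_terms: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with classic BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Length normalization is why, for a single matching term with equal frequency, a shorter record outranks a longer one; a learned ranker such as AdaRank can use scores like these as input features rather than relying on them alone.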
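The AdaRank procedure referred to in aspect (2) can be sketched as follows, following the published AdaRank formulation: each weak ranker is a single feature, the weight of the chosen feature in round t is 0.5 * ln(sum P(1 + E) / sum P(1 - E)), and queries are reweighted every round so that queries ranked poorly by the combined model receive more attention. This is a minimal illustration, not the author's code. NDCG@k as the performance measure E, the feature layout, the small `eps` guard against division by zero, and the names `ndcg`, `feature_perf`, and `adarank` are assumptions; in an EMR setting the features might be, for example, BM25 scores computed on different record fields.

```python
import math
from typing import List, Sequence, Tuple

# One query = (feature vectors of its candidate documents, graded relevance labels).
Query = Tuple[List[List[float]], List[int]]


def ndcg(scores: Sequence[float], labels: Sequence[int], k: int = 10) -> float:
    """NDCG@k of the ranking induced by `scores` (higher score ranks earlier)."""
    def dcg(order: List[int]) -> float:
        return sum((2 ** labels[i] - 1) / math.log2(r + 2) for r, i in enumerate(order[:k]))

    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(range(len(labels)), key=lambda i: -labels[i]))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


def feature_perf(queries: List[Query], f: int, k: int) -> List[float]:
    """Per-query NDCG when feature f alone is used as a weak ranker."""
    return [ndcg([x[f] for x in X], y, k) for X, y in queries]


def adarank(queries: List[Query], rounds: int = 10, k: int = 10,
            eps: float = 1e-9) -> List[float]:
    m = len(queries)
    n_feat = len(queries[0][0][0])
    P = [1.0 / m] * m        # weight distribution over queries
    alpha = [0.0] * n_feat   # linear model: score(x) = sum(alpha[f] * x[f])
    for _ in range(rounds):
        # 1. Choose the weak ranker with the best weighted performance.
        perfs = [feature_perf(queries, f, k) for f in range(n_feat)]
        best = max(range(n_feat), key=lambda f: sum(p * e for p, e in zip(P, perfs[f])))
        E = perfs[best]
        # 2. Compute its weight and add it to the combined ranker.
        num = sum(p * (1 + e) for p, e in zip(P, E))
        den = sum(p * (1 - e) for p, e in zip(P, E))
        alpha[best] += 0.5 * math.log((num + eps) / (den + eps))
        # 3. Reweight queries by exp(-performance of the combined ranker).
        combined = [ndcg([sum(a * v for a, v in zip(alpha, x)) for x in X], y, k)
                    for X, y in queries]
        z = sum(math.exp(-e) for e in combined)
        P = [math.exp(-e) / z for e in combined]
    return alpha
```

NDCG lies in [0, 1], which satisfies AdaRank's requirement that the performance measure lie in [-1, 1]. Because the model is a weighted sum of features, the learned `alpha` also shows which features the ranking relies on.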