Research on Key Technologies of Medical Text Mining

Posted on: 2020-02-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Mao
GTID: 1484306548991459
Subject: Management Science and Engineering
Abstract/Summary:
Medical data is growing rapidly. Medical text, examination, and imaging data from many hospitals in regional healthcare systems are being aggregated quickly, which opens new opportunities for the screening, diagnosis, and treatment of diseases. Medical text data, especially electronic medical records, document each patient's condition and course of treatment in detail and therefore contain rich information. Research on key technologies for medical text mining, namely extracting the features related to a disease and building the corresponding knowledge graph, can support more scientific and objective disease prevention and screening and more reasonable treatment plans, and ultimately better medical services for patients; it therefore has both theoretical and practical significance. To this end, this dissertation applies natural language processing, machine learning, data mining, and related technologies to the text of electronic medical records in order to carry out medical text preprocessing, knowledge mining, and knowledge graph construction. The main work and contributions of this dissertation are as follows.

(1) For the recognition of clinical terms, this dissertation proposes a clinical term extraction algorithm based on a custom dictionary and a random combination method, with the aim of improving recognition accuracy. Compared with entity segmentation in the news domain, clinical term recognition faces the following difficulties. Entities in the news domain are generally proper nouns with few qualifiers in front of them, so general-purpose word segmentation tools recognize them well. Medical entities, by contrast, are often common nouns that may be preceded by many qualifiers. The same word or phrase can denote different categories of medical named entity, and its meaning can only be inferred from the context. Clinical term recognition in the medical field is therefore more challenging. To address this problem, the dissertation first builds a custom dictionary from the relevant literature in the Chinese Biomedical Literature Database (SinoMed); it then uses this dictionary together with the Jieba word segmentation tool to segment the text; finally, each segmented word is combined with its left and right neighbors to judge whether the combination forms a new term. By drawing on the context provided by adjacent words, the method enriches the description of clinical terms and performs well in clinical term extraction.

(2) For the preprocessing of medical texts, this dissertation studies three preprocessing methods, one for each of three common application scenarios, in order to improve the quality of text datasets. For imbalanced samples, it proposes an imbalanced-dataset processing method based on an improved SMOTE algorithm, which modifies both the way new samples are generated and the way the distance between samples is measured; the effectiveness of the improved method is validated on two different experimental datasets. For sample structuring, the main structuring steps are studied on a dataset from a well-known Grade III Class A hospital. For missing values, the dissertation mainly studies the multiple imputation method and its practical effect. Although some error remains between the "complete" dataset obtained after imputation and the true values, the result is a substantial improvement over the original data with missing values.

(3) For association rule mining and treatment plan mining on electronic medical records, this dissertation proposes an improved CPAR algorithm and a mining algorithm based on latent semantic analysis. For association rule mining, the improved CPAR algorithm introduces an Enhancement Ratio when generating association rules, so that rule selection considers not only the support of a class but also the support of its complement. When positive tuples make up a large proportion of the data, this makes it easier to obtain classification association rules for the corresponding negative tuples. For treatment plan mining, the dissertation uses latent semantic analysis to mine the semantic relationships between clinical terms and treatment plans for gastropathy. It first constructs the co-occurrence matrix of clinical terms and treatment plans, then constructs the latent semantic space, and finally obtains the correlation between each combination of clinical features and each treatment plan, for example the correlation between the clinical features of gastropathy and surgical treatment, surgical chemotherapy, or symptomatic treatment. In addition, the mined association rules and clinical features are visualized by treatment plan, which helps medical workers locate and quickly understand the related diseases.

(4) For disease knowledge graph construction and intelligent question answering, this dissertation proposes a method for constructing a disease knowledge graph based on multi-entity relationships and uses the graph to implement simple intelligent question answering about diseases. Taking gastropathy as an example, it first uses multi-entity relationships to construct the relationships between the gastric disease entity and the characteristics of symptomatic treatment, chemotherapy, surgical treatment, and surgical chemotherapy, as well as the relationships between these characteristic entities and the treatment plans, which together form the gastropathy knowledge graph. Intelligent question answering is then implemented on top of the graph, using gastropathy questions from the Good Doctor website to simulate patient questions. First, the patient's question is processed in several steps to extract the disease information; then the similarity between this information and the disease information in the gastropathy knowledge graph is calculated; next, the possible disease types are obtained from the knowledge graph; finally, treatment advice is given.

To validate the practical utility of these algorithms, the dissertation studies three diseases (breast cancer, gastropathy, and glaucoma) and applies the algorithms to the SEER dataset, a glaucoma dataset, a gastropathy dataset, and a web dataset from the Good Doctor website for knowledge mining. The results indicate that the algorithms are feasible and effective. The theory and methods of this research can also be extended to other diseases, and they are expected to find further application in clinical decision-making.
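The adjacent-word combination step in contribution (1) can be illustrated with a minimal sketch. It assumes the text has already been segmented (for example by Jieba) and that the custom dictionary is a plain set of terms. The function name `merge_adjacent_terms`, the `max_span` parameter, and the greedy longest-match rule are illustrative assumptions; the abstract does not specify the dissertation's actual combination rule.

```python
def merge_adjacent_terms(tokens, dictionary, max_span=3):
    """Greedily merge runs of adjacent tokens that form a dictionary term.

    tokens:     already-segmented words (hypothetical input).
    dictionary: set of known clinical terms (hypothetical custom dictionary).
    max_span:   longest run of adjacent tokens to try merging (assumed value).
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so longer terms win over shorter ones.
        for span in range(min(max_span, len(tokens) - i), 0, -1):
            candidate = "".join(tokens[i:i + span])
            # A single token is always kept; longer runs must be in the dictionary.
            if span == 1 or candidate in dictionary:
                merged.append(candidate)
                i += span
                break
    return merged


# Example: three qualifier/noun fragments are merged into one clinical term.
tokens = ["慢性", "萎缩性", "胃炎", "患者"]
print(merge_adjacent_terms(tokens, {"慢性萎缩性胃炎"}))
# → ['慢性萎缩性胃炎', '患者']
```

A real pipeline would more likely score candidate combinations (for example by frequency or context) instead of requiring an exact dictionary hit; the exact-match check here only keeps the sketch short.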
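For the imbalanced-sample scenario in contribution (2), the sketch below shows the sample-generation step of the standard SMOTE algorithm, which the dissertation takes as its starting point. It is not the improved variant: the abstract does not describe the modified generation process or distance measure, so plain Euclidean distance and uniform interpolation are used here. The function name `smote_samples` and its parameters are hypothetical.

```python
import random


def smote_samples(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority-class points by standard SMOTE interpolation.

    minority: list of numeric tuples, at least two points (hypothetical data).
    k:        number of nearest minority neighbours considered per base point.
    n_new:    number of synthetic points to generate.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between the base point and its neighbour.
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
new_points = smote_samples(minority)
```

Because every synthetic point is interpolated between two existing minority points, it always falls inside the bounding box of the minority class. This is the behaviour an improved generation step would presumably refine.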
Keywords/Search Tags:Medical text, Clinical terminology identification, Association rules mining, Text visualization, Knowledge map