Research On Medical Text Analysis And Mining Technology Based On Machine Learning

Posted on: 2020-04-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhou
GTID: 1368330575995114
Subject: Information management
Abstract/Summary:
As information technology spreads through the medical industry, the industry's level of informatization and automation keeps rising, and the processing of medical text is becoming a research hotspot. Medical text, represented by electronic medical records (EMRs), contains a large amount of rich medical information. It is an important resource for disease prediction, personalized information recommendation, clinical decision support, and drug mining, and it can serve as a basis for measuring the value of hospital services. Despite this richness, medical text is hard to process. EMR-based medical text includes a large amount of unstructured free text and image information, and because doctors enter it themselves it may contain spelling errors, abbreviated medical terms, and the idiomatic usage of different doctors in different regions. As a result, computers cannot make effective use of the medical information in EMRs. Machine learning and natural language processing will therefore play an important role in the analysis and mining of medical text.

To better explore and use the semi-structured and unstructured information in medical text, especially in EMRs, it is important to structure the unstructured free text. Medical text is highly time-sensitive, which makes time information indispensable for analyzing it well. Traditional text classification requires a series of preprocessing and feature engineering steps. Medical text contains a large number of technical terms and specialized knowledge, so inaccurate word segmentation or an incomplete understanding of semantic features lowers classification accuracy. The analysis and processing of medical text must ultimately produce valuable information and knowledge that supports decision-making, for example by mining patients' medication patterns from EMRs to help doctors write prescriptions or even provide a personalized clinical pathway. In keeping with evidence-based medicine, the processes and results should be explainable rather than black boxes, which is a challenging and practical problem in its own right.

This thesis focuses on techniques for analyzing and processing medical text. Based on an in-depth study of the characteristics of medical text and an extensive review of related work, it proposes a series of algorithms and models for knowledge extraction, modeling, classification, and mining, and evaluates and verifies their performance on data sets. The main research work and results are as follows.

(1) Drawing on medical domain knowledge, this thesis studies a method for automatically constructing a medical domain dictionary that extracts valid medical terminology from a medical text corpus. Building on an existing Chinese word segmentation algorithm, the corpus is segmented and tagged, and new medical words and medical terms are identified, which further improves Chinese word segmentation accuracy in the medical domain. On the basis of accurate segmentation and part-of-speech tagging of EMR text, the thesis studies the structured modeling of EMR text from the perspective of disease progression and timeline, and studies the temporal rules of medical record text. Combined with semantic analysis, a time-based model of patient disease development is abstracted from the text, and rule-based structured analysis of EMRs is implemented. The effect of the constructed domain dictionary on a medical text classifier is verified by applying the dictionary to the screening classification of different medical records. Experiments show that a dictionary constructed with medical domain knowledge identifies new medical words better and improves machine learning text classification.

EMRs generally record the time and condition of a patient's illness, and this information usually appears as natural language text. Mining the relevant rules is the focus of this part of the research, since general information systems have difficulty analyzing such text along multiple dimensions. This thesis proposes a structured model of EMR text based on time information. Using semantic analysis based on rule matching, the model automatically draws on the medical history and family history in the medical record and extracts a timeline of the patient's disease progression for disease analysis and prediction. The proposed model addresses the difficulty of quantitatively analyzing patient information in unstructured EMR content, and it can serve as a reference for making effective use of unstructured EMR data.

(2) Drawing on the success of deep learning in image recognition, a character-level deep neural network model is designed for Chinese medical text classification. A bidirectional long short-term memory (BiLSTM) layer and an attention pooling layer are introduced so that the model can make better use of context when classifying. The model is implemented and trained with Google's TensorFlow framework. Experiments show that the model converges quickly, is accurate, and performs well on text classification in different subject areas.

Traditional Chinese text classification schemes usually depend on text preprocessing, such as word segmentation and feature extraction, followed by semantic analysis so that the computer understands the text to some extent. The character-level convolutional neural network proposed in this thesis learns directly from characters as the smallest unit. It needs no word segmentation or word-based feature extractor and no knowledge of grammar or semantic structure, and after training it can analyze and infer high-level targets directly. This overturns the earlier assumption that structured prediction and language models are necessary for high-level text understanding. The study shows that deep learning can handle text understanding without knowledge of the syntactic or semantic structure of words, phrases, or sentences, or any other language-related knowledge. It thereby avoids the problem that the classification performance of the whole model is degraded by the large number of technical terms in the medical domain, by inaccurate word segmentation, or by an incomplete understanding of semantic features.

(3) A machine learning-based framework is proposed to mine hidden drug patterns in EMR text. The framework systematically integrates Tanimoto similarity assessment, spectral clustering, an improved LDA topic model, and cross-matching between multiple features to find the cluster residuals that describe additional knowledge and groupings hidden, from multiple perspectives, in highly complex drug patterns. These methods work together step by step to reveal latent drug patterns. The framework is evaluated on real EMR text data for patients with cirrhosis ascites from a large Chinese hospital. Cirrhosis ascites was chosen because patients with this disease always have many complications and require complex drug patterns. Experiments show that the framework outperforms other drug pattern discovery methods, especially for this disease, where drug treatments differ only subtly. The results also show almost no overlap between the patterns found, so the framework captures the distinguishing characteristics of each pattern well.

Compared with other existing machine learning methods, this method effectively identifies the main drug patterns in EMR text for highly complex diseases with mixed drug patterns. It divides highly mixed drug therapies into distinct drug pattern clusters rather than fuzzy clusters, assigning each item to a treatment pattern even when the similarity is weak. Unlike unsupervised deep learning-based treatment pattern discovery, every step of the framework is interpretable rather than a black box. Because the approach is evidence-based and interpretable, it is important for clinical process knowledge discovery and for understanding how particular drugs are grouped for clinical purposes.
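
To make the Tanimoto similarity assessment named in part (3) concrete, the following minimal Python sketch computes the Tanimoto (Jaccard) coefficient between sets of prescribed drugs. The abstract does not say how the thesis represents a patient's medication, so the set-of-drug-names representation, the patient identifiers, the drug lists, and the function names here are all illustrative assumptions, not the author's implementation.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) coefficient of two sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def similarity_matrix(records: Dict[str, Set[str]]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise Tanimoto similarities, with rows and columns in sorted-id order."""
    ids = sorted(records)
    matrix = [[tanimoto(records[i], records[j]) for j in ids] for i in ids]
    return ids, matrix


# Hypothetical medication sets, invented for illustration only.
patients = {
    "p1": {"furosemide", "spironolactone", "albumin"},
    "p2": {"furosemide", "spironolactone", "lactulose"},
    "p3": {"propranolol"},
}

ids, matrix = similarity_matrix(patients)
for pid, row in zip(ids, matrix):
    print(pid, [round(value, 2) for value in row])
```

A symmetric matrix of this kind could plausibly be passed to a spectral clustering step as a precomputed affinity, which is one way the first two components of the framework might connect; the abstract does not describe the actual pipeline at that level of detail.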
Keywords/Search Tags:Medical data, electronic medical record, machine learning, neural network