
Research On Key Algorithms Of Electronic Health Records Text Mining

Posted on: 2015-02-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Li
GTID: 1108330482455679
Subject: Computer application technology
Abstract/Summary:
With the rapid development of medical technology and the adoption of healthcare IT systems (EHR/HIS/PACS/LIS, etc.), massive, distributed and heterogeneous medical data are being generated. At the same time, medical practice increasingly relies on data analytics: with more medical devices and higher data accuracy, clinical diagnosis is shifting from qualitative judgment to quantitative analysis. Electronic health records (EHRs) contain rich information about clinical diagnoses and treatments, and mining this text is important for clinical efficiency and quality. This dissertation develops key algorithms for data acquisition, information extraction, pattern mining and data classification on large-scale EHR data, in order to discover valuable medical rules and models that support the development of intelligent clinical decision support systems. The four main contributions are as follows.

Firstly, EHR data quality suffers from several problems: data are stored without a uniform structure, some data items are missing, item values are non-standard, and synonymous or negated expressions make information extraction difficult. This dissertation proposes a metadata-based record cleaning algorithm. It defines a metadata database containing technical metadata for data extraction and loading, and business metadata for data standardization and conversion. A data adapter model built on this database then performs automatic online extraction, conversion and storage of medical data. Because a large amount of unstructured text remains after record cleaning, a medical named entity recognition algorithm based on rules and conditional random fields is developed to extract medical knowledge and provide structured medical data for subsequent research.

Secondly, disease names in EHRs are neither unified nor standardized, owing to synonyms, abbreviations and the personal writing habits of individual doctors.
As a result, EHR text lacks a unified naming standard for disease taxonomy, which seriously affects medical data analysis. This dissertation proposes an algorithm that automatically constructs a disease taxonomy from a corpus of short texts. First, diagnostic texts for various diseases are collected from medical records, and an adaptive text clustering method identifies synonymous disease texts. A hierarchical clustering method is then applied to build the disease concept hierarchy. To suit the characteristics of disease texts, a fast set-based similarity measure is developed for short texts. Experimental results show that the method quickly and accurately identifies disease synonyms and constructs concept hierarchies.

Thirdly, in mining medical diagnosis and treatment patterns, existing research assumes that the various factors are independent of each other and does not consider the relationships between data items. This dissertation describes a general multi-level, multi-taxonomy relational pattern mining algorithm that can be adapted to four kinds of taxonomy: generalization, aggregation, combination and dependency. Using the taxonomy, redundant patterns can be removed effectively, improving the usefulness of the mining results. The algorithm also introduces a new multi-level graph data structure and a multi-level traversal method, which combine transactional data and taxonomy data so that the database does not need to be scanned multiple times.

Finally, artificial neural networks are widely used in disease classification. However, their training time depends heavily on the amount of training data, so training can take very long, and the model is not easily extended with new training data: the more training data there is, the longer training takes.
Association classification methods can be built quickly from classification rules, but existing work does not take the coupling relationships between classification rules into account, and a unified discriminant model for classification is also lacking. This dissertation proposes a new association classification algorithm based on artificial neural networks, which allows the network structure to be built and its parameters to be set quickly. The method also provides a general, quantitative description of the decision-making model used in association classification.

To validate these algorithms, we applied them in two clinical decision support systems: an EHR mining system and an EHR semantic retrieval system. Practice shows that the algorithms meet the requirements of clinical decision support.
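The neural-network view of association classification described above can be illustrated with a minimal sketch. The abstract does not say how the network is constructed, so this sketch assumes one hidden unit per class association rule (the unit fires when a record contains the rule's whole antecedent), output weights initialised from rule confidences, and a perceptron-style update for folding in new labelled records. `Rule`, `build_network`, `classify` and `update`, as well as the example rules, are hypothetical and are not the dissertation's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # items that must all appear in a record
    label: str             # class the rule predicts
    confidence: float      # rule confidence, used here (assumption) as the initial weight


def build_network(rules, classes):
    """One hidden unit per rule, one output unit per class.
    weights[c][i] starts at rule i's confidence if rule i predicts class c, else 0."""
    return {c: [r.confidence if r.label == c else 0.0 for r in rules] for c in classes}


def hidden_layer(rules, record):
    # Hidden unit i fires (1.0) only if the record contains rule i's whole antecedent.
    return [1.0 if r.antecedent <= record else 0.0 for r in rules]


def classify(rules, weights, record):
    h = hidden_layer(rules, record)
    # Each class score sums the weights of all firing rules, so rules that
    # fire together reinforce or compete with each other instead of acting alone.
    scores = {c: sum(w * a for w, a in zip(ws, h)) for c, ws in weights.items()}
    return max(scores, key=scores.get), scores


def update(rules, weights, record, true_label, lr=0.1):
    """Perceptron-style correction on one labelled record (hypothetical extension step)."""
    predicted, _ = classify(rules, weights, record)
    if predicted != true_label:
        h = hidden_layer(rules, record)
        for i, a in enumerate(h):
            weights[true_label][i] += lr * a
            weights[predicted][i] -= lr * a
    return weights


# Hypothetical rules; items and confidences are made up for illustration.
rules = [
    Rule(frozenset({"fever", "cough"}), "pneumonia", 0.8),
    Rule(frozenset({"fever"}), "influenza", 0.6),
    Rule(frozenset({"chest_pain"}), "angina", 0.7),
]
weights = build_network(rules, ["pneumonia", "influenza", "angina"])
label, scores = classify(rules, weights, frozenset({"fever", "cough"}))
print(label)  # pneumonia
```

Because each mined rule becomes exactly one hidden unit with a confidence-derived weight, the network in this sketch can be assembled directly from the rule set without an initial training pass, and `update` shows one simple way new labelled records could adjust the weights incrementally rather than retraining from scratch.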
Keywords/Search Tags: Named entity recognition, concept hierarchical clustering, association rules classification, multi-level pattern mining, artificial neural network
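The synonym-identification step of the second contribution (a fast set-based similarity over short diagnostic texts, followed by clustering) can be illustrated with a minimal sketch. The abstract does not specify the set elements, the similarity function or the adaptive clustering rule, so this sketch assumes word-token sets, Jaccard similarity and a fixed, hypothetical `threshold`; an inverted index restricts comparisons to clusters that share at least one token, which is one common way to keep set-based matching fast. `token_set`, `jaccard`, `cluster_synonyms` and the example strings are illustrative only.

```python
import re
from collections import defaultdict


def token_set(text):
    """Normalise a diagnosis string into a set of lower-case word tokens."""
    return frozenset(t for t in re.split(r"[^0-9a-z]+", text.lower()) if t)


def jaccard(a, b):
    """Set-based similarity |a & b| / |a | b| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_synonyms(texts, threshold=0.5):
    """Single-pass clustering: a text joins the cluster whose first member
    (its leader) is most similar, if that similarity reaches `threshold`;
    otherwise it starts a new cluster. The fixed threshold is an assumption."""
    clusters = []              # member texts of each cluster
    leaders = []               # token set of each cluster's first member
    index = defaultdict(set)   # token -> ids of clusters whose leader contains it
    for text in texts:
        tokens = token_set(text)
        # Only clusters sharing at least one token can have non-zero similarity,
        # so the inverted index avoids comparing against every cluster.
        candidates = set()
        for t in tokens:
            candidates |= index.get(t, set())
        best, best_sim = None, 0.0
        for cid in sorted(candidates):
            sim = jaccard(tokens, leaders[cid])
            if sim > best_sim:
                best, best_sim = cid, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(text)
        else:
            cid = len(clusters)
            clusters.append([text])
            leaders.append(tokens)
            for t in tokens:
                index[t].add(cid)
    return clusters


# Hypothetical diagnosis strings, for illustration only.
texts = [
    "Type 2 diabetes mellitus",
    "diabetes mellitus, type 2",
    "Essential hypertension",
    "hypertension (essential)",
    "Type II diabetes mellitus",
    "Asthma",
]
for group in cluster_synonyms(texts):
    print(group)
# ['Type 2 diabetes mellitus', 'diabetes mellitus, type 2', 'Type II diabetes mellitus']
# ['Essential hypertension', 'hypertension (essential)']
# ['Asthma']
```

Word tokens suit these English examples; for Chinese diagnostic text, character n-grams would be a natural alternative set element. The subsequent hierarchical clustering that builds the disease concept hierarchy is not shown.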