The Key Techniques Research On Text Mining Of Traditional Chinese Medicine

Posted on: 2017-03-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Yuan
Full Text: PDF
GTID: 1108330482993382
Subject: Management Science and Engineering
Abstract/Summary:
Traditional Chinese medicine (TCM) summarizes the rich diagnosis and treatment experience that the Chinese working people have accumulated over thousands of years of struggle against disease. Over this long course of development, a unique diagnosis and treatment system with yin-yang and the five elements as its theoretical basis has been formed, and a large number of documents with guiding value for TCM clinical decisions have been left behind. These "massive" TCM medical records and documents are valuable resources for the clinical diagnosis and treatment of TCM. Using text mining methods to extract understandable and usable knowledge from these documents, in order to analyze the medication laws of TCM diagnosis and treatment and to guide clinical practice, research, teaching and the development of new TCM drugs, has increasingly become a hot research topic in the field. However, the text of TCM medical records has not yet been effectively mined and used, for several reasons. It is still difficult to build a unified ontology library for TCM medical records. The efficiency of named entity recognition (NER) is low. The text vector space model ignores the correlation between words and cannot represent latent semantic information well. Traditional text clustering algorithms depend too heavily on initial values and tend to fall into local optima when processing data. In response to these problems, and building on previous studies, this thesis puts forward an ontology-based named entity recognition algorithm and a firefly algorithm-based text clustering method for TCM medical records.
The research in the thesis is supported by three projects: the Shandong Science and Technology Development Plan project "Design and Implementation of Document Data Retrieval and Mining Algorithm Based on the Semantics of Medical Enzymes" (No. 2010G0020121), Shandong's specific electronic project "Development and Promotion of Diagnosis, Treatment and Decision Support System for Shandong Famous and Veteran TCM Doctors" (No. 2150511), and the Shandong TCM Science and Technology Development Plan project "Research on Bionic Intelligent Algorithm-Based Comprehensive Heart Failure Control Schemes" (No. 2013-230).
The data of this thesis come from 2,400 medical records of Professor Ding Shuwen, a famous and veteran TCM doctor of China and Shandong Province. The records were made between June 2013 and June 2015 for elderly patients of the Affiliated Hospital of Shandong University of Traditional Chinese Medicine. The medical records include a total number of 757, and a total of 251 Chinese herbal medicines.
The main contents and results of this research are summarized as follows:
1. An artificial bee colony algorithm is applied to build an ontology library for TCM medical records. Ontology learning technology based on the artificial bee colony algorithm is used, and the following fields of the medical records are analyzed and verified as the information corpus: diagnosis by looking, listening, asking and feeling the pulse, TCM diagnosis, western diagnosis, symptom, and therapeutic method. The concept extraction method is designed using Chinese word segmentation, mutual information, regulated filtration and other strategies. Meanwhile, fusion and evolutionary algorithms based on niche technology are used to enrich population diversity and, combined with the high search speed of the artificial bee colony algorithm, to extract non-taxonomic relations. Finally, the ontology library is constructed.
Experiments show that the combined artificial bee colony algorithm is better than the ordinary artificial bee colony algorithm in both individual diversity and average fitness when extracting non-taxonomic relations from TCM medical records.
2. An ontology-based NER method for TCM medical records is proposed. Conditional random fields and a correction method based on the modification and feature templates of the ontology library are used to construct an ontology-based named entity recognition algorithm for TCM medical records. The optimal results are found through verification tests on diagnosis by looking, listening, asking and feeling the pulse, TCM diagnosis, western diagnosis, symptom, and therapeutic method. Experiments show that better results are achieved when NER for TCM medical records is conducted with the ontology-based named entity recognition algorithm.
3. A vector space model of TCM medical records is designed based on word co-occurrence combinations. Second-order word co-occurrence combinations in TCM medical records are extracted by association rule algorithms. Measures of word co-occurrence are defined, and a vector space model based on word co-occurrence combinations is constructed. Experiments show that the method has higher distinguishing power than classical vector space models in knowledge acquisition and classification of TCM medical records, and they verify the correlation between syndrome-based diagnosis and treatment and second-order word co-occurrence in TCM medical records.
4. A firefly algorithm-based text clustering algorithm for TCM medical records is proposed. The idea of granular computing is introduced into the algorithm, which dynamically determines the iterations of the firefly algorithm and the sampling of the simulated annealing algorithm based on changes in fitness, and which increases the choices available to the population by expanding the disturbance of simulated annealing. The algorithm is tested and verified on experimental data.
Experiments show that, compared with traditional clustering methods, this method performs well in individual diversity and can overcome the difficulty of obtaining the global optimum. The text clustering results have been accepted by experts and have some clinical reference value.
To sum up, the thesis analyzes several key technologies of text mining for TCM medical records, designs algorithms suited to this task, and integrates and verifies the algorithms through a text mining system. Experiments show that the proposed designs are efficient and advanced, and that they can provide a reference for clinical practice, research, teaching and the development of new TCM drugs.
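For readers unfamiliar with firefly-based clustering, the following is a minimal sketch of the basic idea only, not the thesis's algorithm: it omits the granular computing and simulated annealing components described in contribution 4. Each firefly encodes a set of k cluster centroids, brightness is taken to be a low sum of squared errors, and dimmer fireflies move toward brighter ones. All names (firefly_cluster, sse, assign) and parameter values (beta0, gamma, alpha, the 0.97 decay) are illustrative assumptions, and the input is expected to be numeric vectors such as rows of a vector space model.

```python
import math
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centroids)
        for pt in points
    )


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
        for pt in points
    ]


def firefly_cluster(points, k, n_fireflies=12, n_iter=40,
                    beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Basic firefly search over sets of k centroids (lower SSE = brighter).

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    def unflatten(vec):
        return [vec[i * dim:(i + 1) * dim] for i in range(k)]

    # Each firefly is a flat vector of k centroids, seeded from random data points.
    swarm = [[x for pt in rng.sample(points, k) for x in pt]
             for _ in range(n_fireflies)]
    cost = [sse(points, unflatten(f)) for f in swarm]
    best_cost, best = min(zip(cost, swarm))
    best = list(best)

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                for a, b in zip(swarm[i], swarm[j])]
                    cost[i] = sse(points, unflatten(swarm[i]))
                    if cost[i] < best_cost:  # remember the best solution seen so far
                        best_cost, best = cost[i], list(swarm[i])
        alpha *= 0.97  # slowly shrink the random step

    centroids = unflatten(best)
    return assign(points, centroids), centroids, best_cost
```

Calling firefly_cluster(vectors, k=2) returns one cluster label per input vector, the best centroids found, and their sum of squared errors. Because the search keeps a whole population of candidate solutions, it depends less on a single initialization than plain k-means does, which is the weakness of traditional clustering that the abstract points to.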
Keywords/Search Tags: ontology learning, named entity recognition, text vector model, text clustering, Traditional Chinese Medicine medical records