
The Research and Application of Data Mining for Traditional Chinese Medical Records

Posted on: 2017-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: K Liang
GTID: 2308330488952164
Subject: Electronic and communication engineering
Abstract/Summary:
Medical records of Traditional Chinese Medicine (TCM) are an important knowledge resource. They contain rich clinical experience, and this experience and knowledge are recorded and passed on mostly in written form. Studying TCM text records and organizing the textual information they contain provides effective data for TCM knowledge discovery.
The main task of this thesis is the targeted computer processing of the information in medical records through information classification and extraction. The processing results are stored in structured form in a database or file, and these structured results are then mined for the medication patterns of TCM. The methods used include Chinese word segmentation, named entity recognition (NER) based on conditional random fields (CRFs), and the Apriori data-mining algorithm. Specifically, the work covers the following aspects:
(1) Chinese word segmentation is used to convert free-text medical records into a semi-structured form, providing basic support for building an information platform for TCM knowledge acquisition.
(2) CRFs are used to recognize TCM named entities.
(3) From the perspective of TCM text research, the Apriori algorithm is used to mine the data, helping medical researchers automatically acquire knowledge and patterns from TCM medical records.
(4) The related software is introduced to provide a usable auxiliary tool for TCM scholars' further study.
The texts studied are the medical records of Professor Ding Shuwen and Professor Chen Shouqiang. The ICTCLAS Chinese word segmentation system is used for segmentation, the open-source CRF++ 0.58 toolkit implements the CRF model, and an improved Apriori algorithm is used to mine the data.
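The abstract does not describe what the improvements to Apriori are, so the following is only a hedged illustration of the standard, unimproved Apriori algorithm applied to the kind of structured output described above. Each prescription is treated as a transaction and each herb as an item; frequent herb combinations are found level by level with the join and prune steps, and association rules are then derived from them. The function names, herb names, example prescriptions, and thresholds are all hypothetical and are not data from the thesis.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Standard Apriori: return {frozenset(itemset): support_count} for every
    itemset that occurs in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items (e.g. single herbs).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        prev_set = set(prev)
        candidates = set()
        # Join step: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep a candidate only if all of its (k-1)-subsets
                # are frequent (the downward-closure property).
                if len(union) == k and all(
                    frozenset(sub) in prev_set for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, count in itemsets.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # lhs is present because every subset of a frequent itemset is frequent.
                confidence = count / itemsets[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical prescriptions: each list is the set of herbs in one record.
prescriptions = [
    ["Huangqi", "Danggui", "Gancao"],
    ["Huangqi", "Danggui", "Baizhu"],
    ["Huangqi", "Danggui", "Gancao", "Fuling"],
    ["Baizhu", "Fuling", "Gancao"],
]

itemsets = apriori(prescriptions, min_support=2)
strong = association_rules(itemsets, min_confidence=0.9)
for lhs, rhs, conf in strong:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

Support here is an absolute count; dividing by the number of prescriptions gives relative support. In these four hypothetical records Huangqi appears three times and always together with Danggui, so the rule Huangqi -> Danggui has confidence 1.0.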
The following conclusions are drawn:
(1) Using the most common TCM terms as seeds greatly improves word segmentation accuracy.
(2) NER with appropriate feature templates achieves satisfactory recognition of symptoms, syndrome types, therapeutic principles, drug names, and other terms.
(3) Relatively satisfactory mining results are obtained by mining the processed data and iteratively revising the word segmentation and recognition steps with practical application in mind.
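The abstract does not say how the seed terms were supplied to ICTCLAS, and ICTCLAS's own segmentation model is not reproduced here. As a hedged illustration of why seeding with domain terms helps, the sketch below swaps in a much simpler technique, dictionary-based forward maximum matching, and compares a small general vocabulary with the same vocabulary extended by a few TCM seed terms. The function name, both dictionaries, and the sample record are hypothetical.

```python
def fmm_segment(text, dictionary):
    """Forward maximum matching: at each position take the longest word found in
    `dictionary`, falling back to a single character when nothing matches."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible window first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical general-purpose vocabulary: "spleen and stomach", "weak".
general_dict = {"脾胃", "虚弱"}
# Hypothetical TCM seed terms: "spleen-stomach weakness", "pale tongue", "white coating".
tcm_seeds = {"脾胃虚弱", "舌淡", "苔白"}

record = "脾胃虚弱舌淡苔白"
baseline = fmm_segment(record, general_dict)
seeded = fmm_segment(record, general_dict | tcm_seeds)
print(" / ".join(baseline))  # 脾胃 / 虚弱 / 舌 / 淡 / 苔 / 白
print(" / ".join(seeded))    # 脾胃虚弱 / 舌淡 / 苔白
```

Without the seeds, the tongue and coating findings fall apart into single characters, leaving later stages such as NER and itemset mining to work with fragments instead of whole TCM terms.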
Keywords/Search Tags: Chinese word segmentation, CRFs, Apriori algorithm, TCM terms