Relation Extraction Of Traditional Chinese Medicine Prescription And Disease Study Based On Literature Abstracts Data

Posted on: 2019-10-21  Degree: Master  Type: Thesis
Country: China  Candidate: X H Yang
GTID: 2428330545483288  Subject: Management Science and Engineering
Abstract/Summary:
Objective: Using abstracts of traditional Chinese medicine (TCM) literature from the CNKI journal database, and combining natural language processing, machine learning, and deep learning, this thesis studies relation extraction between TCM prescriptions and diseases and visualizes the extraction results.

Methods: We use web crawler technology to obtain CNKI literature abstract data automatically. We first preprocess the data through data cleaning, dictionary construction, and word segmentation. We then study relation extraction between TCM prescriptions and diseases with two widely used methods. The first extracts features from the cleaned abstract data and constructs an SVM classification model. The second trains word vectors with the Word2Vec library and uses an LSTM model to extract relations, with no need for manual feature extraction. Because the volume of abstract data is large, we explore running the computing tasks on the Spark distributed computing platform to improve processing efficiency. Finally, a NoSQL database stores the relation extraction result sets for TCM prescriptions and diseases. We also develop a web application in which the front-end pages and the back-end server exchange data in JSON format, and the visual display system uses the D3.js library for dynamic display.

Results: Using web crawler technology, we obtained a total of 1,073,581 abstracts, covering all TCM-related abstract data in CNKI over the past 66 years. All of these abstracts fall in the "Medical and Health Technologies" - "TCM" category under "Literature Classification". Based on the dictionaries of TCM prescriptions and diseases, 204,780 sentences containing both a prescription and a disease were selected. The SVM classification model built by the first method reaches an accuracy of 87%. The second method, which combines the trained Word2Vec word vectors with the LSTM model, reaches an accuracy between 85% and 87.5%, so the two models perform almost the same. In the first method, running the NLP tasks on the Spark distributed computing platform significantly increased processing speed. The relation extraction results are stored in the MongoDB non-relational database, and the visual display system is built with D3.js, a Spring Boot back-end server, and the Vue.js front-end framework. The system can dynamically display and query the relation extraction results in the browser.

Conclusion: Machine learning and deep learning methods both achieve high accuracy on relation extraction between TCM prescriptions and diseases. The extraction results should help advance research on treating diseases with TCM prescriptions, and the visual display system should help TCM researchers retrieve prescriptions and their associated diseases quickly.
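The dictionary-based sentence selection mentioned in the Results can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the two small dictionaries, the punctuation-based sentence splitter, and the function names `split_sentences` and `candidate_pairs` are all assumptions made for illustration, and plain substring matching stands in for the word segmentation step described in the Methods.

```python
import re
from itertools import product

# Illustrative dictionaries only (assumption): the thesis's real
# prescription and disease dictionaries are not given in the abstract.
PRESCRIPTIONS = {"桂枝汤", "小柴胡汤"}
DISEASES = {"感冒", "咳嗽"}

# Split on common Chinese and ASCII sentence-ending punctuation.
SENTENCE_END = re.compile(r"[。！？!?；;]")


def split_sentences(text):
    """Return the non-empty sentences of an abstract."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def candidate_pairs(abstract, prescriptions=PRESCRIPTIONS, diseases=DISEASES):
    """Keep sentences that mention at least one prescription and one disease.

    Each kept sentence is returned with every (prescription, disease)
    pair found in it; a classifier would then decide whether each pair
    actually expresses a relation.
    """
    results = []
    for sentence in split_sentences(abstract):
        found_rx = sorted(p for p in prescriptions if p in sentence)
        found_dx = sorted(d for d in diseases if d in sentence)
        if found_rx and found_dx:
            results.append((sentence, list(product(found_rx, found_dx))))
    return results
```

Sentences that pass this filter are only candidates: co-occurrence alone does not establish a relation, which is why the thesis goes on to classify them with the SVM and LSTM models.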
Keywords/Search Tags: relation extraction of traditional Chinese medicine prescription and disease, relation extraction, natural language processing