Research On Text Classification For Chinese Medicine Knowledge

Posted on: 2022-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
Full Text: PDF
GTID: 2504306575483144
Subject: Computer technology
Abstract/Summary:
Traditional Chinese Medicine (TCM) has been passed down from generation to generation and retains its appeal, yet its inheritance still faces great difficulties. Advances in science and technology point to a new direction for TCM, in which the modernization, informatization, digitization, and technological development of TCM theory are key topics for intelligent syndrome differentiation and treatment based on clinical TCM texts. The process of syndrome differentiation and treatment in TCM, i.e., identifying the cause and pathogenesis, then determining the treatment principle and specific treatment methods, and finally formulating prescriptions and medicines, can be abstracted as a mapping and classification process over clinical TCM texts, to which text classification techniques can be applied. TCM books and the Internet provide a wealth of data for intelligent syndrome differentiation and treatment. However, this data appears in structured, semi-structured, or unstructured forms, and a large proportion consists of ancient Chinese texts that contain a great deal of TCM terminology, with rigorous expressions full of dialectical thinking and strong contextual dependence. These characteristics pose great challenges for the intelligent analysis and classification of TCM medical records. Against this background, the research work is divided into the following three aspects:

1) Firstly, the research targets the lack of public data sets in the field of TCM, the noise contained in TCM texts, and the difficulty of obtaining high-precision data sets. To address these problems, optical character recognition was used to extract text from TCM books and web crawlers were used to collect Internet data, which was then preprocessed by data cleaning, word segmentation, and stop-word removal. Then, based on the characteristics of TCM language, the keyword matching algorithm was modified and reconstructed on the basis of a TCM dictionary. The algorithm matches on a single keyword, multiple keywords, or the TCM dictionary, and extracts important information such as etiology, pathogenesis, treatment rules, and prescriptions. The annotation of the TCM clinical corpus and related tasks were also completed, and on this basis an accurate data set in the field of TCM was constructed.

2) Secondly, the research targets the problems that traditional text classification algorithms require very complicated feature engineering, produce highly sparse text representations, and cannot capture the contextual meaning of TCM texts. Six deep learning text classification algorithms, including the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), were applied to the classification of TCM clinical texts. This not only avoided a large proportion of the feature engineering work by solving the problem in an end-to-end manner, but also extracted deeper semantics from TCM texts through deep learning. Experiments gave satisfactory results, with the CNN model achieving the best classification performance.

3) Thirdly, the research targets the CNN model's lack of feature-importance information, which makes it difficult to evaluate the importance of each feature, let alone integrate TCM semantics at a deeper level. A TCM classification model that incorporates TCM semantics was constructed. The new model obtains an in-depth, comprehensive semantic feature representation from a slightly modified BERT model adapted to TCM using a TCM corpus, and uses the CNN model to extract and classify the important features in clinical TCM texts. Comparative experiments show that the model achieves the best classification results on clinical TCM corpora.

Figure 27; Table 14; Reference 54...
Keywords/Search Tags: knowledge of Traditional Chinese Medicine, text classification, deep learning, natural language processing
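
The dictionary-based keyword matching described in the first aspect of the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose details the abstract does not give; it assumes a plain forward longest-match scan over unsegmented Chinese text. The dictionary `TCM_DICT`, its category labels, and the function `extract_terms` are hypothetical names introduced only for illustration.

```python
# Minimal sketch of dictionary-based term extraction (forward longest match).
# ASSUMPTION: TCM_DICT and its category labels are illustrative only; they are
# not taken from the thesis, which does not publish its dictionary here.
TCM_DICT = {
    "风寒": "etiology",                # wind-cold
    "肝郁": "pathogenesis",            # liver depression
    "肝郁气滞": "pathogenesis",        # liver depression with qi stagnation
    "疏肝理气": "treatment_principle",  # soothe the liver, regulate qi
    "柴胡": "medicine",                # Bupleurum root
    "柴胡疏肝散": "prescription",      # Chaihu Shugan San
}


def extract_terms(text, term_dict):
    """Return (term, category, start_index) tuples found in `text`.

    At each position the longest dictionary entry is tried first, so
    "柴胡疏肝散" is reported as one prescription rather than as "柴胡".
    """
    max_len = max(map(len, term_dict), default=0)
    hits = []
    i = 0
    while i < len(text):
        # Try candidate substrings from longest to shortest.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in term_dict:
                hits.append((candidate, term_dict[candidate], i))
                i += size  # skip past the matched term
                break
        else:
            i += 1  # no dictionary entry starts here; advance one character
    return hits


if __name__ == "__main__":
    record = "患者外感风寒,证属肝郁气滞,治以疏肝理气,方用柴胡疏肝散。"
    for term, category, start in extract_terms(record, TCM_DICT):
        print(start, category, term)
```

Trying the longest candidate first is a design choice that keeps multi-character terms intact when a shorter dictionary entry is a prefix of a longer one. A real pipeline would likely combine this with the word segmentation and stop-word removal steps mentioned in the abstract.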