
Research On TCM Disease Description Classification Based On Text Semantic Partition

Posted on: 2019-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Fu
GTID: 2394330548977443
Subject: Computer Science and Technology
Abstract/Summary:
The Traditional Chinese Medicine (TCM) expert system is of great significance for addressing problems such as the difficulty of passing on TCM knowledge, the shortage of TCM resources, and the difficulty of obtaining TCM medical help. As the most basic and crucial step in a TCM expert system, intelligent diagnosis deserves further research. In this paper, we convert the problem of intelligent diagnosis into a classification problem over disease texts. First, we propose a method for calculating the similarity of disease texts based on block vectors. We divide each disease text into blocks according to the organs it describes and assign different weights to these organs in order to distinguish primary from secondary symptoms. With this method, we compute the cosine of the angle between two block vectors to judge how many symptoms two disease texts have in common. Then, we present a TCM disease text classification model that combines techniques from natural language processing and data mining. Finally, we conduct experiments on the disease description texts of nephrotic syndrome patients. These experiments show that the disease text classification model based on block vectors is more accurate than the traditional text classification model.

The main contributions of this paper are as follows:
1) We study the traditional methods of text feature extraction and text similarity calculation and analyze the advantages and disadvantages of each. We then apply a random forest model and an SVM model, both based on the TF-IDF features of the text, to TCM disease text classification. The two models reach F1 scores of 75.38% and 75.20%.
2) In view of the characteristics of disease texts, we propose a text feature extraction method based on block vectors that expresses the text semantics more precisely.
3) Based on the proposed feature extraction method, we present a new method called SBBV (Similarity Based On Block Vector) for calculating the similarity of texts. Experimental comparison with existing similarity calculation methods shows that the proposed method is more accurate.
4) Based on the proposed disease text similarity calculation method, we present a disease text classification model whose F1 score reaches 90.81%.
Finally, we combine non-textual features with textual features and present a mixed model for disease text classification. The F1 score of the mixed model is nearly 1% higher than that of the model based on purely textual features.
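The block-vector similarity described above can be illustrated with a minimal Python sketch. This is not the SBBV implementation from the thesis, whose details the abstract does not give. It assumes each text has already been split into per-organ lists of symptom terms, represents every block as a simple term-count vector, and combines the per-organ cosine similarities as a weighted average. The organ names, the weights in `ORGAN_WEIGHTS`, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical organ weights: a larger weight marks an organ whose
# symptoms are treated as primary. These values are illustrative only.
ORGAN_WEIGHTS = {"kidney": 0.5, "spleen": 0.3, "tongue": 0.2}


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def block_similarity(blocks_a, blocks_b, weights=ORGAN_WEIGHTS):
    """Weighted average of per-organ cosine similarities.

    blocks_a and blocks_b map an organ name to a list of symptom terms.
    Organs that neither text mentions are skipped, so they do not
    lower the score.
    """
    score = 0.0
    total_weight = 0.0
    for organ, weight in weights.items():
        vec_a = Counter(blocks_a.get(organ, []))
        vec_b = Counter(blocks_b.get(organ, []))
        if not vec_a and not vec_b:
            continue
        total_weight += weight
        score += weight * cosine(vec_a, vec_b)
    return score / total_weight if total_weight else 0.0


# Two made-up disease descriptions, already partitioned by organ.
patient_a = {"kidney": ["edema", "lumbar soreness"], "tongue": ["pale tongue"]}
patient_b = {
    "kidney": ["edema"],
    "spleen": ["poor appetite"],
    "tongue": ["pale tongue"],
}

print(round(block_similarity(patient_a, patient_b), 4))
```

A score of this kind could feed any similarity-based classifier. The abstract does not say which classifier the thesis pairs with SBBV, so that choice is left open here.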
Keywords/Search Tags: intelligent diagnosis, disease classification, text partition, Doc2vec