
Research on Topic Recognition of Online Patient Reviews Based on Dynamic Mixed Sampling and Transfer Learning

Posted on: 2021-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Xie
GTID: 2494306104992019
Subject: Health information management
Abstract/Summary:
[Purpose] Existing research on online patient reviews has mostly relied on descriptive statistics. The few studies that mine the review text mainly use the LDA topic model or other clustering algorithms to extract topics. Both are unsupervised learning approaches and share defects such as strong subjectivity and uncontrollable topics. To address this, a topic recognition model for extracting the content of patient reviews was built on a convolutional neural network (CNN) framework. Large-scale patient review data were then collected for an empirical study, and the issues patients paid attention to were extracted and analyzed by hospital level, specialty, patient satisfaction, and feedback delay. The aim was to identify the key problems patients focus on, which could guide other community users in choosing medical services and provide advice for improving medical care.

[Methods] The method of the study can be summarized as follows. (1) Experimental data collection. A self-written program was used to implement the collection strategy and storage structure of the data, and patient reviews were downloaded regularly from guahao.com. (2) Knowledge base construction. A word vector model was trained with the Word2Vec algorithm on large-scale patient review texts, and a corpus of patient topics was annotated by trained coders. (3) Learning from imbalanced data. A dynamic mixed sampling technique was proposed and combined with the idea of transfer learning to address the imbalance that arises after multi-label data are converted. (4) Topic model training. A convolutional neural network was used as the basic model for extracting the content of patient reviews, and the model was trained with the imbalanced-data learning strategy proposed in this study. (5) Empirical analysis. The trained model was used to extract the topics of large-scale patient review data. The fields for hospital level, specialty, satisfaction tendency, and feedback delay were then cleaned according to the index dictionary. Finally, the patient reviews were analyzed and discussed from the angle of each of those fields.

[Results] The results of this study can be summarized as follows. (1) Model training stage. Compared with the model trained by SVM, the model trained by CNN achieved better accuracy, recall, and F1 value in most topic recognition tasks. Compared with the model trained by CNN, the model trained by DMS+CNN significantly improved recall in all topic recognition tasks. Compared with the model trained by CNN, the model trained by TL+CNN significantly improved precision in all topic recognition tasks. Compared with the other models, the model trained by DMS+TL+CNN significantly improved the F1 value in all topic recognition tasks. (2) Empirical analysis stage. Ranked from the largest share to the smallest, the topics of patient reviews were the attitude theme, measure theme, ability theme, effect theme, environment theme, and cost theme. When the reviews were grouped by hospital level, specialty, patient satisfaction, and feedback delay, the topic distribution within each group was similar to the overall distribution, but it differed slightly between groups.

[Conclusions] The conclusions of this study can be summarized as follows. (1) Model training stage. A model trained with dynamic mixed sampling and transfer learning can effectively improve topic extraction from patient reviews when the samples are imbalanced. (2) Empirical analysis stage. In general, patients' feedback focused chiefly on the doctor's service attitude, the doctor's ability, and the medical measures, while the medical cost, the medical environment, and the medical effect were mentioned relatively rarely. Patients' concerns, as expressed in their feedback, differed to some extent across hospital levels, specialties, satisfaction tendencies, and feedback delays.

[Innovation and Limitation] The innovation of this study can be summarized as follows. The topic recognition model was trained on a convolutional neural network framework. During training, a dynamic mixed sampling technique was proposed and combined with the idea of transfer learning to improve learning from imbalanced sample data. In addition, the topics of large-scale patient review data were extracted and analyzed by hospital level, specialty, patient satisfaction, and feedback delay. However, the study still has some limitations. First, the corpus of patient reviews was not perfect. Second, the pre-trained word model and the classification model were relatively basic. Third, the strategy for transforming multi-topic label data was relatively simple.
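
The abstract does not spell out how the multi-label conversion or the dynamic mixed sampling were implemented. The following is only a minimal sketch under stated assumptions: multi-label reviews are split into one binary task per topic, each task is resampled by oversampling the minority class and undersampling the majority class, and the target positive ratio moves linearly from balanced toward the natural ratio over the epochs. The function names, the linear schedule, and the 0.5 starting ratio are all hypothetical and are not taken from the thesis.

```python
import random

# Topic names follow the six themes reported in the abstract.
TOPICS = ["attitude", "measure", "ability", "effect", "environment", "cost"]


def to_binary_tasks(reviews):
    """Split (text, set_of_topics) pairs into one binary dataset per topic.

    Assumption: a simple per-topic conversion; the thesis may differ.
    """
    return {
        topic: [(text, int(topic in labels)) for text, labels in reviews]
        for topic in TOPICS
    }


def draw(pool, n, rng):
    """Return n items from pool: undersample if n is small, oversample if large."""
    if n <= len(pool):
        return rng.sample(pool, n)  # undersampling, without replacement
    # Oversampling: keep every original item, then add random duplicates.
    return list(pool) + rng.choices(pool, k=n - len(pool))


def mixed_sample(samples, target_pos_ratio, rng):
    """Resample a binary task to roughly target_pos_ratio positives.

    The total size is kept equal to the original size, so one class is
    oversampled while the other is undersampled (hence "mixed").
    """
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    if not pos or not neg:
        return list(samples)  # nothing to rebalance
    n_pos = round(len(samples) * target_pos_ratio)
    n_pos = min(len(samples) - 1, max(1, n_pos))  # keep both classes present
    n_neg = len(samples) - n_pos
    batch = draw(pos, n_pos, rng) + draw(neg, n_neg, rng)
    rng.shuffle(batch)
    return batch


def ratio_for_epoch(epoch, n_epochs, natural_ratio):
    """Hypothetical schedule: start balanced (0.5), end at the natural ratio."""
    if n_epochs <= 1:
        return 0.5
    progress = epoch / (n_epochs - 1)
    return 0.5 + (natural_ratio - 0.5) * progress


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 2 of 10 reviews mention cost, the rest mention attitude.
    reviews = [(f"review {i}", {"cost"} if i < 2 else {"attitude"}) for i in range(10)]
    cost_task = to_binary_tasks(reviews)["cost"]
    natural = sum(y for _, y in cost_task) / len(cost_task)
    for epoch in range(3):
        ratio = ratio_for_epoch(epoch, 3, natural)
        epoch_data = mixed_sample(cost_task, ratio, rng)
        print(epoch, ratio, sum(y for _, y in epoch_data))
```

In a setup like this, each epoch's resampled data would be fed to the classifier in place of the raw task data, so that early epochs see a balanced class mix and later epochs see something closer to the real distribution.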
Keywords/Search Tags: Online Patient Reviews, Topic Recognition, Dynamic Mixed Sampling, Transfer Learning, Empirical Analysis