[Purpose] To identify the themes of a health Q&A community, mine patients’ real information needs, find the hot topics that concern patients, and classify questions by topic automatically, so as to support accurate information services and information push; and to reveal the characteristics of patients’ emotional expression through sentiment analysis of the health Q&A community, helping patients face their disease actively.

[Methods] (1) A Scrapy crawler was written to collect a total of 35,000 questions from the diabetes channel of the diagnostic medicine network, and the data were stored in a MySQL database. (2) The initial text was segmented and preprocessed with the jieba Chinese word segmentation tool, a stop word list, and a user dictionary consisting of the International Classification of Diseases vocabulary and the list of diabetes drugs commonly used in the Chinese Type 2 Diabetes Prevention Guide, forming the initial corpus. (3) One fifth of the corpus was extracted for a pre-experiment: a preliminary topic probability model was constructed, 33 topic tags were extracted from the generated themes, and these were manually merged into 10 topic categories. The model was then trained on the whole corpus, generating 94 themes, which were merged into 10 topic categories by a merging algorithm combined with manual annotation; topic analysis was carried out on the classification results. (4) 8,000 records were randomly extracted from the corpus and manually labeled with emotional polarity; features were extracted with the word2vec model, and the transformed data sets were used to train K-nearest neighbor, naive Bayes, and support vector machine classifiers. Precision, recall, and F1 values were used to evaluate the resulting sentiment classifiers, and the characteristics of user emotion were then analyzed based on the classification results. (5) A logistic regression model was built on gender, age, theme, and emotional polarity to explore the factors influencing emotional polarity.

[Results] (1) Topic classification results showed
that disease prevention and control accounted for 18.7%, diet for 14.5%, medical guidance for 13.5%, comorbidities for 12.5%, complications for 12.4%, disease treatment and recovery for 8.3%, disease progression and harm for 7.5%, medication guidance for 6.8%, etiology and diagnosis for 3.9%, and heredity for 2.0%. (2) In the sentiment analysis, the support vector machine gave the best classification performance of the three classifiers. After support vector machine classification, positive texts accounted for 26% of users’ questions, neutral texts for 34%, and negative texts for 40%. (3) Gender, age, and the topics of etiology and diagnosis, diet, complications, comorbidities, disease prevention and control, disease treatment and recovery, medication guidance, medical guidance, and heredity had statistically significant effects on emotional polarity (p<0.05). The effect of the disease progression and harm topic was not statistically significant.

[Discussions] (1) Patients are more inclined to ask about scientific knowledge of disease prevention and dietary guidance, while the information services of the health Q&A community provide limited therapeutic advice. (2) Patients have a strong emotional need when seeking medical consultation online and are eager to receive emotional support. (3) Gender, age, and theme are correlated with emotional polarity, and this relationship requires further research.
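
The classifier evaluation in Methods step (4) relies on recall and F1 values. As an illustration only, the following minimal sketch shows how per-class precision, recall, and F1 can be computed for three polarity labels using the Python standard library. The label names, the toy data, and the helper names `per_class_scores` and `macro_f1` are assumptions made for this example; they are not taken from the study, which does not state how its scores were averaged.

```python
from collections import Counter

# Hypothetical polarity labels; the study's actual label encoding is not given.
LABELS = ("positive", "neutral", "negative")


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for multi-class predictions."""
    true_count = Counter(y_true)   # how often each label truly occurs
    pred_count = Counter(y_pred)   # how often each label was predicted
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions

    scores = {}
    for label in labels:
        precision = hits[label] / pred_count[label] if pred_count[label] else 0.0
        recall = hits[label] / true_count[label] if true_count[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values (one possible averaging choice)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy data, invented for illustration; not drawn from the 8,000 labeled records.
y_true = ["positive", "neutral", "negative", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "neutral"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"macro F1={macro_f1(scores):.2f}")
```

Reporting scores per class, rather than a single accuracy figure, makes it visible when a classifier handles one polarity (for example, negative questions) much better than the others.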