
Research On Text Mining In Online Health Community

Posted on: 2014-01-26 Degree: Doctor Type: Dissertation
Country: China Candidate: Y J Lv Full Text: PDF
GTID: 1267330422954227 Subject: Management Science and Engineering
Abstract/Summary:
In recent years, people have paid more and more attention to their health. Their perception of healthcare has gradually shifted from passive disease treatment to active health self-management. However, it is difficult for people to participate actively in treatment decisions and day-to-day health self-management without a good platform for exchanging information. Online health communities, where patients and their caregivers exchange information of interest, share experiences, and offer emotional support and encouragement, have grown rapidly in recent years. A thorough understanding of online health communities is therefore an important issue. Our study could help websites optimize the human-computer interface, provide personalized tools and functions that facilitate patient engagement, and improve ease of use and social interaction. More importantly, our study benefits the end users of online health communities themselves: it could help them get a sense of what online health communities are, quickly find the issues they are concerned about, and become involved in these communities more easily.

Online health communities have become a hot research topic as they play an increasingly important role in people’s daily lives. Many studies have been conducted from different perspectives, including exploring health-related hot topics and analyzing the characteristics of community members and their emotional expression. However, most of this research relied on questionnaires or manual content analysis. As the number of community members and posts grows, these traditional manual methods can no longer process such large amounts of data.
Therefore, we planned to use automatic methods such as machine learning and text mining to study the hot issues in online health communities, including health-related hot topic identification, community members’ role identification, and sentiment analysis of community members.

(1) Health-related hot topic identification. Members discuss health-related topics of interest in online health communities. However, the unordered text structure makes it difficult for users to retrieve valuable information, and also makes it hard for web designers and researchers to discover community members’ needs. Thus, we proposed an automatic identification framework for health-related hot topics. With the help of the UMLS medical knowledge source, we extracted n-gram features, domain-specific features, and sentiment features that could effectively represent health-related topics. Then, using text clustering, we divided all the text data into clusters, each representing a health-related hot topic. Finally, all topics could be identified from the extracted keywords. We then conducted an experiment to evaluate our method. We chose the well-known online health community Medhelp as the data source and collected sample data from three disease discussion boards: lung cancer, breast cancer, and diabetes. After determining the values of the model parameters, we obtained 7 clusters from the three disease forums, representing 7 health-related hot topics: personal self-introduction, emotional expression, symptom, examination, complication, medication, and treatment. Further analysis showed that the distributions of health-related hot topics differ significantly across disease types; for example, symptoms in the lung cancer forum, examinations in the breast cancer forum, and medication in the diabetes forum are discussed significantly more than other topics.

(2) Participants’ role identification.
There are different types of participants in online health communities, and they have different demands and behavioral characteristics. Identifying the different types of community members helps websites provide personalized services that meet the needs of different users, and also helps community members build mutual understanding and trust. However, the lack of personal information caused by privacy protection makes it difficult to identify community members’ roles. We therefore introduced stylistics-based role identification theory to construct a participant role identification method for online health communities. The method determines roles from the writing characteristics of community members’ posts: it extracts stylistic features, including lexical, syntactic, and structural features, and combines them with content-related features to generate feature sets. We then used a text clustering algorithm to group all posts according to their writing style and thereby identify community members’ roles. Finally, using the same sample data as in the hot topic identification experiment, we conducted an experiment to identify the three main roles in online health communities (patients, caregivers, and medical experts) and further discussed the differences among these three groups.

(3) Sentiment analysis of community members. Community members express their emotions through their posts in online health communities. We proposed to use sentiment analysis to identify subjective posts containing community members’ emotional expressions and to analyze their polarity.
First, we proposed a sentiment analysis method based on machine learning. By choosing a feature set that includes domain-specific features, POS features, and stylistic features, we classified all forum posts into objective and subjective posts, and further classified the subjective posts into positive and negative posts. We also proposed a second sentiment analysis method based on a sentiment dictionary, which identifies the sentiment expressed in forum posts by extracting their sentiment words and summing the sentiment values. Experiments showed that each method has its own advantages and disadvantages, so we finally proposed a comprehensive sentiment analysis model that combines the two. In the final discussion, we analyzed and summarized the emotional expression characteristics of community members from multiple perspectives, including disease types, health topics, and member roles.

The contributions of this paper are as follows:

(1) We proposed a method of health-related hot topic identification using text clustering. Existing research on health-related hot topics was based on manual statistics, which is inefficient and lacks rigor. In this paper, text clustering was therefore introduced into health-related topic identification. Building on the traditional text representation, we proposed adding domain-specific features and sentiment features to improve the results of topic identification. Both kinds of features proved effective in distinguishing different health-related topics in our experiment.

(2) We proposed a method of participant role identification based on stylistics. A better understanding of the different roles of community members is important for studying online health communities.
However, the lack of personal profiles and the constraints of privacy protection make it difficult to identify members’ roles, so few studies have been done in this field. In this paper, we proposed a novel stylistics-based method of role identification, extracting lexical, syntactic, and structural features that effectively distinguish the writing styles of different types of participants in order to identify members’ roles.

(3) We proposed a comprehensive sentiment analysis model for online health communities that combines the two sentiment analysis methods, one based on machine learning and one based on a sentiment dictionary. To classify posts as subjective or objective, we used the machine learning method, constructing a feature set from domain-specific features, POS features, and stylistic features. In the subsequent analysis of the sentiment polarity of subjective posts, we used the sentiment dictionary method, extracting sentiment words to judge the polarity of each subjective post. Experiments showed these methods to be effective.
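
The dictionary-based polarity step described in the abstract (extract the sentiment words from a post, sum their values, and judge polarity from the total) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the toy lexicon, its values, the tokenization rule, the function names, and the "neutral" label for a zero score are all assumptions made here for illustration.

```python
import re
from typing import Mapping

# Hypothetical toy lexicon (word -> sentiment value); not taken from the dissertation.
SENTIMENT_LEXICON = {
    "hope": 1.0,
    "grateful": 2.0,
    "better": 1.0,
    "scared": -2.0,
    "pain": -1.0,
    "worse": -1.0,
}


def sentiment_score(post: str, lexicon: Mapping[str, float] = SENTIMENT_LEXICON) -> float:
    """Sum the lexicon values of every sentiment word found in the post."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", post.lower())
    # Words missing from the lexicon contribute nothing to the total.
    return sum(lexicon.get(token, 0.0) for token in tokens)


def polarity(post: str) -> str:
    """Judge polarity from the sign of the summed sentiment values."""
    score = sentiment_score(post)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # Assumption: a zero total is reported as neutral.
    return "neutral"
```

In the combined model described above, a step like this would be applied only to posts that the machine learning classifier has already labeled as subjective.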
Keywords/Search Tags: online health community, health-related topic identification, role identification, sentiment analysis