User Opinion And Behavior Mining In Social Media

Posted on: 2015-04-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W W Han
GTID: 1228330467463659
Subject: Computer Science and Technology
Abstract/Summary:
With the development of Web 2.0 technology, online Internet services have gradually become an integral part of people's daily lives. Every day, users access information on new platforms, publish opinions, and interact with others. The resulting texts, images, audio, video, and user logs make up a vast ocean of Internet UGC (User Generated Content). Key web applications include real-time microblogging services, long-text blog services, user discussion groups, social network services, user review sites, and knowledge sharing platforms. On the one hand, these new technologies and applications bring convenience to people and continuously stimulate new user requirements. On the other hand, their growth raises many challenges and difficulties that remain to be solved. This dissertation analyzes user opinions and behaviors in social media from four aspects.
(1) The dissertation proposes a feature construction approach and an opinion modeling approach based on the concept of rhetorical question distance. Because Internet text is massive, informal, and short, we propose an RDT threshold to filter patterns and construct a feature library, and we design a GF feature to estimate the similarity between a text and the feature library. The results show that this dimension-reduced feature vector method achieves relatively high modeling speed and high precision in identifying rhetorical-question opinions. We also study the impact of the smoothing factor and the threshold parameters on overall precision, which leads to the recommendation that the smoothing factor be small and the threshold be set to the rhetorical distance threshold.
(2) The dissertation proposes a method for extending subjective expression libraries based on structural context. Dictionary-based extension methods perform poorly at discovering novel expressions, at extending to large numbers of expressions, and at remaining independent of the word segmentation tool; meanwhile, corpus-based methods that rely on conjunction and co-occurrence relations achieve low coverage of expressions. We therefore propose a structured context method that uses information content and PMI (pointwise mutual information) to predict the occurrence of seed opinion words, and that estimates the similarity between the linguistic context of seed words and that of candidate expressions. The results show that the method can effectively build an accurate and novel opinion expression library. Overall, its accuracy is slightly higher than that of the baseline libraries, while its novelty is far higher. In addition, the low coverage of its expressions by existing libraries shows that the proposed method is an important supplement to them.
(3) The dissertation proposes a method to calculate users' knowledge contribution ability on knowledge sharing community platforms. Adding social features to traditional knowledge sharing platforms shifts the storage of knowledge from the service database to every potential knowledge provider. Accordingly, the core task changes from computing query-content similarity to estimating each user's ability to provide knowledge. We analyze this ability in terms of content quality, activity, and influence, and model and estimate each standalone ability. The different criteria are then integrated using social link analysis, and a recommended user list is generated from the integrated value. The results show that the method effectively integrates user abilities from multiple aspects and avoids bias toward any single factor. For the top 1000 users, the integrated ranking is closest to the influence ranking, followed by the activity ranking and then the replier ranking. We also study the impact of the damping factor on the ranking result and on the number of iterations, as well as the weight assignment strategy among the different abilities.
(4) The dissertation proposes a method to estimate a user's ability to serve as an Internet information source in microblogging services. The huge number of accounts on microblogging services troubles newly registered users, because it makes it difficult to identify media sources to subscribe to. We therefore construct a media capacity model to measure a user's ability to provide news content. The method considers the following standalone abilities: user activity, content reliability, content amount, and stability of content publishing. A collective decision-making method, weighted Borda ranking, is then used to map the rankings under the different criteria into a single space. In addition, weights are assigned to the abilities with a supervised method that measures each ability's contribution to the overall media capacity value. The results show that the media capacity model effectively represents a user's ability to provide news content. The supervised weight assignment yields the following order of criterion importance: user credibility, content stability, content amount, and user activity.
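The structured context method in aspect (2) is described as using information content and PMI to predict where seed opinion words occur, but the abstract gives no formulas. As a minimal sketch, assuming sentence-level co-occurrence counts and the textbook definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))), candidate expressions can be scored by their mean PMI with the seed words. The helper names `pmi_table` and `rank_candidates` and the toy corpus are hypothetical, and the information-content term is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Build pmi(x, y) from sentence-level co-occurrence counts.

    p(x) is the fraction of sentences containing x, and p(x, y) the fraction
    containing both x and y. Pairs that never co-occur get -inf.
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        vocab = set(tokens)
        word_counts.update(vocab)
        # Count each unordered pair at most once per sentence.
        pair_counts.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(x, y):
        joint = pair_counts[frozenset((x, y))]
        if joint == 0:
            return float("-inf")
        p_xy = joint / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        return math.log2(p_xy / (p_x * p_y))

    return pmi


def rank_candidates(pmi, seeds, candidates):
    """Order candidate expressions by their mean PMI with the seed words."""
    def score(candidate):
        return sum(pmi(candidate, s) for s in seeds) / len(seeds)
    return sorted(candidates, key=score, reverse=True)


# Hypothetical toy corpus of tokenized sentences.
sentences = [
    ["phone", "great", "awesome"],
    ["service", "great", "awesome"],
    ["phone", "bad", "slow"],
    ["screen", "bad", "slow"],
]
pmi = pmi_table(sentences)
print(pmi("great", "awesome"))  # 1.0
print(rank_candidates(pmi, ["great"], ["slow", "phone", "awesome"]))
```

Here "awesome" always appears with the seed "great", so it ranks first, while "slow" never does and ranks last.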
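Aspect (2) also compares the linguistic context of a seed word with that of a candidate expression. One common stand-in for this, assumed here and not taken from the dissertation, is the cosine similarity of co-occurrence vectors collected in a fixed window around each word. The window size and the helpers `context_vectors` and `cosine` are illustrative assumptions.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "yummy" shares more context with "tasty" than "long" does.
sentences = [
    ["the", "food", "is", "tasty"],
    ["the", "soup", "is", "tasty"],
    ["the", "food", "is", "yummy"],
    ["the", "road", "is", "long"],
]
vectors = context_vectors(sentences, window=2)
print(round(cosine(vectors["tasty"], vectors["yummy"]), 3))  # 0.866
print(round(cosine(vectors["tasty"], vectors["long"]), 3))   # 0.577
```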
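Aspect (3) integrates the standalone abilities with social link analysis and studies how the damping factor affects both the ranking and the number of iterations. A PageRank-style power iteration is one plausible reading. In this sketch, which is an assumption rather than the author's algorithm, an edge `asker -> answerer` passes score to the answerer, and each user's combined standalone score serves as the teleport (personalization) distribution. The function `link_rank` and the toy graph are hypothetical.

```python
def link_rank(edges, prior, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style power iteration with a personalized teleport vector.

    edges: (src, dst) pairs; src passes part of its score to dst.
    prior: user -> non-negative standalone-ability score, normalized
           into the teleport distribution.
    Returns (scores, iterations_used).
    """
    nodes = sorted(set(prior) | {u for edge in edges for u in edge})
    total = sum(prior.get(u, 0.0) for u in nodes)
    if total <= 0:
        raise ValueError("prior must have positive total weight")
    teleport = {u: prior.get(u, 0.0) / total for u in nodes}
    out_links = {u: [] for u in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    scores = dict(teleport)
    for iteration in range(1, max_iter + 1):
        # Score held by users with no out-links is spread via the teleport vector.
        dangling = sum(scores[u] for u in nodes if not out_links[u])
        new = {u: (1 - damping + damping * dangling) * teleport[u] for u in nodes}
        for u in nodes:
            for v in out_links[u]:
                new[v] += damping * scores[u] / len(out_links[u])
        delta = sum(abs(new[u] - scores[u]) for u in nodes)
        scores = new
        if delta < tol:
            return scores, iteration
    return scores, max_iter


# Hypothetical toy data: every user starts with the same standalone score.
prior = {"alice": 1.0, "bob": 1.0, "carol": 1.0, "dave": 1.0}
# Edge asker -> answerer: carol answered alice, bob and dave; bob answered carol.
edges = [("alice", "carol"), ("bob", "carol"), ("dave", "carol"), ("carol", "bob")]

scores, iters = link_rank(edges, prior, damping=0.85)
print(max(scores, key=scores.get))  # carol
print(link_rank(edges, prior, damping=0.5)[1] < iters)  # True
```

Lowering the damping factor lets the teleport term dominate, so the iteration contracts faster and needs fewer rounds, at the cost of giving the link structure less influence on the final ranking.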
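Aspect (3) reports how close the integrated ranking is to the single-criterion rankings, but the abstract does not name the measure. Kendall's tau is one standard choice and is assumed here. The rankings below are invented toy data for illustration, not results from the dissertation.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two tie-free rankings of the same items.

    Each ranking lists items from best to worst. Returns a value in [-1, 1]:
    1 for identical order, -1 for exactly reversed order.
    """
    same_items = set(rank_a) == set(rank_b)
    if len(rank_a) != len(rank_b) or not same_items or len(set(rank_a)) != len(rank_a):
        raise ValueError("rankings must be permutations of the same items")
    if len(rank_a) < 2:
        return 1.0
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # x precedes y in rank_a; the pair is concordant if x also precedes y in rank_b.
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Invented toy rankings of four users, best first.
integrated = ["u1", "u2", "u3", "u4"]
influence = ["u1", "u2", "u4", "u3"]
activity = ["u2", "u1", "u4", "u3"]
replier = ["u4", "u3", "u2", "u1"]
for name, other in [("influence", influence), ("activity", activity), ("replier", replier)]:
    print(name, round(kendall_tau(integrated, other), 3))
```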
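Weighted Borda counting, named in aspect (4) for the media capacity model, maps rankings under several criteria into one score: in a ranking of n items, the item at position i (counting from 0) receives n - 1 - i points, and each criterion's points are multiplied by its weight. The weights below are hypothetical values chosen only to respect the importance order reported in the abstract (credibility, stability, amount, activity); the dissertation derives its weights with a supervised method that is not reproduced here, and the function name `weighted_borda` is illustrative.

```python
def weighted_borda(rankings, weights):
    """Merge several rankings (best first) into one by weighted Borda count.

    rankings: criterion -> list of items ordered best to worst.
    weights:  criterion -> non-negative weight (missing criteria count as 0).
    All rankings must contain the same items.
    Returns (merged_order, scores); ties are broken by item name.
    """
    items = set(next(iter(rankings.values())))
    scores = {item: 0.0 for item in items}
    for criterion, order in rankings.items():
        if set(order) != items or len(order) != len(items):
            raise ValueError(f"ranking {criterion!r} has different items")
        n = len(order)
        w = weights.get(criterion, 0.0)
        for position, item in enumerate(order):
            # The top item gets n - 1 points, the last item gets 0.
            scores[item] += w * (n - 1 - position)
    merged = sorted(items, key=lambda item: (-scores[item], item))
    return merged, scores


# Hypothetical per-criterion rankings of three accounts A, B, C.
rankings = {
    "credibility": ["A", "B", "C"],
    "stability": ["B", "A", "C"],
    "amount": ["C", "B", "A"],
    "activity": ["C", "B", "A"],
}
# Hypothetical weights that follow the reported importance order.
weights = {"credibility": 0.4, "stability": 0.3, "amount": 0.2, "activity": 0.1}

merged, scores = weighted_borda(rankings, weights)
print(merged)  # ['B', 'A', 'C']
```

B wins even though A tops the most important criterion, because B places first or second under every criterion; this is the kind of compromise Borda aggregation is designed to produce.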
Keywords/Search Tags: opinion analysis, user behavior analysis, web data mining, machine learning