
Research On The Opinion Mining And Hidden Sentiment Inclination For Web Text

Posted on: 2012-03-26; Degree: Doctor; Type: Dissertation
Country: China; Candidate: H Yang
GTID: 1118330368478864; Subject: Computer application technology
Abstract/Summary:
An opinion is a person's view or understanding of something: a judgment or evaluation of it. Opinions are not facts, because they have not been verified, proved, or confirmed; once an opinion is proved and confirmed, it is no longer an opinion but a fact. From a Web visitor's point of view, therefore, it is more appropriate to treat all information published on the Web as opinions rather than facts. Knowing other people's opinions has become one of the most important parts of decision-making. The Internet now lets us learn the opinions and attitudes of other people and of experts even when we do not know them personally, and at the same time more and more people share their feelings and experiences online. The abundant opinion resources on the Internet, such as personal blogs and online reviews, bring new opportunities and challenges. Opinion mining is the use of information technology to discover and understand other people's opinions. Sentiment inclination analysis analyzes and mines the content that users actively publish on the Web, known as user-generated content, in order to identify its sentiment inclination (e.g. positive, negative, happy, or sad) and even to predict how that sentiment changes over time. By analyzing the sentiment inclination of user-generated content, we can better understand users' consumption habits, analyze comments and responses to current hot events, and help enterprises and governments make reasonable and correct decisions. However, the most widely used information technology today, in particular search engine technology, is keyword-based and cannot search by sentiment or opinion.
There are two reasons for this: first, sentiment and opinions cannot be expressed and indexed by simple keywords; second, the indexing strategies used in information retrieval are not suited to opinions. A further problem with most current sentiment analysis algorithms is that they rely on simple terms to express sentiment about products and services, whereas cultural factors, subtle differences between languages, and differing contexts make it difficult to assign a simple label such as favorable or objective.
This dissertation therefore first studies sentiment inclination evaluation models and feature extraction methods for Web text in depth, and proposes a continuous sentiment evaluation model and a sentiment evaluation model based on Chinese dependency grammar. On this basis, it combines the hidden sentiment inclination evaluation model with a Web text community mining algorithm and with the K-Means text clustering algorithm, respectively, in order to mine the topic communities and sentiment trends of Web text. The resulting methods are a fast Web text community mining algorithm, a dynamic Web text community mining algorithm based on multiple agents, and a Web text clustering algorithm based on hidden sentiment. The main contributions are as follows:
(1) Based on the vector space model of Web text, we propose a feature extraction method for subjective words that uses Chinese dependency grammar. The method extracts the subjective words of a text according to Chinese dependency grammar rules while avoiding noise as far as possible.
The experiments compared the performance of information gain (IG), mutual information (MI), expected cross entropy (CE), and our method with a KNN classifier, using different feature vector spaces and imbalanced sample counts.
(2) Because discrete sentiment inclination evaluation cannot accurately describe sentiment trends, we propose two Chinese continuous sentiment inclination evaluation models: a Chinese continuous sentiment evaluation model and a sentiment evaluation model based on Chinese dependency grammar. The Chinese continuous sentiment evaluation model aims to provide a comprehensive and accurate method of sentiment inclination analysis. It identifies the sentiment words in each sentence, judges the sentiment inclination of each sentence from the structure of its context, and then combines the inclinations of all sentences to predict the sentiment inclination of the whole document. The experimental results show that the method accurately describes the sentiment trends of Web texts over a specified period. The sentiment evaluation model based on Chinese dependency grammar uses dependency grammar rules to judge the prior polarity and the modified polarity of subjective words. Experiments on real Web data show that its sentiment classification accuracy is higher than that of the traditional SVM and NB (naive Bayes) algorithms.
(3) We study Web text community mining algorithms. For the two kinds of Web community structure, static and dynamic, we propose a fast Web text community mining method based on hidden sentiment and a dynamic Web text community mining algorithm based on multiple agents, respectively. The dynamic algorithm can effectively mine Web text communities that share the same topic and the same sentiment without prior knowledge of the community structure. What the two methods have in common is that both take hidden sentiment factors into account in community mining.
The experimental results show that both algorithms improve not only the accuracy but also the recall of Web text mining.
(4) We improve the classic K-Means text clustering algorithm and propose HSK-Means, a Web text clustering algorithm based on hidden sentiment. It includes a similarity measure that combines hidden sentiment with text features, and an initial center selection algorithm based on a new classification mechanism. A good initial center represents the center of its cluster well while remaining clearly distinguishable from the other centers. Experiments on online text collections of different types compared K-Means, Bisecting K-Means, UPGMA, and the proposed HSK-Means; the algorithms with initial center selection (i.e. Bisecting K-Means and HSK-Means) performed significantly better than those without it.
In summary, this dissertation studies Web text topic mining and the analysis of hidden sentiment inclination in Chinese text in depth. It focuses mainly on evaluating the hidden sentiment inclination of texts more accurately, that is, on continuous sentiment inclination evaluation, and proposes mining algorithms for static and dynamic Web text communities, respectively. Finally, it presents a Web text clustering algorithm based on hidden sentiment and initial center selection. Combining hidden sentiment analysis with community mining not only yields a more accurate and comprehensive understanding of opinion holders' real views but also helps people use and learn from these opinions to make the right decisions. The algorithms and implementation methods studied here are novel and have high theoretical and practical value, so this thesis is of considerable significance for further research on opinion mining and sentiment analysis.
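Contribution (1) compares the proposed subjective-word extractor against feature-selection baselines such as information gain (IG). As a rough illustration of that baseline only, not of the dissertation's dependency-grammar extractor, the following Python sketch ranks words by IG over documents represented as sets of already-segmented words; Chinese word segmentation is assumed to have been done beforehand, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given raw counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], using document presence of t."""
    classes = sorted(set(labels))
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_t = sum(with_t.values())
    h_c = entropy([labels.count(c) for c in classes])
    h_cond = 0.0
    if n_t:
        h_cond += (n_t / n) * entropy([with_t[c] for c in classes])
    if n - n_t:
        h_cond += ((n - n_t) / n) * entropy([without_t[c] for c in classes])
    return h_c - h_cond


def select_features(docs, labels, k):
    """Keep the k words with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]
```

On a toy corpus where `good` and `bad` each occur in only one class, `select_features(docs, labels, 2)` keeps exactly those two words, while a word such as `phone` that occurs equally often in both classes scores zero.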
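Contribution (2) scores sentiment on a continuous scale: sentence-level inclinations are combined into a document-level value that can be tracked over time. A minimal Python sketch of that pipeline follows, assuming a hypothetical toy lexicon (`LEXICON`), a one-word negation window in place of the context-structure analysis the dissertation describes, `tanh` squashing to keep sentence scores in (-1, 1), and a plain mean as the combination rule; none of these choices come from the source.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon: word -> prior polarity in [-1, 1]
LEXICON = {"good": 1.0, "great": 1.0, "like": 0.5, "bad": -1.0, "slow": -0.5}
NEGATORS = {"not", "never"}


def sentence_score(tokens):
    """Continuous score in (-1, 1): sum lexicon hits, flipping a word preceded by a negator."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            polarity = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                polarity = -polarity
            total += polarity
    return math.tanh(total)  # squash to a bounded, continuous value


def document_score(sentences):
    """Average the sentence scores; 0.0 for an empty document."""
    return mean(sentence_score(s) for s in sentences) if sentences else 0.0


def daily_trend(dated_docs):
    """dated_docs: iterable of (date_str, sentences). Returns {date: mean document score}."""
    by_day = defaultdict(list)
    for day, sentences in dated_docs:
        by_day[day].append(document_score(sentences))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}
```

Averaging per-day document scores gives a continuous curve rather than a count of positive and negative labels, which is what makes a sentiment trend over a specified period visible.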
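The dependency-grammar model in contribution (2) distinguishes a subjective word's prior polarity from its modified polarity. The sketch below shows one way such a rule could look, assuming the sentence has already been parsed into `(head, relation, dependent)` index triples by some Chinese dependency parser; the lexicons and the single rule for a hypothetical adverbial relation label `ADV` are illustrative assumptions, not the dissertation's rule set.

```python
# Hypothetical prior-polarity lexicon and modifier lists (illustrative only)
PRIOR = {"好": 1.0, "满意": 1.0, "差": -1.0}      # good, satisfied, poor
NEGATION_WORDS = {"不", "没有"}                    # not, have not
DEGREE = {"很": 1.5, "非常": 2.0, "有点": 0.5}    # very, extremely, a bit


def modified_polarity(tokens, arcs):
    """tokens: list of words; arcs: list of (head, relation, dependent) index triples.
    Returns {index: modified polarity} for every subjective word in tokens."""
    result = {}
    for i, word in enumerate(tokens):
        if word not in PRIOR:
            continue
        polarity = PRIOR[word]  # prior polarity from the lexicon
        for head, rel, dep in arcs:
            if head != i or rel != "ADV":
                continue  # only adverbial modifiers attached to this word are considered
            modifier = tokens[dep]
            if modifier in NEGATION_WORDS:
                polarity = -polarity  # negation flips the polarity
            elif modifier in DEGREE:
                polarity *= DEGREE[modifier]  # degree adverbs scale its strength
        result[i] = polarity
    return result
```

For 服务 很 好 ("the service is very good") the degree adverb raises the prior polarity of 好 from 1.0 to 1.5, and for 质量 不 好 ("the quality is not good") the negator flips it to -1.0. Interactions between several modifiers, such as 不很好 ("not very good"), are not modelled correctly by this simple rule.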
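Contribution (3) mines communities of texts that share both a topic and a sentiment. As a stand-in for the dissertation's fast and multi-agent algorithms, which the abstract does not specify, the following sketch links two texts when their term-vector cosine similarity and the gap between their sentiment scores both pass hypothetical thresholds (`topic_threshold`, `max_gap`), then reports the connected components found with union-find. It is a simple threshold-graph technique, not the author's method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def sentiment_communities(vectors, sentiments, topic_threshold=0.5, max_gap=0.5):
    """Link texts whose topics are similar AND whose sentiment scores are close;
    communities are the connected components of that graph (union-find)."""
    parent = list(range(len(vectors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if (cosine(vectors[i], vectors[j]) >= topic_threshold
                    and abs(sentiments[i] - sentiments[j]) <= max_gap):
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With vectors for three phone reviews and one film review, the two positive phone reviews fall into one community while a negative phone review on the same topic stays separate, which is the effect of taking hidden sentiment into account.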
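Contribution (4) modifies K-Means in two places: the similarity measure mixes text features with hidden sentiment, and the initial centers are chosen deliberately rather than at random. The sketch below illustrates both ideas under stated assumptions: similarity is a weighted blend (hypothetical weight `alpha`) of cosine similarity and sentiment closeness for sentiment scores in [-1, 1], and initial centers are picked with a farthest-point heuristic, substituted here because the abstract does not describe its classification-based selection mechanism. It is not the HSK-Means algorithm itself.

```python
import math


def vec_cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity(x, y, alpha=0.7):
    """Blend of term-vector cosine and sentiment closeness; sentiments assumed in [-1, 1]."""
    (vx, sx), (vy, sy) = x, y
    return alpha * vec_cosine(vx, vy) + (1 - alpha) * (1 - abs(sx - sy) / 2)


def initial_centers(points, k, alpha=0.7):
    """Farthest-point heuristic (assumed): start from point 0, then repeatedly add the
    point whose highest similarity to any already chosen center is lowest."""
    chosen = [0]
    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]
        nxt = min(candidates,
                  key=lambda i: max(similarity(points[i], points[c], alpha) for c in chosen))
        chosen.append(nxt)
    return [points[i] for i in chosen]


def hs_kmeans(points, k, alpha=0.7, iterations=20):
    """points: list of (term_vector, sentiment). Returns a cluster label per point."""
    centers = initial_centers(points, k, alpha)
    labels = None
    for _ in range(iterations):
        # Assign each point to its most similar center
        new_labels = [max(range(k), key=lambda c: similarity(p, centers[c], alpha))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Recompute each non-empty center as the mean vector and mean sentiment
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                dim = len(members[0][0])
                vec = [sum(m[0][d] for m in members) / len(members) for d in range(dim)]
                centers[c] = (vec, sum(m[1] for m in members) / len(members))
    return labels
```

With four reviews on the same topic, two positive and two negative, cosine similarity alone barely separates them, but the sentiment term pulls the clusters apart, and the farthest-point start places one initial center in each sentiment group.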
Keywords/Search Tags: Opinion mining, sentiment analysis, community mining, text clustering, feature extraction