
Research On Sentiment Analysis Based On Semi-supervised Learning Method

Posted on: 2013-07-10  Degree: Master  Type: Thesis
Country: China  Candidate: X L Liu  Full Text: PDF
GTID: 2248330395483753  Subject: Computer application technology
Abstract/Summary:
With the rapid development and adoption of the Internet in recent years, it has become the most important source from which people obtain information. It has changed people's habits and created new forms of business and economic activity. Content created and shared by users themselves has become increasingly common. More and more users browse large numbers of online reviews to learn how other users evaluate goods and services, which helps them make reliable purchasing decisions. However, as the volume of online reviews grows rapidly, users must spend considerable time and effort picking out the useful information, and retrieving it quickly and accurately from this mass of data becomes difficult. Manufacturers and service providers, for their part, want to learn from these reviews how users judge their products and services, so that they can become more competitive. Against this background, sentiment analysis has emerged as an urgently needed technique for mining unstructured information, with the aim of determining the sentiment orientation of a text.

This thesis first reviews the state of research on sentiment analysis in China and abroad, and introduces the key steps and main algorithms involved. Sentiment analysis is treated as a special text classification problem in which the sentiment orientation of a text is judged. The strongest published results use machine learning algorithms such as support vector machines, maximum entropy models, and conditional random fields, but all of these approaches achieve high classification accuracy only at the cost of a large, high-quality, hand-labeled training set.
Labeling such data by hand consumes too much time and effort. To make effective use of the large amount of freely available unlabeled data, and to exploit the information implicit in it to improve classifier performance, the thesis adopts the transductive support vector machine (TSVM), a semi-supervised learning algorithm. Semi-supervised methods have shortcomings, however: the algorithm may estimate the distribution of the unlabeled data incorrectly, which lowers classification accuracy. The thesis therefore introduces active learning and presents a new TSVM based on an active learning approach. During learning, an active learning strategy selects the most uncertain unlabeled examples and has them labeled, which reduces the number of classifier iterations and improves classification performance. Finally, the thesis designs a sentence-level system for Chinese customer reviews, trained with both SVM and the improved TSVM. Tests show that the algorithm presented in this thesis outperforms the other algorithms.
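The abstract does not give the details of the improved TSVM, so the following is only a minimal sketch of the general idea it describes: each round, the most uncertain unlabeled examples are sent to a human labeler, while the most confident ones receive the classifier's own label. Plain self-training stands in here for the transductive step, which is a simplification and not the thesis's algorithm. The function names, the toy two-dimensional data, the `oracle` callback, and all parameter values are assumptions made for illustration.

```python
def decision(w, b, x):
    """Signed score w.x + b; its magnitude serves as a confidence proxy."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Linear SVM fitted by plain subgradient descent on the hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision(w, b, xi) < 1
            w = [wj * (1 - lr * lam) for wj in w]  # L2 shrinkage
            if violated:  # hinge-loss subgradient step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def active_self_training(X_lab, y_lab, X_pool, oracle,
                         rounds=5, n_query=1, n_pseudo=2):
    """Grow the labeled set from an unlabeled pool, round by round."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_pool)
    for _ in range(rounds):
        if not pool:
            break
        w, b = train_linear_svm(X_lab, y_lab)
        # Order the pool from least to most confident.
        pool.sort(key=lambda x: abs(decision(w, b, x)))
        # Active learning: ask the oracle about the least certain examples.
        queried, pool = pool[:n_query], pool[n_query:]
        for x in queried:
            X_lab.append(x)
            y_lab.append(oracle(x))
        # Self-training: accept the model's label for the most certain ones.
        cut = max(len(pool) - n_pseudo, 0)
        confident, pool = pool[cut:], pool[:cut]
        for x in confident:
            X_lab.append(x)
            y_lab.append(1 if decision(w, b, x) >= 0 else -1)
    w, b = train_linear_svm(X_lab, y_lab)
    return w, b, X_lab, y_lab


# Toy data (hypothetical): two labeled seeds and ten unlabeled points.
seeds_X = [[2.0, 2.0], [-2.0, -2.0]]
seeds_y = [1, -1]
pool_X = [[1.5, 2.5], [2.5, 1.0], [3.0, 2.0], [1.0, 1.0], [0.5, 0.8],
          [-1.0, -1.5], [-2.5, -1.0], [-3.0, -2.5], [-1.5, -2.5], [-0.6, -0.4]]


def oracle(x):
    """Stand-in for a human annotator on this toy data."""
    return 1 if x[0] + x[1] > 0 else -1


w, b, X_final, y_final = active_self_training(seeds_X, seeds_y, pool_X, oracle)
```

In this sketch only the points nearest the decision boundary cost an annotator query. In a real review system the feature vectors would come from the text (for example, bag-of-words weights) rather than from hand-picked coordinates.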
Keywords/Search Tags: Sentiment analysis, Semi-supervised learning method, Transductive support vector machine, Active learning