
Research on Credibility Estimation of Stock Comments with Sentiment Analysis

Posted on: 2015-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Qiu
GTID: 2308330479989744
Subject: Computer Science and Technology
Abstract/Summary:
As the Internet and financial services become more closely linked, the need to obtain financial information quickly and accurately grows increasingly urgent. However, unreliable information poses a serious challenge to investment decisions, so identifying credible information in large-scale data has become a key issue. Many researchers have worked on identifying credible information, but little of that work addresses stock comments, and progress there has been limited. This thesis therefore analyzes stock comments in financial information services and proposes a method for calculating the credibility of a stock comment via sentiment analysis.

The main work consists of four parts. First, sentiment analysis is performed on stock comments based on the characteristics of the stock market and of stock comments. Three feature selection methods are described in detail: one based on unigrams and bigrams, one based on a domain dictionary, and one based on the structure of the article. The system performs best when all three kinds of features are used together. Second, because the market lacks a short-selling mechanism, positive stock comment samples far outnumber negative ones, and this imbalanced class distribution significantly degrades the performance of supervised learning classifiers. Sentiment analysis is therefore carried out on the imbalanced stock comment data, and methods based on oversampling and on ensemble learning are proposed and implemented. Experimental results show that oversampling contributes little to performance, whereas ensemble learning clearly improves classification of the minority class. Third, the credibility of publishers is calculated. It is divided into two parts, the publisher’s historical credibility and the publisher’s industrial credibility, and is computed from natural labels and historical stock prices. Fourth, the credibility of stock comments is calculated from the sentiment analysis results and the credibility of publishers.

This thesis makes three contributions. First, a stock-domain dictionary is built from short stock comments, and the structure of comment texts is analyzed; based on statistical analysis of the data, a feature selection method based on article structure is proposed. Second, the stock comment data are analyzed and imbalanced-data processing methods are applied to stock comment classification; the results show that ensemble learning clearly improves classification of the minority class. Third, methods for calculating the publisher’s historical credibility and industrial credibility are analyzed, and a reliable verification method is provided.
Keywords/Search Tags:sentiment analysis, stock comment, credibility calculation, imbalanced data, ensemble learning
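
The imbalanced-data step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which oversampling or ensemble variant the thesis uses, so this sketch assumes random oversampling and a balanced-subset ensemble combined by majority vote. All function names are hypothetical, and the base classifier itself is left out.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until
    every class is as large as the largest one (assumed variant)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def balanced_subsets(samples, labels, minority, seed=0):
    """Split the majority samples into disjoint chunks of minority size
    and pair each chunk with all minority samples (assumed variant).
    Each subset would train one base classifier of the ensemble."""
    rng = random.Random(seed)
    minor = [x for x, y in zip(samples, labels) if y == minority]
    major = [(x, y) for x, y in zip(samples, labels) if y != minority]
    if not minor:
        raise ValueError("no samples with the minority label")
    rng.shuffle(major)
    subsets = []
    for start in range(0, len(major), len(minor)):
        chunk = major[start:start + len(minor)]
        subsets.append(chunk + [(x, minority) for x in minor])
    return subsets


def majority_vote(predictions):
    """Combine the base classifiers' labels for one comment.
    Ties go to the label that appears first in the list."""
    return Counter(predictions).most_common(1)[0][0]
```

In use, one classifier would be trained on each subset returned by `balanced_subsets`, and their predictions for a new comment would be passed to `majority_vote`. Unlike oversampling, this never repeats a minority sample within a single training set, which is one plausible reason an ensemble could behave differently on the minority class.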