
Research on Investor Sentiment Classification for Guba Comments

Posted on: 2018-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: L J Song
GTID: 2348330563952626
Subject: Computer Science and Technology
Abstract/Summary:
Behavioral finance holds that stock market trends are affected by the emotions, psychology, and other subjective factors of irrational investors. Compared with foreign stock markets, China's stock markets are still immature. Most investors trade short-term and collect information through various channels to support their investment decisions. Alongside traditional media, the Internet has become an indispensable communication medium and is gradually becoming the mainstream one. Many areas, including the financial sector, are affected by it to some extent. Guba holds massive stock data and a wealth of investor comments, and provides a platform for investors to communicate. Obtaining accurate stock sentiment from it is therefore valuable for further research on stock market trends.

This paper collects stock comments from Guba with a distributed web crawler. Because stock comments are written casually, many new words cannot be correctly identified by Chinese word segmentation systems. This paper draws on the structure and features of word formation and combines rule-based and statistical techniques to identify new words. It then represents stock text with a graph model, and finally classifies stock comments with two text sentiment classification algorithms based on graph kernels. The results show that graph kernels classify better than other common kernels. The main research contributions of this paper are as follows:

(1) An improved Apriori algorithm for new word recognition is proposed. First, this paper implements the Apriori algorithm with a suffix tree and related operations, which supports ordered text with repetitions and simplifies both the elimination of low-frequency words and the calculation of term frequency. Second, it adds a combined-frequency factor to the mutual information calculation, which addresses the problem of low-frequency words receiving high mutual information values. Third, it uses left and right context entropy to measure the flexibility of repeated strings. Finally, it gives a new formula for the new word score, which simplifies the recognition process.

(2) A text representation method based on a directed graph model is proposed. A five-tuple graph expresses the words in a text and the adjacency relations between them. The vector space model is simple to implement but lacks structural information; the graph model makes up for this.

(3) A novel text sentiment classification algorithm based on an improved random walk kernel is proposed. The algorithm is implemented with suffix tree matching, which reduces the time complexity from O(n^3) to O(n^2). The experimental results show that adding text structure information improves the effectiveness of text classification.

(4) A novel text sentiment classification algorithm based on an interval walk kernel is proposed. This algorithm compares text structure in terms of both continuous walks and interval walks, which makes the set of matched walks more complete. The experimental results show that the interval walk kernel classifies better than the random walk kernel.
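The statistics named in contribution (1) can be illustrated with a minimal sketch. It computes plain pointwise mutual information and left/right context entropy for character n-grams. It is not the thesis's method: the combined-frequency factor, the suffix-tree implementation, and the final new word score are not specified in this abstract, so they are omitted, and the function names `branch_entropy` and `candidate_stats` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def branch_entropy(neighbors):
    """Shannon entropy (in bits) of a neighbor-character frequency table."""
    total = sum(neighbors.values())
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def candidate_stats(text, n=2):
    """Frequency, PMI and left/right context entropy for every n-gram in text.

    Assumption: plain PMI against independent characters, not the
    frequency-weighted variant the abstract mentions.
    """
    chars = Counter(text)
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    left = defaultdict(Counter)   # characters seen just before each n-gram
    right = defaultdict(Counter)  # characters seen just after each n-gram
    for i in range(len(text) - n + 1):
        g = text[i:i + n]
        if i > 0:
            left[g][text[i - 1]] += 1
        if i + n < len(text):
            right[g][text[i + n]] += 1

    total_chars = len(text)
    total_grams = sum(grams.values())
    stats = {}
    for g, freq in grams.items():
        p_gram = freq / total_grams
        p_independent = math.prod(chars[c] / total_chars for c in g)
        stats[g] = {
            "freq": freq,
            "pmi": math.log2(p_gram / p_independent),
            "left_entropy": branch_entropy(left[g]) if left[g] else 0.0,
            "right_entropy": branch_entropy(right[g]) if right[g] else 0.0,
        }
    return stats


stats = candidate_stats("abxabyabz")
print(stats["ab"])
```

A string that is internally cohesive (high mutual information) and appears in varied contexts (high entropy on both sides) is a stronger new word candidate than one that always occurs inside the same longer string.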
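The idea behind contributions (2) and (3) can be sketched as follows: build a directed word-adjacency graph per text, then count the walks two graphs share. This is a simplified illustration. The graph here is a plain (nodes, edges) pair rather than the five-tuple the abstract mentions, and the kernel is a basic truncated common-walk count with a decay weight, not the thesis's improved random walk kernel or its interval walk kernel. The names `text_to_graph` and `walk_kernel` and the parameters `max_len` and `decay` are assumptions.

```python
from collections import defaultdict


def text_to_graph(tokens):
    """Directed graph with one node per distinct word and an edge u -> v
    for every adjacent token pair (u, v)."""
    edges = set(zip(tokens, tokens[1:]))
    return set(tokens), edges


def walk_kernel(g1, g2, max_len=3, decay=0.5):
    """Count walks present in both graphs; a walk with k edges weighs decay**k.

    Because each word is a unique node label, the direct product graph
    reduces to the shared nodes and shared edges of the two graphs.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    common_nodes = nodes1 & nodes2
    common_edges = edges1 & edges2

    successors = defaultdict(list)
    for u, v in common_edges:
        successors[u].append(v)

    # walks[v] = number of common walks of the current length ending at v
    walks = {v: 1 for v in common_nodes}
    total = float(len(common_nodes))  # walks with zero edges
    for k in range(1, max_len + 1):
        extended = defaultdict(int)
        for u, count in walks.items():
            for v in successors[u]:
                extended[v] += count
        walks = extended
        total += (decay ** k) * sum(walks.values())
    return total


a = text_to_graph("stock price will rise".split())
b = text_to_graph("stock price will fall".split())
print(walk_kernel(a, b))  # 3 shared words + 0.5 * 2 shared edges + 0.25 * 1 shared two-edge walk = 4.25
```

A bag-of-words comparison would see only the three shared words. The walk terms add credit for shared word order, which is the structural information a vector space model discards.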
Keywords/Search Tags: investor sentiment, new word identification, text representation, random walk kernel, interval walk kernel