Research On Some Key Problems Of WEB Subject View Mining For Securities Application

Posted on: 2014-04-07; Degree: Doctor; Type: Dissertation
Country: China; Candidate: L Xue; Full Text: PDF
GTID: 1108330434971196; Subject: Computer software and theory
Abstract/Summary:
According to the Efficient Market Hypothesis (EMH), mining useful information from the Internet is a promising way to predict stock market movement. With the development of the Internet, and especially the rapid spread of social networks, a large amount of subjective textual data carrying rich sentiment information has appeared online. Many hot topics concerning society and daily life are discussed in these web articles, so extracting the subjective information they contain is useful for predicting stock markets. However, traditional text mining techniques cannot mine this sentiment information, because its representation is complicated and quite different from that of objective information. Hence a new branch of Text Mining research, called Opinion Mining or Sentiment Analysis, has emerged, which aims at mining sentiment information from web textual data.

Since its emergence, Opinion Mining has grown rapidly into an important research branch of Web Mining and Text Mining and has attracted increasing research attention. In addition, some Opinion Mining techniques have been applied in the E-commerce area, where they have played an important role in many business applications. Inspired by this success, some researchers have tried to apply these techniques to research on the stock market. However, as this is a novel research branch, many important issues remain to be solved. For example, although topic information and opinion information are both valuable content of web articles, it is difficult for current Opinion Mining methods to extract these two types of information together from articles.
Yet separate mining methods cannot capture the relations between topics and opinions, which constrains the use of the mining results in the stock market.

To solve these problems, we propose a novel opinion mining problem, named topic opinion mining, in this dissertation; it aims at mining the opinion information on latent topics from web articles. Our study covers many important issues of Opinion Mining research, such as topic-opinion quantization, topic-opinion integration, and topic-opinion classification. Furthermore, our research is tightly related to specific securities applications; for example, we propose a novel stock market movement prediction method that classifies aggregated web topic opinions. The main content of this study is as follows:

1. To extract topic information and the corresponding opinion information together from web articles, this study proposes a novel opinion mining model called the Document-Topic-Opinion Model (DTO Model). The DTO model introduces an "Opinion" layer into the basic three-layer LDA model, which consists of a Document layer, a Topic layer and a Word layer, yielding a four-layer Bayesian generative model. From the perspective of machine learning, the DTO model is an unsupervised method. Because of the couplings between latent variables, the unknown parameters cannot be inferred by direct derivation. To deal with this problem, we propose an approximate estimation method based on Markov Chain Monte Carlo simulation. The experimental results show that the DTO method achieves high performance in extracting topics and opinions from web articles.

2. To deal with the problem of opinion quantization, which is a tough problem for existing methods, we make a new topic-opinion hypothesis based on the DTO model, which assumes that the opinion expressed by an article follows a multinomial distribution over the latent topic space.
According to this assumption, we further propose a quantitative representation model, called the Document-Topic-Opinion Vector, which correlates the article opinion with the topic opinions through the DTO model.

3. The opinion information expressed in a single article is very sparse compared with that of a corpus. Thus the sentiment trend of the Internet cannot be captured by directly extracting opinions from separate web articles. To address this problem, we propose a topic-opinion integration model, called the Topic-Opinion Vector Aggregation Model (TOVA), which is based on article weight and topic weight. The TOVA model can be used to extract opinions on hot topics from the Internet, which satisfies the information needs of many applications. To evaluate the performance of this model, we apply it to stock market movement prediction. The experimental results show that the DTO model is well able to mine opinions on hot topics from the latest financial web articles.

4. By analyzing the problem of text classification, we find that some traditional classifiers, such as SVM, are not appropriate for multi-label classification problems. To address this problem, we propose a novel classification method based on Normalization theory and Fuzzy Set theory, called the Multiple Data Domain Description Model (MDDD). In essence, the MDDD model follows the basic idea of multi-task learning methods, whose claimed advantage is capturing the correlations among different tasks during training. The experimental results demonstrate the high performance of the proposed MDDD model.
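
Items 2 and 3 above can be illustrated with a minimal sketch. It is not the dissertation's formulation, which the abstract does not give. It assumes that each article comes with a topic distribution and a signed opinion score per topic, that an article's topic-opinion vector is their element-wise product, and that aggregation is an article-weighted average rescaled by a per-topic weight. The function names `doc_topic_opinion_vector` and `aggregate_topic_opinions`, the score range and the normalization are all hypothetical.

```python
from typing import List, Sequence


def doc_topic_opinion_vector(topic_probs: Sequence[float],
                             topic_opinions: Sequence[float]) -> List[float]:
    """Assumed form of a per-article topic-opinion vector.

    topic_probs: the article's distribution over K latent topics.
    topic_opinions: an assumed signed opinion score per topic (e.g. in [-1, 1]).
    """
    if len(topic_probs) != len(topic_opinions):
        raise ValueError("topic_probs and topic_opinions must have equal length")
    return [p * o for p, o in zip(topic_probs, topic_opinions)]


def aggregate_topic_opinions(vectors: Sequence[Sequence[float]],
                             article_weights: Sequence[float],
                             topic_weights: Sequence[float]) -> List[float]:
    """Assumed aggregation: article-weighted mean, then per-topic rescaling."""
    if len(vectors) != len(article_weights):
        raise ValueError("one weight is required per article vector")
    total = sum(article_weights)
    if total <= 0:
        raise ValueError("article weights must sum to a positive value")
    k = len(topic_weights)
    sums = [0.0] * k
    for vec, w in zip(vectors, article_weights):
        if len(vec) != k:
            raise ValueError("every vector must have one entry per topic")
        for i in range(k):
            sums[i] += w * vec[i]
    return [topic_weights[i] * sums[i] / total for i in range(k)]


if __name__ == "__main__":
    # Two toy articles over two topics; all numbers are made up for illustration.
    v1 = doc_topic_opinion_vector([0.5, 0.5], [1.0, -1.0])
    v2 = doc_topic_opinion_vector([1.0, 0.0], [1.0, 1.0])
    print(aggregate_topic_opinions([v1, v2], [1.0, 3.0], [1.0, 2.0]))
```

In this sketch a single article contributes little on its own, while the weighted sum over many articles yields one opinion value per topic, which is the kind of aggregate that a downstream movement classifier could take as input.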
Keywords/Search Tags: Opinion Mining, Text Mining, Topic Model, Stock Movement Prediction, Multi-task Learning