
Opinion Mining In Online Forums For Financial Q & A System

Posted on: 2011-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Fan
Full Text: PDF
GTID: 2178330338989568
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the growth of online services has produced vast archives from which question-and-answer collections can be built, forming a large, content-rich information base that includes encyclopedic knowledge, personal blogs, and online forums. An online forum usually focuses on a particular domain, such as transportation, technology, or finance. It is a dedicated virtual space where people answer questions, discuss problems, and willingly share their knowledge with others. Supported by various operating mechanisms and incentives, interest in acquiring and extracting this knowledge is unprecedented. Because forums contain a large number of practical discussions, they are a highly significant subject of study.

At present, users who are interested in a product or a piece of information must browse or search many websites, which wastes time and often still fails to yield the information they need. To address these problems, our financial opinion automatic Q&A system searches for relevant opinions about a stock, classifies them, and automatically computes the probability of each category for the user, so that the answers are information-rich and the interaction is user-friendly.

In this thesis, we build a financial opinion automatic Q&A system that focuses on the following three issues:

a. Financial opinion mining from online forums: A forum thread consists of one initial post followed by several replies. A single thread may discuss several objects and contain many opinions. Because the content is noisy and disorganized, the proportion of opinion sentences can be quite small. To identify these opinions effectively, we filter and classify sentences in two steps. In the first step, we apply rules to all downloaded content to reduce the number of sentences passed to classification. This step not only reduces the cost of classification but also increases precision. In the second step, we extract ten features, select features by information gain, and evaluate performance with a support vector machine. The classifier reached 83.11% accuracy on an entirely new test set.

b. Financial opinion polarity classification: Sentences identified as financial opinions must be classified into one of four polarities: positive, negative, neutral, or comparative. Because polarity-labeled corpora are limited, we use graph-based semi-supervised learning to label additional opinion sentences and so expand the corpus. We build the graph model from sentence similarity, which includes the proportions of various kinds of features, the HowNet similarity of sentences, and so on. We then use summarization during label propagation, which saves a considerable amount of time. We train on the expanded corpus, extracting ten features that include unigrams, extended unigrams, bigrams, semantic features, positive and negative words, positive and negative templates, and sentence structure. As before, we select features by information gain and evaluate performance with a support vector machine. Under the same conditions, the four-class polarity test achieved higher precision than supervised learning alone. In addition, we need to determine the object of each opinion. If the sentence states its object explicitly, we obtain the object directly by matching. If the object is omitted, we find one or more objects from the context using a set of rules. We designed and tested these rules on a labeled corpus of sentences with omitted objects, and precision reached about 86%.

c. Information retrieval based on the opinion library: After opinion classification, polarity classification, and object identification, sentences are saved in the opinion library. Because financial opinions are time-sensitive, the system must update the opinion library every day. When the system receives a user's input, it retrieves the several most recent records from the opinion library, analyzes them, and returns the most direct answer to the user.
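As a rough illustration of the retrieval step in part c, the following Python sketch stores opinion records, fetches the most recent ones for a given stock, and reports the share of each polarity among them. The abstract does not say how the opinion library is stored or how the answer is analyzed, so everything here is an assumption made for illustration: the record fields, the names `OpinionRecord`, `latest_opinions`, and `polarity_distribution`, the `limit` parameter, and the sample data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# The four polarity classes named in the abstract.
POLARITIES = ("positive", "negative", "neutral", "comparative")


@dataclass(frozen=True)
class OpinionRecord:
    """One stored opinion sentence (hypothetical record layout)."""
    posted: date    # day the opinion was collected
    target: str     # object of the opinion, e.g. a stock identifier
    polarity: str   # one of POLARITIES
    sentence: str   # the opinion sentence itself


def latest_opinions(library, target, limit=5):
    """Return up to `limit` records about `target`, newest first."""
    matching = [r for r in library if r.target == target]
    matching.sort(key=lambda r: r.posted, reverse=True)
    return matching[:limit]


def polarity_distribution(records):
    """Return the share of each polarity in `records` (all 0.0 if empty)."""
    counts = Counter(r.polarity for r in records)
    total = len(records)
    return {p: (counts[p] / total if total else 0.0) for p in POLARITIES}


# Made-up sample data; the sentences are placeholders, not real forum posts.
library = [
    OpinionRecord(date(2011, 8, 15), "stock_a", "negative", "placeholder 1"),
    OpinionRecord(date(2011, 8, 16), "stock_a", "positive", "placeholder 2"),
    OpinionRecord(date(2011, 8, 17), "stock_a", "positive", "placeholder 3"),
    OpinionRecord(date(2011, 8, 17), "stock_b", "neutral", "placeholder 4"),
    OpinionRecord(date(2011, 8, 18), "stock_a", "neutral", "placeholder 5"),
]

recent = latest_opinions(library, "stock_a", limit=3)
distribution = polarity_distribution(recent)
for polarity in POLARITIES:
    print(f"{polarity}: {distribution[polarity]:.2f}")
```

Restricting the query to the newest records reflects the time sensitivity of financial opinions noted above. A real system would likely keep the library in a database with an index on the target and date instead of an in-memory list.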
Keywords/Search Tags: financial opinion mining, financial opinion polarity classification, graph-based semi-supervised learning, machine learning, data mining