
The Study On Financial Information Retrieval Oriented Genre Classification And Sentiment Analysis

Posted on: 2012-04-04  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J Xu  Full Text: PDF
GTID: 1118330338489742  Subject: Computer application technology
Abstract/Summary:
The sheer abundance of information on the ever-expanding WWW makes it extremely difficult to quickly find the desired information, even with improved search engines. To address this problem, many categorization and clustering approaches have been proposed to classify search results. Most of them are topic-centered. Unlike these traditional approaches, this study explores genre, sentiment, and relevance analysis techniques to improve the effectiveness and user experience of finance vertical information retrieval.

Text genre is one of the most distinguishing features in information retrieval and is used to sort search results. This study examines the effectiveness of machine learning based techniques for finance text genre classification, using selected surface features and extracted structural features. Based on the likelihood ratio test, we propose two new methods for selecting classification feature terms with high genre class discrimination power, which improve the classification performance. Three types of structural features, namely context features, frequent strings, and text patterns, are extracted by combining domain knowledge and empirical data. The effectiveness of these methods is evaluated with machine learning methods on real-world text. This work has been used to improve the performance of a finance search engine and to help users locate genre-relevant information.

Sentiment polarity in financial news can also be used to organize and locate relevant results in finance information retrieval. Therefore, sentiment analysis of finance text is key to solving the problem of a large number of irrelevant results. In this study, we examine the effectiveness of language modeling approaches to sentiment classification of Chinese financial news articles. The experimental results show that the proposed approaches are effective and robust.
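
To make the idea of language modeling based sentiment classification concrete, the following is a minimal sketch, not the dissertation's actual models. It assumes one unigram language model per sentiment class, add-one smoothing, equal class priors, and input that has already been word-segmented. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def train_unigram_lm(docs):
    """Count tokens over a list of tokenized documents of one class."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def log_likelihood(tokens, counts, vocab_size):
    """Log P(tokens | class model) with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )


def classify(tokens, models):
    """Return the label whose language model scores the tokens highest."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    return max(
        models,
        key=lambda label: log_likelihood(tokens, models[label], vocab_size),
    )


# Toy, made-up training data (assumption: not from the dissertation's corpora).
positive_docs = [["profit", "rises", "strong"], ["shares", "surge", "profit"]]
negative_docs = [["loss", "widens", "weak"], ["shares", "plunge", "loss"]]
models = {
    "positive": train_unigram_lm(positive_docs),
    "negative": train_unigram_lm(negative_docs),
}
```

Calling `classify(["profit", "surge"], models)` picks the class whose model assigns the higher smoothed likelihood. Real Chinese news text would first need word segmentation, which this sketch leaves out.
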
The language modeling approaches perform better than approaches that use traditional machine learning techniques. We also present two strategies for automatically building reliable sentiment corpora from stock reviews and news, respectively.

The lack of reliable sentiment-annotated resources is one of the core barriers in sentiment analysis. To leverage corpora in other source languages for supervised training of a sentiment classifier in the target language, cross-lingual sentiment analysis is investigated. To address the problem of low-quality transferred examples caused by inaccurate translations and by different feature/category distributions between training and testing data from different languages, we propose applying instance-level transfer learning to cross-lingual sentiment analysis. Three transfer approaches are proposed to reduce the effect of low-quality translated examples and to select high-quality translated examples. Starting from the union of a small set of target-language training data and a large set of translated examples, Transfer AdaBoost (TrAdaBoost) is first proposed to iteratively reduce the effect of low-quality translated examples. Because the re-weighting scheme adopted in TrAdaBoost risks over-discarding source training examples, the algorithm is further improved by combining a bagging procedure with the boosting procedure of TrAdaBoost. The revised algorithm is named Transfer Boosting with Bagging (TrBB). Alternatively, starting only from the target-language training data, Transfer self-training is proposed to iteratively select high-quality translated examples to enlarge the training data set.
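
The re-weighting step that TrAdaBoost relies on can be sketched as follows. This is a minimal illustration of the commonly published TrAdaBoost update, not the dissertation's implementation, and it does not cover TrBB or Transfer self-training. The function name, the argument layout (source instances first), and the clamping of the error rate are assumptions made for the example.

```python
import math


def tradaboost_reweight(weights, mistakes, n_source, n_rounds):
    """Apply one TrAdaBoost re-weighting round and return normalized weights.

    weights  : current instance weights, source (translated) instances first
    mistakes : 1 if the current weak learner misclassified the instance, else 0
    n_source : number of source instances at the front of the lists
    n_rounds : total number of boosting rounds planned
    """
    target_w = weights[n_source:]
    target_m = mistakes[n_source:]
    # The weighted error is measured on target-language instances only.
    eps = sum(w * m for w, m in zip(target_w, target_m)) / sum(target_w)
    # Assumption: clamp the error so that beta_t stays strictly in (0, 1).
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))

    new_weights = []
    for i, (w, m) in enumerate(zip(weights, mistakes)):
        if i < n_source:
            # Misclassified source examples lose weight.
            new_weights.append(w * beta ** m)
        else:
            # Misclassified target examples gain weight.
            new_weights.append(w * beta_t ** (-m))
    total = sum(new_weights)
    return [w / total for w in new_weights]
```

With two source and three target instances, `tradaboost_reweight([0.2] * 5, [1, 0, 1, 0, 0], n_source=2, n_rounds=10)` lowers the weight of the misclassified source instance and raises the weight of the misclassified target instance. Over many rounds, source examples that keep disagreeing with the target data fade out, which is the over-discarding risk the abstract mentions.
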
The three transfer algorithms are evaluated on document-level and sentence-level Chinese sentiment analysis tasks in the bilingual case, respectively. The encouraging results show that the proposed transfer learning based approaches effectively improve sentiment analysis by exploiting a small amount of training data in the target language together with a large amount of cross-lingual training data.

Finally, financial vertical search calls for object-level information retrieval over industry sectors, stocks, and other financial products. Traditional information retrieval models cannot be used directly to measure the relevance between objects and web documents. In this study, we analyze financial users' search intents and estimate the relevance of the query object and the document from the topic, domain, sentiment trend, and industry aspects. Four types of features are abstracted and quantified from the query object and the document. A discriminative classifier, namely a Support Vector Machine (SVM), is trained as a relevance model. Our experimental results on ad hoc finance information retrieval indicate that the learning to rank approach is not better than the language model approach in terms of the model itself. The advantage of SVM over language models is its better ability to learn domain-specific features. For financial industry information retrieval and recommendation, a one-class classification model is presented to estimate the relevance between a document and an industry. Based on selected industry-specific description terms, three different one-class classifiers, i.e., k-means, one-class SVM, and a language model algorithm, are each trained with only relevant (positive) documents. The experimental results show that the proposed methods perform well on real data.

This study shows that, in addition to traditional topic analysis of web documents, effective genre classification and sentiment analysis help improve users' search experience and the organization of search results.
This study is the first attempt in Chinese financial information retrieval. The outputs of this study have been applied to the HaiTianYuan knowledge service platform.
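
The genre feature selection summarized in the abstract builds on the likelihood ratio test. As a point of reference only, the standard log-likelihood ratio (G^2) statistic for a term/genre contingency table can be computed as in the sketch below. It is not one of the two new selection methods proposed in the dissertation, whose details the abstract does not give. The function names and the document counts in the usage note are hypothetical.

```python
import math


def _xlogx(x):
    """Return x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 term/genre contingency table.

    k11: documents of the genre that contain the term
    k12: documents of the genre that do not contain the term
    k21: documents of other genres that contain the term
    k22: documents of other genres that do not contain the term
    """
    n = k11 + k12 + k21 + k22
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    # Equivalent to 2 * sum(observed * ln(observed / expected)).
    return 2.0 * (cells + _xlogx(n) - rows - cols)
```

A term spread evenly across genres, such as `log_likelihood_ratio(10, 10, 10, 10)`, scores approximately zero, while a term concentrated in one genre, such as `log_likelihood_ratio(30, 10, 5, 55)`, scores high. Ranking terms by this score and keeping the top ones is one simple way to pick genre-discriminating features.
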
Keywords/Search Tags: Financial Information Retrieval, Genre Classification, Sentiment Analysis, Cross Lingual, Learning to Rank