
News Relevance Classification In Financial Information Retrieval System

Posted on: 2011-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wei
Full Text: PDF
GTID: 2178330338481050
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the financial industry and the stock market, both the volume of financial information on the Internet and the demand for it are growing quickly. People read financial news to understand significant events affecting a company or a stock. Financial news stories are generally presented either in the online finance communities of portal websites, which rely on manual editing, or on search websites, which rely on general search engines. The former suffer from narrow coverage and the latter from low accuracy, so neither approach meets users' expectations. To address the problems that general search engines face in the financial domain, financial search engines have emerged, and the related technology of news relevance classification has received increasing attention. This thesis focused on news relevance classification methods for financial search engines. Two approaches were proposed, each addressing a different problem: financial-field-based relevance classification and stock-based relevance classification.

(1) Financial-field-based relevance classification. Our research treated the financial-field-based relevance classification problem, also called financial news importance classification, as a one-class classification problem and applied one-class classification methods to it. The resulting method built a model for important financial news only. It worked in two steps. In the training process, a threshold was computed from a predetermined recall. In the prediction process, each news story was evaluated by calculating the similarity between its vector and the model established during training, and the story was labeled as important if the similarity was greater than the threshold. We investigated the influence of the number of features and of the threshold on the performance of three one-class classification methods: Rocchio, k-means, and one-class SVM.
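The training and prediction steps described above can be illustrated with a minimal sketch. It assumes a Rocchio-style model, that is, a single centroid of the important-news vectors, with cosine similarity as the score; the abstract does not specify these details, and the function names (`fit_centroid`, `threshold_for_recall`, `is_important`) are hypothetical. The comparison uses `>=` rather than a strict "greater than" so that the target recall is reached on the training set, which is also an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fit_centroid(positives):
    """Model built from important news only: the mean of their vectors."""
    n = len(positives)
    return [sum(column) / n for column in zip(*positives)]


def threshold_for_recall(positives, centroid, target_recall):
    """Pick the similarity threshold that keeps at least target_recall
    of the training positives at or above it."""
    scores = sorted((cosine(x, centroid) for x in positives), reverse=True)
    k = max(1, math.ceil(target_recall * len(scores)))
    return scores[k - 1]


def is_important(x, centroid, threshold):
    """Label a story as important when it is similar enough to the model.
    Assumption: >= is used so the k-th training example is accepted."""
    return cosine(x, centroid) >= threshold
```

With a target recall of 0.95, for example, the threshold is the similarity of the training story ranked at the 95th percentile, so lowering the target recall raises the threshold and trades recall for precision.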
Experimental results showed that k-means performed best of the three methods, reaching a precision of up to 80% while maintaining recall at 95%.

(2) Stock-based financial news relevance classification. Our research reduced the ranking problem of stock-based financial news relevance to a classification problem with two categories, relevant and irrelevant. Following the text structure of financial news, features were extracted from five parts: title, content, related paragraphs, related sentences, and URL. They included not only general features such as the proportion of keywords but also several novel financial features, including industry relevance, financial field relevance, the proportion of digital information, and the proportion of related paragraphs. Because traditional information retrieval models lack the ability to combine a large number of features, a learning to rank approach was introduced to handle these features. Experimental results showed that our approach outperformed a language-model-based classification method and two other basic retrieval models on the test corpus. Our research was applied in the Hai Tian Yuan financial news retrieval system.
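A few of the stock-based features named above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: tokens are split on whitespace (Chinese news text would need a word segmenter), "digital information" is read as tokens containing a digit, a paragraph counts as related when it contains any stock keyword, and all function names are hypothetical.

```python
def keyword_proportion(text, keywords):
    """Share of tokens in the text that are stock keywords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    keyword_set = {k.lower() for k in keywords}
    return sum(t in keyword_set for t in tokens) / len(tokens)


def digit_proportion(text):
    """Share of tokens that contain at least one digit
    (assumed reading of 'proportion of digital information')."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(any(c.isdigit() for c in t) for t in tokens) / len(tokens)


def related_paragraph_proportion(paragraphs, keywords):
    """Share of paragraphs that mention any stock keyword."""
    if not paragraphs:
        return 0.0
    keyword_set = {k.lower() for k in keywords}
    related = sum(1 for p in paragraphs if keyword_set & set(p.lower().split()))
    return related / len(paragraphs)


def extract_features(title, paragraphs, keywords):
    """Feature vector for one (news story, stock) pair."""
    content = " ".join(paragraphs)
    return [
        keyword_proportion(title, keywords),
        keyword_proportion(content, keywords),
        digit_proportion(content),
        related_paragraph_proportion(paragraphs, keywords),
    ]
```

A vector like this, one per story and stock, is the kind of input a binary relevant/irrelevant classifier or a learning to rank model would combine.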
Keywords/Search Tags: vertical search, text relevance, one-class classification, learning to rank