
Hot Topic Detection and Hotness Evaluation from Financial BBS

Posted on: 2011-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2198330338989593
Subject: Computer Science and Technology
Abstract/Summary:
The emergence and growth of the internet have moved us from an era of scarce information to one of abundance. With this rapid growth, bulletin board systems (BBS) have become an important place for people to exchange information and views. At the same time, as the world economy has developed, people have become keen on stocks, bonds, funds, and other financial investments. Most internet users want to know the newest and hottest financial topics that people currently care about most. Designing and implementing a topic detection system for financial BBS addresses this need.

In this paper, we analyze how text is organized in financial forums. We also introduce topic detection techniques in detail, especially text pre-processing methods and clustering algorithms. Based on this research, our main work covers the following aspects:

(1) Extracting data from financial forums and pre-processing it. By studying commonly used data extraction methods and analyzing the structure of web pages, we propose building a DOM tree for forum posts and locating the information zone of each post. From this we derive a method for extracting forum data based on the most frequently repeated DOM tree.

(2) A feature extraction algorithm for financial forum text. To address the facts that forum text is usually short and that financial words deserve more weight, we propose a feature extraction algorithm for financial forums based on the BTF*IDF rule. Experimental results show that, compared with the traditional feature extraction method, this algorithm gives better clustering performance on short forum text.

(3) A clustering algorithm based on how topics develop over time. Unlike traditional clustering algorithms, we introduce a topic life-support model into the clustering process, which gives each topic a life cycle. Extensive experiments show that the algorithm effectively improves topic detection results for financial forums.

(4) A hotness calculation algorithm based on topic focus and user attention, which can accurately evaluate the hotness of a topic. Experiments show that the algorithm produces a sound ranking of hot topics.

Based on the above results, we design a hot topic detection and hotness evaluation system for financial forums that effectively provides internet users with the newest and hottest financial forum topics. Users can therefore easily keep track of the financial topics that most people care about across the vast internet.
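The idea in item (2), giving financial words extra weight on top of TF*IDF, can be illustrated with a minimal sketch. The abstract does not define BTF*IDF, so this is not the thesis's formula: the lexicon `FINANCIAL_TERMS`, the factor `BOOST`, the smoothed IDF, and the function name `boosted_tfidf` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical lexicon and boost factor; neither comes from the thesis.
FINANCIAL_TERMS = {"stock", "bond", "fund"}
BOOST = 2.0


def boosted_tfidf(docs, lexicon=FINANCIAL_TERMS, boost=BOOST):
    """Return one {term: weight} dict per tokenized document.

    Weight = relative term frequency * smoothed IDF, multiplied by
    `boost` when the term is in the (assumed) financial lexicon.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        vec = {}
        for term, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed IDF
            w = (count / len(doc)) * idf
            if term in lexicon:
                w *= boost  # assumed way of favoring financial words
            vec[term] = w
        weights.append(vec)
    return weights


if __name__ == "__main__":
    posts = [["stock", "rises", "today"], ["weather", "today"]]
    for vec in boosted_tfidf(posts):
        print(vec)
```

In a short post, every term tends to occur once, so plain TF*IDF separates terms only by document frequency. A lexicon boost is one simple way to let domain words dominate the vector that is later passed to clustering.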
Keywords/Search Tags: financial BBS, information extraction, feature selection, clustering, hotness evaluation