
Genre Based Finance Web Page Auto-classification Approach

Posted on: 2010-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Sun
GTID: 2178360332457873
Subject: Computer Science and Technology
Abstract/Summary:
Automatic web page categorization is a promising area that has attracted wide interest. It improves online information retrieval systems, so that users get convenient and fast information services. However, with the rapid growth of web information resources and users' rising expectations of getting more information more quickly, automatic web page categorization systems face various new challenges. Genre-based web page categorization is one of the more distinctive methods in information retrieval; it differs from traditional methods based on the frequency of topic words. Research on it has so far progressed less than expected. This thesis therefore gives a comprehensive description of the process of genre-based web page categorization. We implement several classical feature selection methods and propose a new one based on them, then present three categorization algorithms and compare them. The thesis focuses on feature selection methods and categorization algorithms. The main contents are as follows:

(1) This thesis implements nine feature selection methods and compares them in the setting of finance search engines. On this basis, we propose a new opposite words (OW) feature processing method. OW has a limitation: it is effective only when the corpus contains opposite words. Redefining opposite words in a generalized sense relaxes this limitation. In experiments with classifiers on real-world corpora, we find that OW combines well with other methods: used together with the other eight feature selection methods, it gives results that come close to perfect.

(2) On the platform of the Hai Tian Yuan search engine, SVM, Bayes, and Naive Bayes classifiers are implemented and compared. The results indicate that each has its advantages with different feature selection methods. We identify well-matched pairings and optimize the categorization process so that the classifier performs best in automatic web page categorization.

Finally, we integrate the feature selection methods with the categorization algorithms. Furthermore, performance is compared and analyzed using the experimental data and results, which in turn optimizes the genre-based approach to automatic classification of finance web pages.
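The abstract does not name the nine feature selection methods or define how OW is computed, so the following is only an illustrative sketch. It assumes chi-square scoring as one representative classical feature selection method and applies it to a made-up toy corpus; the function names, the genre labels "news" and "forum", and the documents are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term by the chi-square statistic against one class.

    docs   -- list of token lists (one per page); toy data, not from the thesis
    labels -- list of class labels, same length as docs
    target -- the class (genre) to score terms against
    """
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    # Document frequency of each term, overall and within the target class.
    df_all = Counter()
    df_target = Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if y == target:
                df_target[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_target[term]      # target class, contains term
        b = df - a               # other classes, contains term
        c = n_target - a         # target class, lacks term
        d = n - n_target - b     # other classes, lacks term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the term (or class) is constant: no signal.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus: two made-up genres of finance pages.
docs = [
    ["stock", "rises", "today", "report"],
    ["stock", "falls", "today", "report"],
    ["forum", "opinion", "stock", "buy"],
    ["forum", "opinion", "stock", "sell"],
]
labels = ["news", "news", "forum", "forum"]

scores = chi_square_scores(docs, labels, target="news")
top_terms = select_top_k(scores, 4)
```

In this toy data the topic word "stock" appears in every page and scores zero, while terms tied to one genre score highest. Chi-square does not distinguish positive from negative association, so terms typical of the other genre also rank highly. The selected terms would then be passed to a classifier such as SVM or Naive Bayes.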
Keywords/Search Tags: web page automatic categorization, genre, feature selection, categorization algorithm