
Research of Web Document Classification Based on Fuzzy-Rough Set

Posted on: 2011-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: H H Sun
Full Text: PDF
GTID: 2178330332470834
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and network technology, the network has become the main medium for storing and accessing information. It provides users with massive amounts of information, but it also makes it harder for them to find what is useful. Automatic text classification can locate useful information quickly and help users acquire knowledge efficiently. Fuzzy-rough set theory is a new method for dealing with uncertain information. It avoids the information loss caused by discretization in rough set methods and improves the efficiency of attribute reduction, which in turn makes classification more efficient.

Based on a theoretical study and a review of the literature, this paper analyzes the shortcomings of existing algorithms and proposes a Web document classification method based on fuzzy-rough sets. First, Web documents collected from the Internet are preprocessed and represented in the vector space model, which forms the initial feature (attribute) space, and the feature weights are computed. Second, rough set methods are used to reduce the attribute space: for each category, a minimal attribute set is generated, and classification rules are formed from it. These minimal attribute sets carry degrees of membership and are therefore fuzzy sets. Third, a text classification algorithm based on fuzzy-rough sets is proposed. It matches key attributes directly, computes the closeness degree ("neartude") between a document and each category using the classification rules, and assigns the document to a category by the maximum closeness principle. Finally, experiments are used to determine two parameters (the attribute space dimension and the number of documents classified), and the algorithm is optimized accordingly.

The algorithm has been tested and compared with traditional algorithms. The experimental results show that the Web document classification algorithm based on fuzzy-rough sets achieves good classification performance compared with KNN and SVM...
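
The classification step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weighting scheme, the reduction algorithm, or the closeness formula, so everything here is an assumption: the weights are a simple TF-IDF variant scaled to [0, 1], the per-category key attribute sets are supplied by hand (standing in for the output of rough set attribute reduction, which is not implemented), and closeness is the common min/max ratio between two fuzzy sets. All function names are hypothetical.

```python
import math
from collections import Counter

def idf_table(docs):
    """Inverse document frequency for every term in the tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(1 + n / count) for term, count in df.items()}

def membership_vector(doc, idf):
    """TF-IDF weights scaled to [0, 1] so they can serve as membership degrees.
    Terms unseen in training are dropped (an assumption of this sketch)."""
    tf = Counter(doc)
    raw = {t: c * idf[t] for t, c in tf.items() if t in idf}
    peak = max(raw.values(), default=0.0)
    return {t: w / peak for t, w in raw.items()} if peak > 0 else {}

def category_fuzzy_set(vectors, key_terms):
    """Mean membership of each key attribute over a category's documents."""
    return {t: sum(v.get(t, 0.0) for v in vectors) / len(vectors) for t in key_terms}

def closeness(doc_vec, cat_set):
    """Assumed closeness: sum of minima over sum of maxima, on key attributes only."""
    num = sum(min(doc_vec.get(t, 0.0), m) for t, m in cat_set.items())
    den = sum(max(doc_vec.get(t, 0.0), m) for t, m in cat_set.items())
    return num / den if den else 0.0

def classify(doc_vec, categories):
    """Maximum closeness principle: pick the category closest to the document."""
    return max(categories, key=lambda name: closeness(doc_vec, categories[name]))

# Toy data; the key attribute sets are hand-picked placeholders for a real reduct.
train = {
    "sports": [["football", "goal", "team"], ["team", "match", "goal"]],
    "tech": [["python", "code", "bug"], ["code", "compiler", "bug"]],
}
key_terms = {"sports": {"goal", "team"}, "tech": {"code", "bug"}}

idf = idf_table([doc for docs in train.values() for doc in docs])
categories = {
    name: category_fuzzy_set([membership_vector(d, idf) for d in docs], key_terms[name])
    for name, docs in train.items()
}
predicted = classify(membership_vector(["goal", "team", "win"], idf), categories)
print(predicted)
```

Restricting the closeness computation to each category's key attributes mirrors the "matching key attributes directly" step; other closeness measures (for example lattice closeness) could be substituted without changing the rest of the sketch.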
Keywords/Search Tags: machine learning, fuzzy-rough set, Web document classification, attribute reduction