Research Of Rough-Based Text Classification Of Web Pages And Information Extraction

Posted on: 2008-07-14 | Degree: Master | Type: Thesis
Country: China | Candidate: K Deng
GTID: 2178360215988155 | Subject: Computer application technology
Abstract/Summary:
The web is a huge repository of information, and web pages need to be categorized to facilitate their indexing, search, and retrieval. Rough set theory, introduced in the early 1980s, is a formal mathematical tool for handling vague and uncertain knowledge. Practical applications based on rough set theory need no preliminary or additional information about the data, and readable decision rules can be induced easily and with low computational complexity. The theory has already been applied to a very wide variety of domains.

In this thesis, we discuss several issues related to the automated text classification of web pages. We describe the text classification process for web pages, analyze feature selection and categorization algorithms for web pages, and offer some suggestions for web page categorization. We investigate the effectiveness of rough-set-based feature selection for web text classification and propose a new feature reduction method based on rough set theory. With this method, we can also obtain the key words of a given category together with their significance.
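
The abstract does not spell out the proposed reduction method, so what follows is only a minimal Python sketch of the general rough-set idea it builds on, not the author's algorithm. Documents are treated as rows of a decision table whose condition attributes are term-presence flags and whose decision attribute is the category. The dependency degree gamma is the fraction of documents whose category is fully determined by the chosen terms (the positive region). The well-known greedy QuickReduct heuristic, swapped in here for illustration, keeps adding terms until gamma matches that of the full vocabulary. A term's significance is taken as the drop in gamma when it is removed from the reduct. The toy corpus, the term names, and the functions `partition`, `dependency`, `quick_reduct`, and `significance` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus: each web page is reduced to binary term-presence
# features. A real pipeline would first discretize term weights such as TF-IDF.
TERMS = ["price", "goal", "match", "stock"]
DOCS = [
    {"price": 1, "goal": 0, "match": 0, "stock": 1},
    {"price": 0, "goal": 0, "match": 1, "stock": 1},
    {"price": 0, "goal": 1, "match": 1, "stock": 0},
    {"price": 0, "goal": 0, "match": 1, "stock": 0},
    {"price": 1, "goal": 1, "match": 0, "stock": 0},
]
LABELS = ["finance", "finance", "sports", "sports", "finance"]


def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs):
    documents with identical values on every attribute in attrs share a block."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree gamma_attrs(D) = |POS_attrs(D)| / |U|.

    A block is in the positive region when all of its documents share one
    class label, i.e. it lies in the lower approximation of that class."""
    if not rows:
        return 0.0
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({labels[i] for i in block}) == 1)
    return pos / len(rows)


def quick_reduct(rows, labels, attrs):
    """Greedy forward selection (QuickReduct): repeatedly add the term that
    raises gamma the most until gamma equals that of the full term set.
    This is a heuristic; it may stop early and is not guaranteed minimal."""
    target = dependency(rows, labels, attrs)
    reduct, current = [], dependency(rows, labels, [])
    while current < target:
        best, best_gamma = None, current
        for a in attrs:
            if a in reduct:
                continue
            g = dependency(rows, labels, reduct + [a])
            if g > best_gamma:
                best, best_gamma = a, g
        if best is None:  # no single added term improves gamma: stop early
            break
        reduct.append(best)
        current = best_gamma
    return reduct


def significance(rows, labels, reduct, attr):
    """Significance of attr within reduct: the drop in gamma when it is removed."""
    rest = [a for a in reduct if a != attr]
    return dependency(rows, labels, reduct) - dependency(rows, labels, rest)


if __name__ == "__main__":
    reduct = quick_reduct(DOCS, LABELS, TERMS)
    print(reduct)  # ['price', 'stock']
    for term in reduct:
        print(term, round(significance(DOCS, LABELS, reduct, term), 2))
```

On this toy table no single term determines the category (the best single term reaches gamma = 0.4), but `price` and `stock` together reach gamma = 1.0, and each has significance 0.6. A greedy search is used because finding a minimal reduct is NP-hard in general. To extract key words for one category, one could relabel the documents as that category versus the rest before running the reduction; this is an assumption, not something the abstract states.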
Keywords/Search Tags: Web text classification, rough set, feature reduction, information extraction