Research Of Rough-Based Text Classification Of Web Pages And Information Extraction

Posted on:2008-07-14

Degree:Master

Type:Thesis

Country:China

Candidate:K Deng

Full Text:PDF

GTID:2178360215988155

Subject:Computer application technology

Abstract/Summary:

The web is a huge repository of information, and web pages need to be categorized to facilitate their indexing, search and retrieval.

Rough set theory, introduced in the early 1980s, is a formal mathematical tool for treating vague and uncertain knowledge. In practical applications based on rough set theory, no preliminary or additional information about the data is needed, and readable decision rules are easily induced with low computational complexity. It has already been applied to a very wide variety of domains.

In this paper, we discuss several issues related to automated text classification of web pages. We discuss the process of text classification of web pages, analyze feature selection and categorization algorithms for web pages, and give some suggestions for web page categorization. We investigate the effectiveness of rough set feature selection for web text classification and propose a new feature reduction method based on rough set theory. With the new feature reduction method, we can also obtain the key words of a given category and their significance.
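The abstract does not describe the proposed reduction method in detail, so the following is only a minimal sketch of the textbook rough set notions it builds on: the dependency degree of the category on a set of terms, and the significance of a single term as the drop in dependency when that term is removed. The toy data and the names ROWS, LABELS, TERMS, partition, dependency and significance are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import defaultdict

# Hypothetical toy decision table: each row records whether a term occurs
# in a page (1) or not (0); LABELS holds the category of each page.
TERMS = ["goal", "stock", "the"]
ROWS = [
    {"goal": 1, "stock": 0, "the": 1},
    {"goal": 1, "stock": 0, "the": 0},
    {"goal": 0, "stock": 1, "the": 1},
    {"goal": 0, "stock": 0, "the": 1},
]
LABELS = ["sports", "sports", "finance", "finance"]


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def significance(rows, labels, attrs, term):
    """Drop in dependency degree when term is removed from attrs."""
    rest = [a for a in attrs if a != term]
    return dependency(rows, labels, attrs) - dependency(rows, labels, rest)


if __name__ == "__main__":
    for term in TERMS:
        print(term, significance(ROWS, LABELS, TERMS, term))
```

In this toy table only "goal" has non-zero significance, so "stock" and "the" could be dropped without losing the ability to separate the two categories. A real reduction procedure would also need to handle weighted or discretized term frequencies, which this sketch ignores.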

Keywords/Search Tags:

Web text classification, rough set, feature reduction, information extraction

Related items

1	Research And Application Of Text Feature Reduction And Classification Rule Extraction
2	Based On Rough Set Text Automatic Classification Study
3	Research On Text Emotion Classification Based On Rough Set
4	The Research On Text Classification Technology Based On The Rough Set Theory
5	Research On Optimization Of Text Classification Based On Improved Rough Set Model
6	Research On Text Classification Based On Rough Set
7	Research On Integrated Classification Algorithm Based On Rough Set Attribute Reduction
8	The Research Of Chinese Text Categorization Based On Rough Set In Spam Filtering
9	Study On Chinese Text Classification Algorithm Based On Rough Set And It's Application
10	An Attribute Reduction Algorithm Based On Dynamic Neighborhood Rough Set For Text Classification