InfoSift: Adapting graph mining techniques for document classification

Posted on:2005-10-11

Degree:M.S.

Type:Thesis

University:The University of Texas at Arlington

Candidate:Aery, Manu

GTID:2458390008485521

Subject:Computer Science

Abstract/Summary:

A classification system that determines the patterns of various term associations that emerge from documents of a class, and uses these patterns for classifying similar documents, is needed. This thesis proposes a novel graph-based mining approach for document classification. Our approach is based on the premise that representative (common and recurring) structures or patterns can be extracted from a pre-classified document class and the same can be used effectively for classifying incoming documents. To the best of our knowledge, there is no existing work in the area of text, email or web page classification based on pattern inference and the utilization of the learned patterns for classification. A number of factors that influence representative structure extraction and classification are analyzed conceptually and validated experimentally. In our approach, the notion of inexact graph match is leveraged for deriving structures that provide coverage for characterizing the contents of a document class. The ability to classify based on similar and not exact occurrences is singularly important in most classification tasks, as no two samples are exactly the same. Extensive experimentation validates the selection of parameters and the effectiveness of our approach for text, email and web page classification.

The novel idea proposed in the thesis aims at establishing the groundwork for adapting graph mining techniques for various classification problems, not necessarily limited to text. (Abstract shortened by UMI.)
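
The general idea in the abstract can be illustrated with a small Python sketch: recurring structures are extracted from pre-classified documents, and an incoming document is assigned to a class when it matches those structures closely enough, not necessarily exactly. This is a toy illustration only, not the thesis's algorithm, which the abstract does not detail. The adjacent-term edge representation, the function names, the `min_support` cutoff and the coverage-based score are all assumptions made for this example.

```python
from collections import Counter

def term_edges(text):
    """Hypothetical graph representation: directed edges between adjacent terms."""
    words = text.lower().split()
    return set(zip(words, words[1:]))

def representative_edges(docs, min_support=0.5):
    """Keep edges that occur in at least `min_support` of a class's documents."""
    counts = Counter(edge for doc in docs for edge in term_edges(doc))
    threshold = min_support * len(docs)
    return {edge for edge, n in counts.items() if n >= threshold}

def match_score(pattern_edges, doc_edges):
    """Fraction of the class pattern found in the document (1.0 = exact cover)."""
    if not pattern_edges:
        return 0.0
    return len(pattern_edges & doc_edges) / len(pattern_edges)

def classify(text, class_patterns, threshold=0.5):
    """Pick the best-covered class; accept partial (inexact) matches above `threshold`."""
    edges = term_edges(text)
    scores = {label: match_score(p, edges) for label, p in class_patterns.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Made-up training data for two classes.
training = {
    "sports": ["the team won the match", "the team lost the match"],
    "finance": ["the bank raised interest rates", "the bank cut interest rates"],
}
patterns = {label: representative_edges(docs, min_support=1.0)
            for label, docs in training.items()}

print(classify("the team drew the match", patterns))
print(classify("the bank froze lending rates", patterns))
```

In this sketch the second document contains only half of the "finance" pattern yet is still assigned to that class, which is the point of allowing similar rather than exact occurrences. A real graph-mining approach would work with larger connected substructures rather than single edges.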

Keywords/Search Tags:

Classification, Document, Graph, Mining, Patterns

Related items

1	Research On Document Classification Method Based On Graph Model
2	Research On Techniques For Mining Discriminative Subgraph Patterns
3	Mining local and global patterns for complex data classification
4	Research On The Frequent Graph Mining Algorithm For Graph Classification
5	Research On The Frequent Substructure Mining Algorithm For Graph Classification
6	Mining Shared Knowledge/Patterns Between Two Datasets
7	The Research On Key Problems Of Sequential Patterns Mining
8	M-InfoSift: A graph-based approach for multiclass document classification
9	Research On Techniques For Mining Graph Patterns
10	Classification Of Two-stage Approach Based On EEP