An examination of KSS for feature selection for text categorization using support vector machines

Posted on:2006-04-08

Degree:M.C.Sc

Type:Thesis

University:Dalhousie University (Canada)

Candidate:Basu, Atreya

GTID:2458390008960904

Subject:Computer Science

Abstract/Summary:

Many tools currently exist for extracting information from textual data. Some extract items such as keyphrases and keywords, while others report the importance of those items. Such tools are sometimes called knowledge tools. In our experiment we examined what would happen if we applied these knowledge tools to text categorization. We reasoned that improving the quality of the extracted information would give a better document representation and therefore let us better train our classifier for text categorization. The knowledge tools we used were provided to us by IBM as a software suite called KSS. The two tools we used from that package were Textract, a keyphrase/keyword (term) extractor, and IQ, an algorithm that measures the importance of a term to a document.

Support Vector Machines (SVMs) are currently considered excellent classifiers, and we therefore decided to take a 'best of breed tools' approach to the problem of text categorization.

Using Textract and IQ together, we did get better-quality terms, as evidenced by almost identical Precision and Recall measures. However, the terms extracted and weighted by these tools could not be statistically proven to be better for training the SVM classifiers than terms extracted and weighted by traditional information retrieval techniques. The traditional technique here consists of keyword extraction only, with no keyphrase extraction and no term ranking, and therefore no term filtering.
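
The contrast between the two document representations can be illustrated with a minimal sketch. It is not the thesis's implementation: Textract and IQ are IBM tools whose interfaces are not described here, so TF-IDF is used as an assumed stand-in for IQ's importance score, and the function names (`keywords`, `baseline_vector`, `ranked_vector`) and the toy corpus are hypothetical. The baseline keeps every keyword with its raw count; the ranked variant weights terms and keeps only the highest-scoring ones.

```python
import math
import re
from collections import Counter


def keywords(text):
    """Baseline extraction: lowercase single-word tokens, no keyphrases."""
    return re.findall(r"[a-z]+", text.lower())


def baseline_vector(text):
    """Traditional representation: raw keyword counts, no ranking or filtering."""
    return dict(Counter(keywords(text)))


def ranked_vector(text, corpus, top_k):
    """Weight terms by TF-IDF (assumed stand-in for IQ) and keep the top_k terms."""
    counts = Counter(keywords(text))
    n_docs = len(corpus)
    doc_sets = [set(keywords(d)) for d in corpus]
    weights = {}
    for term, tf in counts.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for s in doc_sets if term in s)
        weights[term] = tf * math.log((1 + n_docs) / (1 + df))
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


# Hypothetical toy corpus, for illustration only.
corpus = [
    "support vector machines classify text documents",
    "keyphrase extraction finds important terms in text",
    "text categorization assigns documents to categories",
]

full = baseline_vector(corpus[0])
filtered = ranked_vector(corpus[0], corpus, top_k=3)
print(len(full), "terms before filtering;", len(filtered), "after")
```

In either case the resulting term-weight dictionaries would be turned into feature vectors for training an SVM; the abstract's finding is that the ranked and filtered representation could not be shown to train a statistically better classifier than the baseline.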

Keywords/Search Tags:

Text categorization, Tools, Extraction, Information, Term

Related items

1	Research On The Term Weighting Scheme And Text Representation Strategy For Text Categorization
2	The Research And Implementation Of Text Categorization Technology In Integrated Risk Meta-Search Engine
3	A Class Core Extraction Method For Text Categorization
4	Research And Application On Feature Selection Algorithms Based On Term Distributions In Text Categorization
5	Research And Implementation Of Text Categorization System Based On VSM
6	Study On Term Semantic Relationship And Its Application In Text Categorization
7	Study And Design Of Text Information Extraction And Classification System
8	The Research Of Text Representation And Feature Selection In Text Categorization
9	The Research On Several Key Techniques In Text Information Processing
10	Research Of Text Categorization Based On The Theme Mining And Covering Algorithm