Research On Feature Description And Classifier Construction Algorithm In Chinese Text Classification

Posted on:2007-12-09

Degree:Master

Type:Thesis

Country:China

Candidate:L Liu

GTID:2178360185974706

Subject:Computer software and theory

Abstract/Summary:

With the rapid development and spread of the Internet, the volume of electronic text is increasing exponentially. An important research problem is how to extract knowledge and models from this large number of online documents. As a key technology for organizing and processing large amounts of document data, text classification can largely solve the problem of information disorder and helps users find the required information quickly. Moreover, text classification has broad application prospects as the technical basis of information filtering, information retrieval, search engines and so on.

This thesis focuses on two pivotal problems in text classification: the text feature description method and the classifier construction algorithm. The main contributions of this thesis are as follows:

1. A context-based text feature description method is proposed. Text feature description is the basic problem in text classification; it aims to represent documents with computable features. The most widely used method treats a text as a set of discrete words, known as the "bag of words" model. Under this model, feature selection and weighting consider only the frequency of single words and ignore the relations between words in context. In general, however, words within a given context convey related meanings about the same topic, so the "bag of words" model loses context information that matters for improving classification precision. This thesis puts forward a new feature description method based on text context. First, a commonly used feature selection method is employed to obtain an initial set of feature words. Second, the dependence between words in a specific context is computed using Mutual Information (MI). Then, words that have high dependence within the same context are extracted, and the weight of each feature is adjusted. The results show that the new method outperforms traditional methods.

2. An algorithm is designed for training text classifiers based on SVM active learning. Text classification algorithms are supervised, which means that classifier training needs human-labeled data of fixed classes. Generally, the accuracy of a classifier is higher with more labeled data. In practice, however, a training set often contains a great deal of redundant data that does not contribute to classification accuracy; on the other hand, hand-labeled data is an expensive resource. Therefore, one vital problem in text classification is how to reduce the amount of labeled data while maintaining adequate accuracy. This thesis presents a new text categorization algorithm for performing active learning with support vector...
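The three steps of the context-based feature description in item 1 (initial feature set, MI between words in context, weight adjustment) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the formulas, so the document-level pointwise MI estimate, the sliding `window`, the `threshold`, the multiplicative `boost`, and all function names below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def window_pairs(tokens, features, window):
    """Yield sorted pairs of distinct feature words at most `window` tokens apart."""
    for i, w in enumerate(tokens):
        if w not in features:
            continue
        for v in tokens[i + 1 : i + 1 + window]:
            if v in features and v != w:
                yield (w, v) if w < v else (v, w)


def pmi_table(docs, features, window=2):
    """Pointwise MI of feature-word pairs, estimated from document counts (assumed estimator)."""
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each feature word
    pair_df = Counter()  # number of documents where the pair co-occurs in a window
    for tokens in docs:
        word_df.update({t for t in tokens if t in features})
        pair_df.update(set(window_pairs(tokens, features, window)))
    return {
        (a, b): math.log((n_ab * n_docs) / (word_df[a] * word_df[b]))
        for (a, b), n_ab in pair_df.items()
    }


def context_weights(tokens, features, pmi, window=2, threshold=0.0, boost=0.5):
    """Term frequency, boosted for each high-MI partner found in the same context."""
    tf = Counter(t for t in tokens if t in features)
    support = Counter()
    for a, b in set(window_pairs(tokens, features, window)):
        if pmi.get((a, b), float("-inf")) > threshold:
            support[a] += 1
            support[b] += 1
    return {w: tf[w] * (1.0 + boost * support[w]) for w in tf}


# Toy corpus; `features` stands in for the output of an initial feature selection step.
docs = [
    ["stock", "market", "price", "rises"],
    ["stock", "market", "falls"],
    ["football", "match", "score"],
    ["football", "match", "price"],
]
features = {"stock", "market", "football", "match", "price"}

pmi = pmi_table(docs, features)
weights = context_weights(docs[0], features, pmi)
print(weights)
```

In this toy corpus, "stock" and "market" co-occur more often than chance, so both receive a boosted weight in the first document, while "price" keeps its plain frequency because it appears with both topics.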
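The abstract is cut off before it describes how the active learner in item 2 chooses which documents to label. As a hedged illustration only, the sketch below shows one commonly used criterion for SVM active learning: query the unlabeled samples that lie closest to the current separating hyperplane, since their labels are the least certain. The hyperplane parameters `w` and `b`, the `pool` of unlabeled vectors, and the function names are hypothetical, and this criterion is an assumption rather than the thesis's confirmed method.

```python
def margin_distance(w, b, x):
    """Absolute decision value |w.x + b|; small values mean x is near the hyperplane."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)


def select_queries(w, b, pool, batch_size):
    """Return indices of the `batch_size` unlabeled samples nearest the hyperplane."""
    ranked = sorted(range(len(pool)), key=lambda i: margin_distance(w, b, pool[i]))
    return ranked[:batch_size]


# Hypothetical linear SVM (w, b) trained on the current labeled set,
# and a pool of unlabeled two-dimensional feature vectors.
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [1.0, 0.75], [0.0, 2.5], [0.5, 1.0], [2.0, 2.0]]

queries = select_queries(w, b, pool, batch_size=2)
print(queries)
```

In a full loop, the selected samples would be labeled by a person, added to the training set, and the SVM retrained before the next round of selection; samples far from the hyperplane are the redundant ones that labeling effort can skip.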

Keywords/Search Tags:

Text Classification, Feature Extraction, Machine Learning, Active Learning, Support Vector Machine

Related items

1	Research On Text Classification Algorithm Based On Support Vector Machine And Neural Network
2	For Surface Classification Support Vector Machine (SVM) Active Learning Method Of Study
3	Research Of Automatic Text Classification Method Based On Machine Learning
4	Web Pages Classification Based On Active Learning Support Vector Machine Learning
5	Study On Least Squares Support Vector Machine And Its Applications
6	The Study Of Classification Methods And Its Applications In Web Mining Based On Statistical Learning
7	Research On Text Classification Based On Support Vector Machine
8	Research And Implementation Of Text Classification Based On Deep Learning Theory And SVM Technology
9	Research On Some Problems Of Support Vector Machine Learning Algorithm
10	High Dimensional Multispectral Data Classification By Machine Learning