Text Feature Selection For Transfer Learning

Posted on:2013-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:W Li

Full Text:PDF

GTID:2218330362460700

Subject:Computer Science and Technology

Abstract/Summary:

With the development of the Internet, more and more information is stored as text on the Web, and this text has become one of the main ways people access information. Faced with the network as a huge text library, people need efficient techniques to organize these texts and to mine them, and text mining emerged to meet this need. Text classification is an important technique within text mining and has a wide range of practical applications. Two-class text categorization holds an important position in text classification: many practical problems, such as spam filtering and the removal of sensitive information, are in essence two-class text categorization problems.

Besides its sheer volume, text on the Internet has another important characteristic: it is updated quickly. New content appears constantly, and any of it may become the focus of public attention within a short time. In this situation, traditional machine learning methods face a serious problem: the training data and the test data no longer follow the same distribution. We collect data from the Internet, label it, and train a classifier on it, but by the time the classifier is used, the data it was trained on is already outdated and the classifier is no longer useful. Transfer learning can effectively address this problem. It does not require the training data and the test data to follow the same distribution; instead, it tries to make as much use of the old data as possible to help build a new classifier that performs well on the new data. A growing number of researchers are taking part in the study of transfer learning.

In this thesis, I take two-class text categorization as the background and carry out text classification experiments using transfer learning methods. The thesis discusses a shortcoming of existing algorithms that is caused by data skew, and studies how to use the old data and the new data simultaneously to achieve better classification results. In the feature extraction step, the thesis improves on existing algorithms by proposing a method based on two-step extraction. Experiments show that the improved algorithm effectively increases classification accuracy and recall.
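The abstract does not spell out what the two extraction steps are. As background only, the following is a minimal sketch of chi-square (χ²) term scoring, a standard filter method for feature selection in two-class text categorization; it is not the thesis's proposed algorithm. It assumes documents are already tokenized into lists of terms and that labels are 1 (positive class) and 0 (negative class); the function names `chi_square_scores` and `select_top_k` are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic against the positive class (label 1).

    Hypothetical helper: docs is a list of token lists, labels a list of 0/1 ints.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    # Document frequency per class: a term counts at most once per document.
    df_pos = Counter()
    df_neg = Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == 1 else df_neg).update(set(tokens))
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(docs, labels, k):
    """Keep the k highest-scoring terms (hypothetical helper)."""
    scores = chi_square_scores(docs, labels)
    # Sort by descending score; break ties alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


if __name__ == "__main__":
    # Toy spam-filtering data, invented purely for illustration.
    docs = [["free", "money", "offer"], ["free", "prize"],
            ["meeting", "agenda"], ["project", "meeting"]]
    labels = [1, 1, 0, 0]
    print(select_top_k(docs, labels, 2))  # ['free', 'meeting']
```

On this toy data, "free" and "meeting" each occur in every document of one class and in none of the other, so they receive the maximal score of 4.0 and are selected. In a transfer learning setting, a scorer like this could in principle be applied to the old (source) data, the new (target) data, or both; how the thesis combines them is not stated in the abstract.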

Keywords/Search Tags:

Text classification, Transfer learning, Feature selection, Two-class text categorization

Related items

1	Theoretical Analysis And Algorithm Study On Feature Selection For Text Categorization
2	Research On Text Categorization Based On LDA And SVM
3	A Study On Text Categorization Based On Machine Learning
4	Research On Feature Selection And Classification Methods For Text Categorization
5	Research On High-Performance Text Categorization
6	Research On High Performance Chinese Text Classification Based On Machine Learning
7	Text Representation And Algorithms For Chinese Text Classification
8	The Research Of Text Representation And Feature Selection In Text Categorization
9	Related Technologies Research On Feature Selection For Text Categorization
10	The Text Classification Improvement Research Of Transductive Transfer Learning Algorithm Based On TrAdaBoost