
Text Feature Selection For Transfer Learning

Posted on: 2013-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
GTID: 2218330362460700
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, more and more information is stored as text on the Web, which has become a primary way for people to access information. Faced with this huge text library, people need efficient techniques to organize and mine these texts, and text mining emerged to meet this need. Text classification is an important text mining technique with a wide range of practical applications. Two-class text categorization holds an important position within text classification: many practical problems, such as spam filtering and the removal of sensitive information, are in essence two-class text categorization problems.

Besides its huge volume, text on the Internet has another important characteristic: it is updated quickly. New content appears constantly and can become the focus of attention within a short time. In this setting, traditional machine learning methods face a serious problem: training data and test data no longer follow the same distribution. After data are collected from the Internet, labeled, and used to train a classifier, the data may already be outdated by the time the classifier is applied, so the classifier loses its value. Transfer learning can effectively address this problem. It does not require training data and test data to follow the same distribution; instead, it tries to use the old data as much as possible to help build a new classifier that performs well on new data. More and more researchers are taking part in the study of transfer learning.

Taking two-class text categorization as the background, this thesis carries out text classification experiments using transfer learning methods. It discusses the shortcomings of existing algorithms that are caused by data skew, and studies how to use old data together with new data to achieve better classification results. In the feature extraction step, the thesis improves on existing algorithms and proposes a method based on two-step extraction. Experiments show that the improved algorithm effectively improves classification accuracy and recall.
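To make the feature selection step for two-class text categorization concrete, here is a minimal sketch of a common baseline: scoring terms with the chi-square statistic and keeping the top k. This is an assumption chosen for illustration only. The abstract does not describe the thesis's two-step extraction method, and this sketch does not attempt to reproduce it. The function names `chi_square_scores` and `select_top_k` and the toy documents are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term with the chi-square statistic of its 2x2 term/class table.

    docs   : list of token lists (one list per document)
    labels : list of 0/1 class labels, same length as docs
    Illustrative baseline only; not the thesis's two-step method.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos

    # Document frequency of each term within each class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == 1 else df_neg
        target.update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy two-class corpus (hypothetical data): 1 = spam, 0 = not spam.
docs = [
    ["cheap", "pills", "buy"],
    ["buy", "cheap", "now"],
    ["meeting", "notes", "now"],
    ["project", "meeting", "agenda"],
]
labels = [1, 1, 0, 0]

scores = chi_square_scores(docs, labels)
top_terms = select_top_k(scores, 3)
print(top_terms)
```

In a transfer learning setting, scores computed on old labeled data alone may favor terms that are no longer informative in new data, which is one reason a selection step that also looks at the new data can help.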
Keywords/Search Tags: Text classification, Transfer learning, Feature selection, Two-class text categorization