
Text Transfer Learning Algorithm Based On Fuzzy C-means

Posted on: 2018-08-21  Degree: Master  Type: Thesis
Country: China  Candidate: H Z Tian  Full Text: PDF
GTID: 2348330533961356  Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, data of various types, including text, audio, and images, is growing at an astonishing rate. Compared with audio and image data, text data consumes fewer network resources, transmits faster, and is easier to upload and download. Thus, most network resources exist in the form of text. How to use these resources to build models that extract useful information has therefore been a major goal of researchers in recent years. Many machine learning methods for text classification have emerged to help people organize text and mine information. Traditional machine learning methods work under the assumption that the training data and test data follow the same distribution. However, in some real-world applications, training data and test data come from different domains, and traditional learning methods may fail because they do not account for the shift in the data distribution. In recent years, researchers have put forward the idea of transfer learning: when the data sources are distributed differently, knowledge is learned from similar data and transferred to the target domain. Transfer learning thereby effectively addresses the problem of differing data distributions in the source domain and the target domain. This paper combines transfer learning and fuzzy theory to study text classification. The main contents are as follows:
(1) To overcome the shortcomings of traditional text classification methods, this paper describes in detail the classification methods of transfer learning, including their basic ideas, main problems, and defects.
(2) This paper proposes a transfer learning text classification algorithm based on Fuzzy C-Means (FCM) to solve this problem. First, the test data are classified with a simple classifier. Second, the fuzzy membership degree of each data point is initialized based on the Natural Nearest Neighbor algorithm. Then, the fuzzy membership degrees are updated based on FCM and the labels of the test data are refined. Finally, the outliers in the test data are classified.
(3) Considering the influence of different simple classifiers and feature extraction algorithms on the proposed method, we design different experiments on the 20 Newsgroups data set. The proposed algorithm is also compared with traditional SVM and Naive Bayes classification. The experimental results show that the algorithm achieves good precision and effectively solves the text classification problem when the distributions of the training data and test data are inconsistent.
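The label refinement step in (2) can be illustrated with the textbook FCM update rules. The following is only a minimal sketch, not the thesis's implementation: the function names, the fuzzifier m = 2, and the smoothing constant are assumptions made for illustration, and the Natural Nearest Neighbor initialization and the outlier handling are not reproduced. Here the initial memberships are simply derived from the labels predicted by a simple classifier.

```python
import numpy as np

def fcm_update_centers(X, U, m=2.0):
    """Cluster centers as membership-weighted means. X: (n, d), U: (n, c)."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]

def fcm_update_memberships(X, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))."""
    # Pairwise Euclidean distances between points and centers, shape (n, c).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1

def refine_labels(X, init_labels, n_classes, m=2.0, n_iter=50, tol=1e-6, smooth=0.1):
    """Refine labels from a simple classifier by alternating FCM updates.

    Assumption: memberships are initialized from init_labels with a small
    amount of smoothing, in place of the thesis's Natural Nearest Neighbor
    based initialization.
    """
    n = X.shape[0]
    U = np.full((n, n_classes), smooth / n_classes)
    U[np.arange(n), init_labels] += 1.0 - smooth
    for _ in range(n_iter):
        centers = fcm_update_centers(X, U, m)
        U_new = fcm_update_memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U.argmax(axis=1), U
```

In this sketch, a test document whose initial label disagrees with the cluster it sits in tends to be pulled toward the nearer center, which is the intuition behind refining the simple classifier's output with fuzzy memberships.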
Keywords/Search Tags: Fuzzy C-Means, Natural Nearest Neighbor, Transfer Learning, Outliers