
Instance-Based Domain Adaptation and Incremental Learning Methods

Posted on: 2018-05-05  Degree: Master  Type: Thesis
Country: China  Candidate: Z C Pan  Full Text: PDF
GTID: 2358330512476798  Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, people can access more and more information online. This explosive growth of information is a double-edged sword, and how to use information abundantly and efficiently has become an urgent problem in both academia and industry. Text classification is one of the most commonly used techniques for this purpose; according to the learning setting, it can be divided into domain-specific and domain-adaptation text classification. Many existing instance-based domain adaptation algorithms share a common phenomenon: over-learning of instance weights leads to over-fitting. To the best of our knowledge, no previous work has explicitly discussed the over-fitting of instance weighting in domain adaptation, and this thesis focuses on that problem. In addition, traditional statistical machine learning models in Natural Language Processing are usually trained for a single task on a fixed training set, which limits both their robustness and their scalability. This thesis proposes incremental improvements to address these drawbacks.

First, this thesis introduces the ILA model, a well-known instance-based domain adaptation algorithm, and proposes regularization-based methods to enhance it. Six sub-methods are studied: three based on early stopping, two that add penalty terms to the ILA objective as regularizers, and one that applies dropout training to alleviate over-fitting in instance weighting. Results on text classification tasks show that these methods improve the performance of instance-based domain adaptation to a certain extent, with dropout training achieving the best performance.

Second, this thesis systematically studies the over-fitting problem of instance weighting in domain adaptation. Although the regularization-based methods alleviate over-fitting to some degree, they do not solve the problem fundamentally and are limited in efficiency and adaptability. Therefore, this thesis proposes a series of loss-function penalties that impose penalty functions of different strengths according to the weights of instances. The experimental results show that these penalty-based methods not only reduce over-fitting but are also highly adaptable and efficient, and that the penalty function targeting the few instances with the largest weights performs best.

Finally, this thesis proposes an incremental Naive Bayes model based on lifelong learning, built on the traditional Naive Bayes model. It derives the parameter inference in an incremental-update form and defines lifelong learning mechanisms. The model stores knowledge from large-scale past tasks, effectively assists the learning of new tasks that have only a small number of labeled samples, and updates the past model through incremental parameter updates without re-learning previously seen data. Experimental results on text classification show that the model can incrementally use knowledge learned in the past to guide new task learning, and that it handles novel features and domain adaptation better.
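The abstract does not give the ILA update equations, so the following is only a minimal sketch of how dropout training can be applied to instance weights: each training iteration randomly zeroes a fraction of the weights and rescales the survivors (inverted dropout), so no single instance's weight can be over-learned. The function name and the rescaling convention are illustrative assumptions, not the thesis's exact formulation.

```python
import random

def dropout_weights(weights, drop_rate=0.5, rng=None):
    """Randomly zero a fraction of instance weights for one training
    iteration, rescaling the surviving weights so the expected total
    weight is preserved (inverted dropout)."""
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - drop_rate)
    return [w * scale if rng.random() >= drop_rate else 0.0
            for w in weights]

# Each epoch would reweight the source-domain instances with a fresh mask:
weights = [1.0, 0.5, 2.0, 0.8]
masked = dropout_weights(weights, drop_rate=0.5, rng=random.Random(0))
# → [2.0, 1.0, 0.0, 0.0]
```

Because a different random subset of instances is silenced at every iteration, the learner cannot rely on any fixed handful of heavily weighted instances, which is the over-fitting mode the thesis targets.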
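The thesis's concrete penalty functions are not reproduced in this abstract; as an illustration only, here is one hypothetical loss-function penalty in the spirit described: a quadratic term that activates only for the few instances whose weights exceed a threshold, so the penalty strength grows with the weight. The threshold form and the coefficient `lam` are assumptions.

```python
def large_weight_penalty(weights, threshold=2.0, lam=0.1):
    """Quadratic penalty applied only to instance weights above a
    threshold, discouraging a few instances from dominating training."""
    return lam * sum((w - threshold) ** 2 for w in weights if w > threshold)

def penalized_loss(base_loss, weights, threshold=2.0, lam=0.1):
    """Total objective: original training loss plus the weight penalty."""
    return base_loss + large_weight_penalty(weights, threshold, lam)
```

Unlike early stopping, such a penalty is evaluated inside the objective itself, which matches the abstract's claim that the penalty-based methods are more adaptable and efficient than the regularization heuristics.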
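The incremental-update idea for Naive Bayes can be sketched by keeping the model's sufficient statistics (class counts and per-class word counts), which new labeled data simply increments; past documents never need to be revisited. The class name and Laplace smoothing below are illustrative, and the thesis's lifelong-learning knowledge store across tasks is omitted.

```python
import math
from collections import defaultdict

class IncrementalNB:
    """Multinomial Naive Bayes whose parameters are updated
    incrementally from new labeled documents, without re-learning
    previously seen data."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                  # smoothing
        self.class_counts = defaultdict(int)                # N(y)
        self.word_counts = defaultdict(lambda: defaultdict(int))  # N(w, y)
        self.vocab = set()

    def partial_fit(self, docs, labels):
        """Fold a new batch of (tokenized doc, label) pairs into the
        running counts; this is the whole incremental update."""
        for words, y in zip(docs, labels):
            self.class_counts[y] += 1
            for w in words:
                self.word_counts[y][w] += 1
                self.vocab.add(w)

    def predict(self, words):
        """Argmax over classes of log P(y) + sum log P(w | y)."""
        total = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for y, cc in self.class_counts.items():
            lp = math.log(cc / total)
            denom = sum(self.word_counts[y].values()) + self.alpha * len(self.vocab)
            for w in words:
                lp += math.log((self.word_counts[y].get(w, 0) + self.alpha) / denom)
            if lp > best_lp:
                best, best_lp = y, lp
        return best

# New labeled batches arrive over time; only the counts are updated:
nb = IncrementalNB()
nb.partial_fit([["good", "great"], ["bad", "awful"]], ["pos", "neg"])
nb.partial_fit([["great", "nice"]], ["pos"])
```

Because the counts are additive, each `partial_fit` call costs only the size of the new batch, which is what makes the update usable in a lifelong-learning loop over many tasks.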
Keywords/Search Tags:Text classification, Instance Adaptation, Over-fitting, Incremental learning, Naive Bayes, Lifelong learning