
Research On Incremental Learning Method Based On Bayes Theory

Posted on: 2017-07-26    Degree: Master    Type: Thesis
Country: China    Candidate: N Li    Full Text: PDF
GTID: 2428330482480962    Subject: Communication and Information System
Abstract/Summary:
Training a traditional text classifier requires a large labeled corpus, and such corpora are typically produced by inefficient manual classification based on knowledge engineering. Incremental learning techniques enrich the classifier's knowledge by learning from unlabeled corpora, which addresses the problems of unlabeled training data, large corpus volume, and memory limitations. Because of its efficiency, high classification accuracy, and full use of prior knowledge and sample information, Naive Bayes is a natural choice for incremental learning. Existing methods, based on the 0-1 classification loss, select complete data (documents classified with high confidence) and add them to the original corpus in sequence, but two problems remain: (1) when the new corpus is large, the complete data must be found by iteration before being added, which greatly reduces the efficiency of incremental learning; (2) in its initial stage the classifier lacks knowledge, so incrementally learning misclassified texts can sharply degrade classifier performance.

To address these problems, this paper proposes an incremental learning method with an adjustable confidence level and sequence selection. Building on previous work, the following optimizations and improvements are made: (1) A Counting Bloom Filter is introduced for incremental word-frequency statistics and dimensionality reduction, improving the early performance of the classifier; it also eases the memory limit to a certain extent, increases the number of texts processed in each incremental learning round, and raises the probability of obtaining complete data. (2) A dynamic confidence threshold window is established: in the initial stage, when the classifier's knowledge is incomplete, a high confidence level lets complete data join the original classifier with high probability; in the later stage, the confidence threshold is relaxed to speed up incremental learning, balancing the incremental learning of complete sample data against learning efficiency. (3) Sequence selection chooses the documents with the highest category bias, i.e., the complete data, first.
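The abstract does not give implementation details for improvement (1); the following is a minimal sketch of a Counting Bloom Filter used as an approximate word-frequency counter. The parameters (m counters, k hash functions) and the MD5-slicing scheme are illustrative assumptions, not the thesis's actual design.

```python
import hashlib

class CountingBloomFilter:
    """Approximate word-frequency counter in fixed memory: m shared
    counters addressed through k hash functions. Collisions can make a
    count an overestimate, but never an underestimate."""

    def __init__(self, m=1024, k=3):
        # m and k are assumed tuning parameters, not values from the thesis.
        self.m, self.k = m, k
        self.counters = [0] * m

    def _indexes(self, word):
        # Derive k counter indexes from slices of a single MD5 digest.
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        return [int(digest[8 * i:8 * (i + 1)], 16) % self.m
                for i in range(self.k)]

    def add(self, word):
        # Record one occurrence of the word.
        for i in self._indexes(word):
            self.counters[i] += 1

    def count(self, word):
        # The true frequency is at most the minimum of the k counters.
        return min(self.counters[i] for i in self._indexes(word))
```

Memory stays O(m) no matter how large the vocabulary grows, which is consistent with the abstract's claim that the filter eases the memory limit and lets more texts be handled per incremental learning round.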
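The dynamic confidence threshold window of improvement (2) and the sequence selection of improvement (3) can be sketched as follows. The multinomial Naive Bayes model, the threshold schedule (t0, decay, t_min), the toy corpora, and all function names are assumptions made for illustration, not the paper's exact formulation.

```python
import math
from collections import Counter, defaultdict

class IncrementalNB:
    """Multinomial Naive Bayes with Laplace smoothing that can absorb
    self-labeled documents one at a time."""

    def __init__(self):
        self.class_docs = Counter()               # documents seen per class
        self.word_counts = defaultdict(Counter)   # class -> word -> frequency
        self.class_words = Counter()              # word tokens per class
        self.vocab = set()

    def learn(self, words, label):
        """Incrementally update the counts with one labeled document."""
        self.class_docs[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.class_words[label] += 1
            self.vocab.add(w)

    def posterior(self, words):
        """Return normalized class posteriors P(class | words)."""
        total = sum(self.class_docs.values())
        v = len(self.vocab)
        logp = {}
        for c in self.class_docs:
            lp = math.log(self.class_docs[c] / total)
            for w in words:
                lp += math.log((self.word_counts[c][w] + 1) /
                               (self.class_words[c] + v))
            logp[c] = lp
        m = max(logp.values())
        exp = {c: math.exp(lp - m) for c, lp in logp.items()}
        z = sum(exp.values())
        return {c: p / z for c, p in exp.items()}

def incremental_learn(clf, unlabeled, t0=0.95, decay=0.05, t_min=0.7):
    """Dynamic confidence window + sequence selection: repeatedly absorb
    the most confidently classified document (the "complete data"), and
    relax the threshold as the classifier's knowledge grows.
    Returns the number of documents absorbed."""
    pool = [list(doc) for doc in unlabeled]
    threshold, absorbed = t0, 0
    while pool:
        # Sequence selection: score every document, take the most biased.
        scored = [(max(clf.posterior(d).items(), key=lambda kv: kv[1]), d)
                  for d in pool]
        (label, conf), doc = max(scored, key=lambda s: s[0][1])
        if conf < threshold:
            if threshold > t_min:
                threshold = max(t_min, threshold - decay)  # relax the window
                continue
            break  # no complete data left even at the loosest threshold
        clf.learn(doc, label)
        pool.remove(doc)
        absorbed += 1
    return absorbed

# Illustrative toy corpora (assumed, not from the thesis).
clf = IncrementalNB()
clf.learn(["good", "great"], "pos")
clf.learn(["bad", "awful"], "neg")
absorbed = incremental_learn(clf, [["good", "good"], ["awful", "bad"]])
```

The threshold starts strict so that only strongly biased documents are absorbed while the classifier's knowledge is still sparse, then loosens toward t_min to speed up later rounds, mirroring the trade-off the abstract describes between complete-data quality and incremental learning efficiency.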
Keywords/Search Tags: Bayesian classification, incremental learning, confidence level, sequence selection