
Research On Open Learning Algorithms Over Data Streams

Posted on: 2019-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Deng
Full Text: PDF
GTID: 2428330545477790
Subject: Computer Science and Technology
Abstract/Summary:
With growing computing power and a steady flow of new algorithms, machine learning has been widely adopted in real-world applications. Compared with the laboratory setting, however, many problems remain when machine learning algorithms are applied to real scenes. For classification tasks, for example, standard models assume that every category of test sample has been observed during training, i.e., they have the so-called "closed" property. In practice, the data sets we collect are often dynamic: new classes of samples are discovered continually and must be added to the existing data, so the training set may not cover all categories, i.e., real application scenes often have an "open" property. When a standard model is applied to a real scene, it is therefore likely to meet samples of classes never encountered during training, which requires the classification model to handle unknown new categories.

In addition, with the development of the mobile Internet and social media, more and more data are being collected, and large volumes of real-time streaming data have emerged. Such data are characterized by large volume, rapid generation, and an underlying distribution that may change over time. Learning from streaming data has become an urgent problem, yet because of these characteristics many traditional offline learning algorithms are no longer applicable.

For the "open" learning problem in the data stream scenario, this thesis proposes two open learning algorithms under different settings by analyzing and studying the competitive learning process. Both can be trained on streaming data, and at test time the resulting classification model can handle samples of unknown classes that were not covered during training. We compare the proposed open learning algorithms with existing algorithms on artificial and real data sets and verify their effectiveness. The main work of this thesis includes:

· The reasons why open learning algorithms are needed are described. Existing learning algorithms for the open scene are then summarized, and their advantages, disadvantages, and application scenarios are analyzed.

· An open learning algorithm based on one-class learning is proposed. The known class of interest is regarded as the target class, and all other classes (known and unknown) are treated as the non-target class. A one-class model is then trained on the data stream of target-class samples to determine whether a test sample is of interest.

· An open learning algorithm based on distribution learning is proposed. First, a distribution model of the data stream is obtained through unsupervised learning. Then, the label information of the samples is used to distinguish the distribution regions. The result is an open learning model for data streams that decides whether a test sample belongs to one of the known classes or to an unknown class.
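
The distribution-learning idea in the last item above can be illustrated with a small sketch. This is not the algorithm proposed in the thesis, whose details the abstract does not give. It is a minimal stand-in that assumes a simple online competitive-learning rule with a fixed distance threshold. The class name `OpenPrototypeClassifier` and the parameters `radius` and `learning_rate` are hypothetical. Prototypes are fitted to the stream without labels, labels (when present) are attached to the region they fall in, and a test sample that lies outside every labelled region is reported as unknown.

```python
import math
from collections import Counter


class OpenPrototypeClassifier:
    """Illustrative online open-set classifier (NOT the thesis's algorithm)."""

    def __init__(self, radius, learning_rate=0.1):
        self.radius = radius                # assumed fixed distance threshold
        self.learning_rate = learning_rate  # assumed step size for prototype updates
        self.prototypes = []                # one coordinate list per region
        self.label_counts = []              # one Counter of seen labels per region

    def _nearest(self, x):
        """Return (index, distance) of the closest prototype, or (None, inf)."""
        best, best_dist = None, math.inf
        for i, p in enumerate(self.prototypes):
            d = math.dist(p, x)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def partial_fit(self, x, label=None):
        """Consume one stream sample; the label is optional."""
        i, d = self._nearest(x)
        if i is None or d > self.radius:
            # No existing region covers x, so open a new one at x.
            self.prototypes.append(list(x))
            self.label_counts.append(Counter())
            i = len(self.prototypes) - 1
        else:
            # Competitive update: move the winning prototype toward x.
            p = self.prototypes[i]
            for k in range(len(p)):
                p[k] += self.learning_rate * (x[k] - p[k])
        if label is not None:
            self.label_counts[i][label] += 1

    def predict(self, x):
        """Return the majority label of the covering region, else "unknown"."""
        i, d = self._nearest(x)
        if i is None or d > self.radius or not self.label_counts[i]:
            return "unknown"
        return self.label_counts[i].most_common(1)[0][0]


# Toy stream: two known classes, one unlabelled sample.
model = OpenPrototypeClassifier(radius=1.0)
stream = [((0.0, 0.0), "a"), ((0.2, 0.1), None),
          ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]
for x, y in stream:
    model.partial_fit(x, y)

print(model.predict((0.1, 0.0)))    # near the "a" region
print(model.predict((5.0, 5.2)))    # near the "b" region
print(model.predict((10.0, -3.0)))  # far from every region
```

Feeding the same sketch only samples of a single target class gives a rough analogue of the one-class setting, in which everything outside the learned regions is rejected. A fixed `radius` is the simplest possible rejection rule and would not track a distribution that drifts over time.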
Keywords/Search Tags: Machine Learning, Data Streams, Online Learning, Density Estimation, One-Class Learning, Semi-Supervised Learning, Open Learning