
Research On Improving Kernel Density Estimator

Posted on: 2021-07-10    Degree: Master    Type: Thesis
Country: China    Candidate: J Jiang    Full Text: PDF
GTID: 2480306200450744    Subject: Computer technology
Abstract/Summary:
Probability density function (p.d.f.) estimation is the process of using statistical methods to estimate the probability density function of a dataset whose distribution is unknown. It is a fundamental problem in machine learning and data mining. Kernel density estimation (KDE), also known as the Parzen window method, is a widely used non-parametric density estimation method, and determining the optimal bandwidth is its key issue. Building on the classic KDE method, this thesis improves it in two respects:

(1) The typical approach to determining the optimal bandwidth is to minimize the mean integrated squared error (MISE), a standard measure of the error between the estimated p.d.f. and the true p.d.f. As a consequence, an unknown term that depends on the true p.d.f. is introduced when the optimal bandwidth is determined. Classic KDE methods estimate this term with heuristic strategies, which amounts to using one unknown to estimate another and makes the estimated p.d.f. unstable. Inspired by the concept of entropy, this thesis proposes a new minimum-entropy kernel density estimator (ME-KDE). Unlike classic KDE methods, which use MISE as the objective function, ME-KDE uses the substitution entropy of the given dataset as its objective, so no unknown term is introduced when determining the optimal bandwidth and the stability of the estimated p.d.f. is enhanced. In addition, a new fixed-point iteration algorithm is designed to compute the optimal bandwidth. Both theoretical analysis and experimental results show that ME-KDE improves the accuracy and stability of p.d.f. estimation over the classic KDE method.

(2) When estimating the p.d.f. of stream data or large-scale data, classic KDE methods suffer from long training times and wasted computing resources: whenever new data arrives, they merge it with the original data and retrain on the entire set. Inspired by the concept of incremental learning, this thesis proposes a new incremental kernel density estimator (I-KDE). Rather than retraining, I-KDE gradually updates the p.d.f. estimated from the original data with the p.d.f. estimated from the newly arrived data, in the manner of data-stream computation, so the update proceeds dynamically as new data keeps arriving. To guarantee the convergence of I-KDE, a new multivariate fixed-point iteration algorithm based on unbiased cross-validation (UCV) is designed to determine the optimal bandwidth. The convergence of I-KDE, the convergence of the fixed-point iteration algorithm, and the estimation performance of I-KDE are then established through theoretical analysis and simulation experiments.
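To make the entropy-based bandwidth selection of part (1) concrete, here is a minimal illustrative sketch, not the thesis's ME-KDE algorithm: it selects the bandwidth of a one-dimensional Gaussian KDE by minimizing a leave-one-out negative log-likelihood (a common surrogate for the resubstitution/substitution entropy of the data) over a grid, rather than by the thesis's fixed-point iteration. All function names and the grid range are illustrative choices.

```python
import numpy as np

def gauss_kde(x_eval, data, h):
    """Gaussian kernel density estimate p_h(x) = (1/(n h)) * sum_i K((x - x_i)/h)."""
    u = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def loo_entropy(data, h):
    """Leave-one-out entropy surrogate: -(1/n) sum_i log p_h,-i(x_i).

    Leaving x_i out of its own density estimate prevents the degenerate
    h -> 0 minimizer that the naive resubstitution estimate would have.
    """
    n = len(data)
    u = (data[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(K, 0.0)                 # exclude the point itself
    p_loo = K.sum(axis=1) / ((n - 1) * h)
    return -np.mean(np.log(p_loo + 1e-300))  # guard against log(0)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=400)
grid = np.linspace(0.05, 1.5, 60)            # candidate bandwidths (illustrative)
h_star = min(grid, key=lambda h: loo_entropy(data, h))
```

A grid search is used here purely for transparency; the thesis instead derives a fixed-point iteration whose limit is the entropy-minimizing bandwidth.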
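The incremental idea of part (2) can likewise be sketched in a simplified form. The toy class below is hypothetical and is not the thesis's I-KDE: it keeps a count-weighted mixture of per-batch Gaussian KDEs, so each new batch updates the running density without refitting on all accumulated data, and it sets each batch's bandwidth with Silverman's rule of thumb instead of the thesis's UCV-based fixed-point iteration.

```python
import numpy as np

class IncrementalKDE:
    """Toy incremental KDE (illustrative sketch, not the thesis's I-KDE).

    The overall estimate is sum_b (m_b / n) * KDE_b(x), where m_b is the
    size of batch b and n is the total count seen so far, so the density
    still integrates to one after every update.
    """

    def __init__(self):
        self.batches = []   # list of (data, bandwidth, batch_size)
        self.n = 0          # running total count

    def update(self, batch):
        batch = np.asarray(batch, dtype=float)
        m = len(batch)
        # Silverman's rule-of-thumb bandwidth for this batch alone.
        h = 1.06 * batch.std(ddof=1) * m ** (-1 / 5)
        self.batches.append((batch, h, m))
        self.n += m

    def pdf(self, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        out = np.zeros_like(x)
        for data, h, m in self.batches:
            u = (x[:, None] - data[None, :]) / h
            k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
            # (1/(n h)) * sum_i K(.) == (m/n) * batch KDE: count-weighted mix.
            out += k.sum(axis=1) / (self.n * h)
        return out
```

Each update costs time proportional to the batch size only, which is the efficiency motivation the abstract gives for I-KDE; the thesis additionally proves convergence of the resulting estimator, which this sketch does not address.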
Keywords/Search Tags:Probability Density Function, Kernel Density Estimation, Optimal Bandwidth, Substitution Entropy, Incremental Learning