Non-parametric density estimation of streaming data using orthogonal series

Posted on: 2006-06-13
Degree: Ph.D.
Type: Dissertation
University: George Mason University
Candidate: Caudle, Kyle A
Full Text: PDF
GTID: 1458390008452015
Subject: Statistics
Abstract/Summary:
Computer technology in the 21st century has allowed us to gather and collect data at rates that would have seemed impossible less than a decade ago. As a result, typical database management systems (DBMS) are having great difficulty storing and analyzing data in the traditional way. Systems that receive large amounts of data in transient data streams generally need to analyze the data immediately without storing it on a disk. These systems are referred to as data stream management systems (DSMS). This emerging field has been pushed to the forefront by technology that demands analysis of data in real time. Babcock et al. [2002] analyzed the issues involved in mining rapid time-varying data streams. To date, most of the work in the area of DSMS has been concerned with querying the data streams. These queries provide estimates of parameters, such as the mean, and then continuously update them as more data arrives. Recently, Heinz and Seeger [2004] used data streams to provide an estimate of the underlying probability density function by dividing the data into bins or windows containing the most recent data. An estimate of the density is then created by using the standard wavelet cascading algorithm on the binned data.

This dissertation will provide an alternative approach to finding the probability density function of streaming data. This approach provides an estimate of the density by using an orthogonal series. Obtaining a density estimate by orthogonal series has several advantages, which will be discussed throughout this dissertation. Although the approach is applicable to a myriad of basis functions, the density estimation problem will be studied by using wavelets as the basis functions. The history of wavelets as a mathematical tool dates back to the early 1900s. In the 1990s, Donoho and Johnstone [1992, 1994] established wavelets as a scientific discipline by applying them in the areas of image compression, denoising and density estimation. Devroye [1985], Silverman [1986] and Scott [1992] provide excellent background material on density estimation in general. The first paper that used wavelets in density estimation is attributed to Doukhan and Leon [1990]. This work was followed by Walter [1990] and Kerkyacharian and Picard [1992]. As a mathematical tool for representing functions, and specifically probability densities, wavelets work especially well. This is due in part to the fact that they form an orthonormal basis for $L^2(\mathbb{R})$. Another pioneer in the field of wavelet density estimation was Vidakovic [1994], who constructed density estimates based on the square root of the density.

This dissertation will first provide a history of wavelets and the density estimation problem in Chapter 2. Next, in Chapter 3, the framework for obtaining a density estimate of streaming data using orthogonal series will be established. In Chapter 4, I will address the problem of discounting old data that is no longer relevant to the density estimate. Chapter 5 provides a simulation study, first using simulated data and then using actual data from a case study of Internet header traffic. Chapter 6 will summarize my findings as well as address possible areas of future study.
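
The general idea of an orthogonal series estimate that is updated as each observation arrives, with older data discounted, can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the class name `StreamingHaarDensity`, the choice of Haar scaling functions at a single fixed resolution level, the restriction of the data to [0, 1), and the exponential discounting rule are all assumptions made here for illustration. Each coefficient is the (possibly discounted) running average of the basis function evaluated at the incoming data points, so no observation needs to be stored.

```python
import math


class StreamingHaarDensity:
    """Illustrative orthogonal-series density estimate on [0, 1).

    Basis (an assumption of this sketch): Haar scaling functions
    phi_{j,k}(x) = 2^(j/2) on [k / 2^j, (k + 1) / 2^j) and 0 elsewhere.
    """

    def __init__(self, level=3, discount=None):
        self.n_basis = 2 ** level               # number of basis functions
        self.scale = math.sqrt(self.n_basis)    # 2^(j/2)
        self.coeffs = [0.0] * self.n_basis      # estimated series coefficients
        self.count = 0
        # discount=None gives a plain running mean; a value in (0, 1)
        # gives exponential down-weighting of old data (hypothetical rule).
        self.discount = discount

    def _index(self, x):
        if not 0.0 <= x < 1.0:
            raise ValueError("this sketch assumes data in [0, 1)")
        return int(x * self.n_basis)

    def update(self, x):
        """Fold one new observation into the coefficients, then discard it."""
        k = self._index(x)
        self.count += 1
        weight = 1.0 / self.count
        if self.discount is not None:
            weight = max(self.discount, weight)
        for i in range(self.n_basis):
            phi = self.scale if i == k else 0.0  # phi_{j,i}(x)
            self.coeffs[i] += weight * (phi - self.coeffs[i])

    def density(self, x):
        """Evaluate the series sum_k c_k * phi_{j,k}(x)."""
        return self.coeffs[self._index(x)] * self.scale


est = StreamingHaarDensity(level=2)
for obs in [0.1, 0.1, 0.6, 0.9]:
    est.update(obs)
print(est.density(0.1), est.density(0.6))
```

Because every update is a convex combination of the old coefficients and the new basis evaluations, the estimate integrates to one after the first observation. A smooth wavelet family or several resolution levels would follow the same update pattern with a different `phi`.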
Keywords/Search Tags:Data, Density, Using, Orthogonal series, Chapter