
Mathematical quantization for massive data sets

Posted on: 2002-09-12
Degree: Ph.D
Type: Dissertation
University: George Mason University
Candidate: Khumbah, Nkem-Amin Nju
Full Text: PDF
GTID: 1468390014450147
Subject: Statistics
Abstract/Summary:
Technology has given researchers a tremendous capability to collect massive amounts of data. This has not been matched by corresponding developments in analytical and computational tools for processing the data. As a consequence, traditional data processing tools become infeasible in practice when faced with massive data, a problem that is only exacerbated by real-time data processing demands. Motivated by the success of data reduction techniques in engineering, this dissertation introduces and investigates mathematical quantization as a means of compressing massive and larger data sets to statistically "analyzable" sizes within presently available data processing constraints and statistical methodology.

In the first part, the structure of the data quantizer is developed. It admits an exhaustive partition of the data support into congruent Voronoi tiles distributed on a lattice, without a priori knowledge of the data distribution. An optimal quantizer depends on the feasible trade-off between its complexity, the amount of data reduction, and how well the quantized data represent the observed data. The trade-off depends on the nature of the data and the processing objectives.

In the second part, sigma-algebras of subsets of probability spaces are shown to produce algebras of subsets of quantized probability spaces. Random variables obtained on the quantized data are shown to converge to the random variables that would be obtained on the recorded data as the partition of the data support is refined. Specifically, results from some key statistical procedures performed on quantized data are shown to converge to the results that would be obtained on the recorded data.

In the third part, the distortion resulting from the data imprecision induced by quantization is investigated. The distortion, which is proportional to the amount of data reduction, is minimized by tiles that attain minimum moments as generated by the operational semi-norm.
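As a rough illustration of the idea in the first two parts, the sketch below quantizes points onto a scaled cubic lattice, whose Voronoi tiles are congruent cubes, and compares the mean computed from tile centers and counts with the mean of the raw data. The cubic lattice, the function names (`quantize`, `quantized_mean`, `sample_mean`), and the simulated Gaussian data are all assumptions made for illustration; the abstract does not say which lattice or which procedures the dissertation uses.

```python
import random
from collections import Counter


def quantize(points, step):
    """Count points per tile of the cubic lattice step * Z^d (hypothetical helper).

    Each point is assigned to its nearest lattice point, so the tiles are
    congruent cubes of side `step`. Keys are integer lattice indices.
    """
    counts = Counter()
    for p in points:
        counts[tuple(round(x / step) for x in p)] += 1
    return counts


def quantized_mean(counts, step):
    """Mean computed only from tile centers and tile counts."""
    n = sum(counts.values())
    dim = len(next(iter(counts)))
    return tuple(
        sum(idx[i] * step * c for idx, c in counts.items()) / n
        for i in range(dim)
    )


def sample_mean(points):
    """Mean of the raw (recorded) data, for comparison."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))


random.seed(0)
# Simulated data, an assumption for this sketch only.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20000)]
raw = sample_mean(data)

for step in (1.0, 0.5, 0.25):
    tiles = quantize(data, step)
    approx = quantized_mean(tiles, step)
    err = max(abs(a - b) for a, b in zip(raw, approx))
    print(f"step={step}: {len(tiles)} tiles, max mean error={err:.5f}")
```

Because every point moves by at most `step / 2` in each coordinate, the error in the quantized mean is bounded by `step / 2`, while a coarser lattice stores fewer tiles. This is one simple instance of the trade-off between data reduction and fidelity that the abstract describes.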
Keywords/Search Tags:Massive, Data sets, Mathematical quantization, Data reduction, Quantized data are shown