
Stretching And Expanding Of Non-uniform Distribution Data

Posted on: 2017-01-07 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2308330485978310 | Subject: Computer Science and Technology
Abstract/Summary:
With the development of large-scale data processing technology, the volume of training data handled by data-mining algorithms has grown geometrically. To reduce computational difficulty, more and more data-mining algorithms solve their optimization problems with iterative methods. Whether the input data and the iteration search step are reasonable seriously affects the efficiency and precision of the iterative solution. Unreasonable input data and search step lengths make the iteration prone to converging to a local optimum, slow down convergence, and reduce the reliability of the mining result.

In practice, it is hard to find a well-founded search step length. Using data normalization or metric learning to obtain a reasonable data input is an effective way to improve the iterative solution. However, neither data normalization nor metric learning effectively reduces the concentration of non-uniformly distributed data. In a region where data is concentrated, the distances between data points are small, so an iterative method has difficulty distinguishing between points accurately, and even a small classification error can cause many misidentifications. For these reasons, this thesis builds on existing work in data normalization and metric learning and studies methods for expanding the distances between data points in concentrated regions.

The study is mainly about stretching and expanding non-uniformly distributed data: enlarging the distances between data points in concentrated regions so as to improve the "resolution" of those regions. The main work includes:

(1) A nonlinear stretching method for non-uniformly distributed data. The method uses one-dimensional statistics to estimate how the data is concentrated and, from that estimate, fits a nonlinear data normalization function that nonlinearly stretches the distances between data points.
(2) A data expanding method based on K-means. The method uses K-means to find the regions of Euclidean space where data is concentrated and moves data points toward sparse regions by minimizing the variance of the distances, thereby expanding the distances between data points in concentrated regions.

To verify the effect of the methods, we conducted a series of experiments on a number of UCI data sets of different types and compared the methods against several classic data normalization and metric learning methods. The results show that, when the data distribution is concentrated, both methods effectively improve the efficiency and precision of the iterative solution. The nonlinear stretching method produces a relatively uniform distribution and improves the mining results when the data dimension is low and the correlation among dimensions is low; the K-means-based expanding method is more suitable when the dimension and the correlation are higher.
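
The nonlinear stretching idea in item (1) can be illustrated with a minimal sketch. The abstract does not specify the fitted normalization function, so this example assumes the empirical cumulative distribution of one feature as a stand-in. The names `fit_rank_stretch` and `min_max` and the sample data are hypothetical and not taken from the thesis.

```python
from bisect import bisect_right


def fit_rank_stretch(train_values):
    """Fit a 1-D stretching function from training values (hypothetical helper).

    Assumption: the empirical CDF stands in for the thesis's fitted
    normalization function, which the abstract does not specify.
    """
    ordered = sorted(train_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one training value")

    def stretch(x):
        # Fraction of training values <= x, in [0, 1]. Where training values
        # are dense this fraction rises quickly, so small gaps are widened.
        return bisect_right(ordered, x) / n

    return stretch


def min_max(values):
    """Plain linear min-max normalization, for comparison."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Four values packed near zero plus one far-away value (made-up data).
data = [0.01, 0.02, 0.03, 0.04, 10.0]
stretch = fit_rank_stretch(data)

linear = min_max(data)                  # first four values span only about 0.003
stretched = [stretch(v) for v in data]  # [0.2, 0.4, 0.6, 0.8, 1.0]
```

Under linear min-max scaling the four concentrated values remain almost indistinguishable, while the rank-based mapping spreads them evenly over the unit interval. A smoother fitted function, which is what the abstract describes, would likely serve the same purpose without the step-like behavior of this stand-in.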
Keywords/Search Tags: Non-uniform distribution, Data concentration, Data Normalization, Metric Learning