Stretching And Expanding Of Non-uniform Distribution Data

Posted on:2017-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:J Li

Full Text:PDF

GTID:2308330485978310

Subject:Computer Science and Technology

Abstract/Summary:

With the development of massive-data processing technology, the scale of the training data used by data-mining algorithms has grown geometrically. To reduce computational difficulty, more and more data-mining algorithms solve their optimization problems with iterative methods. Whether the input data and the iteration search step are reasonable seriously affects the efficiency and precision of the iterative solution: unreasonable input data and search step lengths make the iteration prone to converging to a local optimum, slow down convergence, and reduce the reliability of the mining results.

In practice, it is hard to find a well-founded search step length. Obtaining a reasonable data input through data normalization or metric learning is therefore an effective way to improve the iterative solution. However, data normalization and metric learning methods cannot effectively reduce the concentration of non-uniformly distributed data. In a region where the data are concentrated, the distances between data points are small, so an iterative method has difficulty distinguishing the points accurately, and a small classification error can cause many misidentifications.

Building on existing research on data normalization and metric learning, this thesis studies methods that expand the distances between data points in concentrated regions. The study focuses on stretching and expanding non-uniformly distributed data so as to improve the "resolution" of the concentrated regions. The main work includes:

(1) A nonlinear stretching method for non-uniformly distributed data is proposed. The method estimates the concentration of the distribution using one-dimensional statistics and, based on the estimate, fits a nonlinear data normalization function that nonlinearly stretches the distances between data points.

(2) A data expanding method based on K-means is proposed. The method uses K-means to find the areas of Euclidean space where the data are concentrated, moves data points toward sparse regions by minimizing the variance of the distances, and thereby expands the distances between data points in the concentrated regions.

To verify the effect of the methods, a series of experiments was conducted on a number of UCI data sets of different types, with comparisons against several classic data normalization and metric learning methods. The results show that, when the data distribution is concentrated, both methods can effectively improve the efficiency and precision of the iterative solution. The nonlinear stretching method makes the data form a relatively uniform distribution and improves the mining results when the data dimension and the correlation among dimensions are low; the K-means-based data expanding method is more suitable when the dimension and the correlation are higher.
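
The idea behind the nonlinear stretching in (1) can be illustrated with a small sketch. The abstract does not give the fitted normalization function, so the code below swaps in a common stand-in, the empirical cumulative distribution function (mid-rank form) of a single feature. The function names `fit_ecdf_stretch` and `min_max` and the sample values are hypothetical and chosen only for illustration; this is not the thesis's actual method.

```python
from bisect import bisect_left, bisect_right


def fit_ecdf_stretch(values):
    """Fit a nonlinear normalization for one feature.

    Assumption: the empirical CDF (mid-rank form) is used as a stand-in
    for the fitted function described in the abstract.
    """
    sorted_vals = sorted(values)
    n = len(sorted_vals)
    if n == 0:
        raise ValueError("need at least one value to fit the mapping")

    def stretch(x):
        # Count of fitted values strictly below x, and at or below x.
        below = bisect_left(sorted_vals, x)
        at_or_below = bisect_right(sorted_vals, x)
        # Mid-rank divided by n: a non-decreasing map into [0, 1].
        return (below + at_or_below) / (2 * n)

    return stretch


def min_max(values):
    """Ordinary linear min-max normalization, for comparison."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


# Hypothetical feature: five points packed near 1.0 plus two far-away points.
data = [1.00, 1.01, 1.02, 1.03, 1.04, 5.0, 10.0]

stretch = fit_ecdf_stretch(data)
stretched = [stretch(v) for v in data]
linear = min_max(data)

# Gap between the two closest points under each normalization.
gap_linear = linear[1] - linear[0]
gap_stretched = stretched[1] - stretched[0]
print(f"linear gap: {gap_linear:.4f}, stretched gap: {gap_stretched:.4f}")
```

Under min-max normalization the five clustered points remain almost indistinguishable, whereas the rank-based mapping spaces them evenly, which is the kind of improved "resolution" in concentrated regions that the abstract describes. Because the mapping acts on one feature at a time, it ignores correlation between dimensions, which is consistent with the abstract's remark that stretching works best for low-dimensional, weakly correlated data.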

Keywords/Search Tags:

Non-uniform distribution, Data concentration, Data Normalization, Metric Learning

Related items

1	Traffic Aware Dynamic Resource Management And Optimization In Wireless Networks
2	Design And Implementing Of Press Distribution System Based On Provinces Data Concentration Mode
3	Research On Unsupervised Person Re-Identification Based On Deep Asymmetric Metric Learning
4	Research On Data Acquisition System Of Non-uniform Tactile Sensor Array
5	The Study On Molecular Substructure Prediction Based On Metric Learning
6	Research On Metric Learning Based Clustering Method With Incomplete Data
7	Research And Implementation Of Small Sample Image Classification Algorithm Based On Metric Learning And Data Enhancemen
8	Research On Metric Learning And Data Balance In Person Re-Identification
9	Research And Application Of Medical Data Mining Based On Distance Metric Learning
10	Research And Application On Supervised Similarity Metric Learning Approaches