The Research Of High-dimensional Data Mining Technology For Big Data

Posted on: 2014-06-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Lou
Full Text: PDF
GTID: 1268330425483465
Subject: Control theory and control engineering
Abstract/Summary:
With the arrival of the Big Data era, traditional data processing faces stern challenges. Big Data has characteristics such as high volume, variety, velocity, and low value density, which exceed what traditional processing and search methods can handle. The only way forward is to follow the development of Big Data technologies and turn these challenges into opportunities. By making better use of Big Data as an important strategic resource, mathematical models and tools can be constructed that turn massive data into genuinely useful information.

This dissertation studies high-dimensional data mining technologies for Big Data to meet the requirements of the project. In order to “turn the study of individual data into the study of data systems” and “turn passive data verification into active data discovery”, research was carried out in the following respects:

(1) In the era of Big Data, data merging has become extremely important because data come from so many sources. We use data pre-processing techniques such as data cleaning, integration, and selection to integrate the data as fully as possible under a single criterion. This addresses the data-merging problem, greatly improves the quality of data mining, and reduces the time that data processing requires.

(2) We build a mathematical model based on a three-dimensional matrix. An N-dimensional space is created by defining each attribute of the data as a dimension of the space. The attributes are expressed as vectors and then transformed into matrices, so that each record is expressed as an M×N matrix. A series of such matrices, stacked into a three-dimensional matrix, expresses all the records and forms the basis of the algorithms that follow.

(3) We apply a bionic optimization algorithm to association rule analysis of high-dimensional data. Problems such as premature convergence and slow convergence in the later stages often arise when the traditional genetic algorithm is applied to high-dimensional data. We use a coevolution algorithm and introduce an information exchange mechanism so that two species evolve in a coordinated way, compensating for these deficiencies of the traditional genetic algorithm. Experiments show that the coevolution algorithm can effectively avoid problems such as premature convergence while keeping time complexity acceptable. The algorithm performs global optimization, and the association rules it extracts are more effective when it is applied to high-dimensional data sets.

(4) We introduce the concepts of hypergraph and system and explore how to construct a hypergraph on the three-dimensional matrix model. In view of the characteristics of Big Data, a new hyperedge definition that incorporates the concept of system is adopted, which improves processing capacity. For cluster analysis based on undirected hypergraphs, we use hMETIS, a hypergraph partitioning algorithm, to obtain high-quality clusterings. For detecting redundancy and cycles among association rules based on directed hypergraphs, we convert the association rules into a directed hypergraph and redefine the adjacency matrix, which turns the detection of redundancy and cycles into the handling of connected components and cycles in the hypergraph. This provides a new approach to removing redundancy from association rules.

(5) We apply the above methods to practical data processing. The experimental results confirm that high-quality knowledge can be discovered with the three-dimensional matrix model and the associated data mining algorithms we adopt.
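
As an illustration of item (4), the idea of treating association rules as directed edges and reducing redundancy and cycle detection to graph reachability can be sketched in Python. This is a minimal sketch under simplifying assumptions, not the dissertation's method: each rule is reduced to an ordinary directed edge between its antecedent and consequent itemsets rather than a true directed hyperedge, the redefined adjacency matrix is replaced by a plain adjacency mapping, and a rule is assumed to be redundant when its consequent can still be reached from its antecedent through other rules. The function names (build_adjacency, reachable, redundant_rules, cyclic_rules) and the sample rules are hypothetical.

```python
from collections import defaultdict


def build_adjacency(rules):
    """Map each antecedent itemset to the set of consequent itemsets it implies."""
    adjacency = defaultdict(set)
    for antecedent, consequent in rules:
        adjacency[frozenset(antecedent)].add(frozenset(consequent))
    return adjacency


def reachable(adjacency, start, goal, skip_edge=None):
    """Return True if goal is reachable from start, optionally ignoring one edge."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if (node, nxt) == skip_edge:
                continue  # pretend the rule under test does not exist
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def redundant_rules(rules):
    """Rules whose consequent is still reachable from the antecedent without them."""
    adjacency = build_adjacency(rules)
    result = []
    for antecedent, consequent in rules:
        a, c = frozenset(antecedent), frozenset(consequent)
        if reachable(adjacency, a, c, skip_edge=(a, c)):
            result.append((antecedent, consequent))
    return result


def cyclic_rules(rules):
    """Rules that lie on a cycle: the antecedent is reachable from the consequent."""
    adjacency = build_adjacency(rules)
    return [
        (antecedent, consequent)
        for antecedent, consequent in rules
        if reachable(adjacency, frozenset(consequent), frozenset(antecedent))
    ]


# Hypothetical sample rules, written as (antecedent, consequent) pairs.
sample_rules = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"A"}, {"C"}),  # implied by A -> B and B -> C
    ({"C"}, {"D"}),
    ({"D"}, {"C"}),  # closes a cycle with C -> D
]
print(redundant_rules(sample_rules))  # [({'A'}, {'C'})]
print(cyclic_rules(sample_rules))     # [({'C'}, {'D'}), ({'D'}, {'C'})]
```

A depth-first search over an adjacency mapping is used here only because it is short and easy to check by hand. A matrix-based formulation, as the abstract describes, would answer the same reachability questions through operations on the adjacency matrix.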
Keywords/Search Tags:Big Data, high dimensional data mining, three-dimensional matrix, coevolution, hypergraph