Research on the Maximal Information Coefficient Algorithm

Posted on: 2020-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Meng
Full Text: PDF
GTID: 2428330596485810
Subject: Software engineering
Abstract/Summary:
The arrival of the big data era has driven progress in science and technology. Massive data sets contain a large amount of unknown information, so finding correlations within data has become an important concern, and uncovering hidden patterns in complicated data requires effective analysis methods. Methods for computing and mining correlations in data are therefore of great value. The Maximal Information Coefficient (MIC) is a statistical measure of correlation proposed in recent years. It has excellent generality and equitability: it can mine potential correlation information from a data set and can measure correlations between different types of data. This thesis studies the MIC algorithm and proposes statistical measures for detecting correlation between two variables, and among multiple variables, in large-scale data sets. The main contributions are as follows.

First, to address the high computational time complexity of the MIC algorithm, an improved algorithm based on dynamic equipartition, DE-MIC, is presented. The scatter points of each pair of variables are placed on a grid, and the grid is iteratively optimized by dynamic equal partitioning. The resulting mutual information is then normalized to obtain the optimal MIC value. The computation is parallelized across multiple threads using a POSIX-based strategy, which makes it more efficient on large data sets. Compared with the existing RapidMIC method on multiple data sets, DE-MIC is faster and more efficient while preserving the generality and equitability of the original MIC algorithm.

Second, because the MIC algorithm is not suited to detecting correlation among more than two variables, this thesis proposes the Nonlinear Maximal Information Entropy (NMIE) algorithm, built on DE-MIC, to measure multivariate correlation. The variables in the data set are first combined into pairs, and all pairs are enumerated. The correlation within each reduced-dimension pair is then evaluated with DE-MIC. A characteristic matrix is constructed from the resulting correlation coefficients between every two variables, and this matrix is used to compute the nonlinear maximal information entropy among multiple variables in large-scale data, which measures their joint correlation. Numerical experiments verify that NMIE has excellent generality and equitability when detecting multivariate correlations and that it is suitable for large-scale data sets.
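The grid-based idea behind MIC described above can be illustrated with a minimal Python sketch. This is not the thesis's DE-MIC or NMIE implementation: it assumes plain equal-frequency partitioning on both axes, a grid-size budget of n^0.6 cells (the exponent commonly used for MIC), and natural-log mutual information normalized by log(min(kx, ky)). The names `equal_frequency_bins`, `mutual_information`, `approx_mic`, and `pairwise_matrix` are hypothetical helpers chosen for illustration. The abstract does not give the formula for the final NMIE entropy step, so the sketch stops at the pairwise matrix.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal numbers of points.

    Simplification: ties are split by sort order, so equal values may land in
    different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def mutual_information(a, b):
    """Mutual information (in nats) between two lists of bin labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log((c * n) / (count_a[i] * count_b[j]))
    return mi


def approx_mic(x, y, alpha=0.6):
    """Largest normalized mutual information over equal-frequency grids.

    Assumption: grids are limited to kx * ky <= n ** alpha cells, and only
    equipartitions are tried (no per-grid optimization of the cut points).
    """
    n = len(x)
    budget = max(4, int(n ** alpha))
    best = 0.0
    for kx in range(2, budget // 2 + 1):
        bx = equal_frequency_bins(x, kx)
        for ky in range(2, budget // kx + 1):
            by = equal_frequency_bins(y, ky)
            score = mutual_information(bx, by) / math.log(min(kx, ky))
            best = max(best, score)
    return best


def pairwise_matrix(columns, score):
    """Symmetric matrix of pairwise scores; the diagonal is set to 1.0 by convention."""
    m = len(columns)
    matrix = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            matrix[i][j] = matrix[j][i] = score(columns[i], columns[j])
    return matrix


if __name__ == "__main__":
    xs = [i / 200 for i in range(200)]
    squares = [v * v for v in xs]
    print(approx_mic(xs, squares))
    print(pairwise_matrix([xs, squares], approx_mic))
```

For a strictly increasing relationship such as the one in the usage lines, the 2-by-2 grid already separates the points perfectly, so the score should come out at or very near 1. Because the sketch tries only equipartitions, it will generally underestimate the full MIC on relationships where unequal cut points matter.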
Keywords/Search Tags: Maximal information coefficient (MIC), Statistical analysis, Dynamic equipartition, Multivariable correlation