
Imputation For Missing Value Of Compositional Data Based On Biclustering Algorithm

Posted on: 2020-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Xu
Full Text: PDF
GTID: 2370330572966703
Subject: Application probability statistics
Abstract/Summary:
Missing data is a common problem in scientific experiments and surveys. It lowers the quality of statistical data and complicates the data analysis process, so imputing missing values is of both theoretical and practical significance. Compositional data is a kind of complex multidimensional data that satisfies special constraints. It is widely used in fields such as society, economy, and technology to describe, for example, industrial structure, consumer structure, and rock mineral composition. Because compositional data is subject to the "non-negativity" and "constant-sum" constraints, its sample space is a simplex, so traditional statistical methods designed for ordinary data in Euclidean space cannot be applied to it directly.

A biclustering algorithm clusters the objects (rows) and attributes (columns) of a data matrix simultaneously. By extracting the joint information of objects and attributes, it effectively uncovers local feature information hidden in the data matrix. This thesis exploits the fact that biclusters with a lower mean squared residue are more internally consistent, and uses this consistency to fill in missing values of compositional data. Because the isometric logratio (ilr) transformation preserves distances, a method for imputing missing values of compositional data using the isometric logratio transformation based on a biclustering algorithm (ICDBIA) is proposed. In simulation analysis, ICDBIA is compared with the Aitchison distance-based k-nearest neighbor imputation method (KNN) and the iterative regression imputation method (LISR) proposed by Hron et al. (2010). The results demonstrate that ICDBIA is effective for imputing missing values of compositional data, and it offers a new approach to the problem.

In practice, the special characteristics of compositional data are often neglected and the data is treated as ordinary data in Euclidean space. To examine this, the CDBIA method, which applies the same biclustering-based imputation to the original untransformed data, is proposed as a comparison method. ICDBIA consistently outperforms CDBIA in imputation quality. Therefore, the special characteristics of compositional data should be taken into account when analyzing it.

The thesis consists of five chapters:
Chapter 1: Introduction. This chapter summarizes the research background, significance, and current state of research on compositional data, and sets out the main contents and innovations of the thesis.
Chapter 2: Theoretical basis. This chapter describes the statistical theory of compositional data and the biclustering algorithm. It covers the definition of compositional data, logratio transformations, the Aitchison geometry of compositional data, and the KNN and LISR imputation methods. It then introduces the biclustering algorithm and its related definitions.
Chapter 3: Imputation of missing values of compositional data. The isometric logratio transformation used in this thesis is presented, and a theorem on missing value imputation based on the biclustering algorithm is given. Building on the theorem, the CDBIA and ICDBIA methods are proposed and illustrated with a worked example.
Chapter 4: Simulation and empirical analysis. Through simulation and empirical analysis, the ICDBIA method proposed in this thesis is compared with KNN, LISR, and CDBIA to verify its effectiveness in filling missing values.
Chapter 5: Summary and outlook. This chapter summarizes the research in the thesis and discusses directions for follow-up work.
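
The two quantities the abstract relies on, the distance-preserving ilr transformation and the mean squared residue of a bicluster, can be illustrated with a short sketch. This is not the thesis's implementation: the function names are hypothetical, the orthonormal basis is one common Helmert-type choice (any orthonormal basis of the clr hyperplane gives the same distances), and the residue follows the usual Cheng and Church style definition, which is assumed here rather than confirmed by the abstract.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centered logratio: log of the parts minus their mean log."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr_basis(d):
    """One orthonormal basis of the clr hyperplane, shape (d-1, d).

    Assumption: a Helmert-type basis; the thesis may use another one.
    """
    basis = np.zeros((d - 1, d))
    for i in range(1, d):
        basis[i - 1, :i] = 1.0 / i
        basis[i - 1, i] = -1.0
        basis[i - 1] *= np.sqrt(i / (i + 1.0))
    return basis

def ilr(x):
    """Isometric logratio coordinates of a D-part composition (D-1 values)."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ ilr_basis(x.shape[-1]).T

def ilr_inverse(z):
    """Map ilr coordinates back to a composition on the simplex."""
    z = np.asarray(z, dtype=float)
    return closure(np.exp(z @ ilr_basis(z.shape[-1] + 1)))

def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance of clr vectors."""
    return np.linalg.norm(clr(x) - clr(y))

def mean_squared_residue(block):
    """Mean squared residue of a submatrix (Cheng and Church style, assumed).

    Lower values mean the rows and columns of the block vary more coherently.
    """
    a = np.asarray(block, dtype=float)
    row_mean = a.mean(axis=1, keepdims=True)
    col_mean = a.mean(axis=0, keepdims=True)
    residue = a - row_mean - col_mean + a.mean()
    return float(np.mean(residue ** 2))

if __name__ == "__main__":
    x = closure([0.2, 0.3, 0.5])
    y = closure([0.6, 0.1, 0.3])
    # Metric invariance: both numbers printed here should agree.
    print(aitchison_distance(x, y), np.linalg.norm(ilr(x) - ilr(y)))
    # A perfectly additive block has a residue of (numerically) zero.
    print(mean_squared_residue(np.add.outer([1.0, 2.0, 3.0], [10.0, 20.0])))
```

Under these assumptions, an imputation scheme of the kind the abstract describes would work on the ilr coordinates, where Euclidean tools such as the mean squared residue are meaningful, and then map the filled values back to the simplex with `ilr_inverse`.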
Keywords/Search Tags: Compositional data, Imputation for missing value, Isometric logratio transformation, ICDBIA method