Compressibility Research On High Way Contingency Table Of The Categorical Data

Posted on: 2015-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Sun
Full Text: PDF
GTID: 2267330428461620
Subject: Statistics
Abstract/Summary:
Categorical data analysis is an important tool for analyzing nominal and ordinal data, and contingency table analysis is one of its most common and intuitive methods. For example, medical researchers classify cases by age and gender; educators classify students by age, gender, and family background; economic researchers classify the success or failure of enterprises by industry, region, and initial investment; and market researchers classify cases by age, gender, and consumption tendency.

Traditional categorical data analysis focuses mainly on tests of independence in contingency tables. Since the log-linear model was proposed and became widely used, categorical data analysis has often been applied to high-way contingency tables, but the literature, both in China and abroad, lacks detailed analysis of such tables. Because high-way contingency table data are complex, it is desirable to reduce the dimension of the table so that the association among variables can be analyzed more easily, that is, to compress variables of the contingency table. Unreasonable compression, however, can lead to Simpson's paradox, spurious correlation, and spurious independence, all of which make contingency table analysis harder. The study of compressibility methods for contingency tables is therefore important. Existing research in China and abroad addresses the compressibility of three-way contingency tables, but little of it addresses the compressibility of high-way contingency tables.

This thesis discusses the compressibility of high-way contingency tables based on interaction, mutual information, and information entropy. The research findings are:

1. A three-way contingency table can be compressed as long as there is conditional independence between the variables, but for a four-way contingency table, conditional independence between the variables does not guarantee that the table can be compressed.
2. There are equivalence conditions between the log-linear model, which is based on interaction, and the linear information model, which is based on mutual information, so the analysis results from one model can be used in the other.
3. Model selection methods are proposed for the linear information model with a specified conditioning variable and for the linear information model without one. The fitted linear information model is found to be more concise than the fitted log-linear model: for the same example, the log-linear model indicates that the data are incompressible, whereas the linear information model indicates that the data can be compressed.
4. Compression methods for contingency tables based on mutual information and on information entropy are proposed. The method based on mutual information considers the association among the variables and allows information that is not significantly relevant to be lost during compression, whereas the method based on information entropy considers the uncertainty of the variables and allows no information loss during compression.
5. Two methods for ranking variable importance are proposed, one based on mutual information and one based on information entropy. The mutual information method is more accurate from the standpoint of contingency table compressibility, whereas the information entropy method is more accurate from the standpoint of the variables' uncertainty.

This research offers some ideas for the compressibility of high-way contingency tables and contributes to the development of categorical data analysis.
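
The three-way case described above can be illustrated with a small numerical sketch. This is not the thesis's own procedure: the counts, the probabilities, and the helper names (`odds_ratio`, `mutual_information`, `conditional_mi`) are assumptions made here for illustration. The first table shows Simpson's paradox when a stratifying variable Z is summed out without justification. The second table is built so that Z is conditionally independent of X given Y, in which case summing out Z leaves the X-Y odds ratio unchanged.

```python
import numpy as np

def odds_ratio(t):
    """Odds ratio of a 2x2 table t[x, y]."""
    return (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])

def mutual_information(joint):
    """I(A;B) in nats for a two-way table of counts or probabilities."""
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    nz = p > 0  # skip empty cells, where p * log(p) is taken as 0
    return float(np.sum(p[nz] * np.log(p[nz] / outer[nz])))

def conditional_mi(table, cond_axis):
    """I(A;B | C) for a three-way table; C is the axis `cond_axis`."""
    p = np.moveaxis(table / table.sum(), cond_axis, 0)  # p[c, a, b]
    total = 0.0
    for layer in p:
        weight = layer.sum()  # P(C = c)
        if weight > 0:
            total += weight * mutual_information(layer)
    return total

# Illustrative counts[x, y, z]: x = treatment, y = outcome (success, failure),
# z = stratum. These numbers are assumptions, not data from the thesis.
counts = np.array([
    [[81, 192], [6, 71]],
    [[234, 55], [36, 25]],
], dtype=float)

stratum_or = [odds_ratio(counts[:, :, z]) for z in range(2)]
collapsed = counts.sum(axis=2)  # compress the table by summing out z
print("stratum odds ratios:", stratum_or)
print("odds ratio after compression:", odds_ratio(collapsed))
print("I(X;Z|Y):", conditional_mi(counts, cond_axis=1))
print("I(Y;Z|X):", conditional_mi(counts, cond_axis=0))

# A table constructed so that Z is independent of X given Y:
# p(x, y, z) = p(x, y) * p(z | y). The probabilities are hypothetical.
pxy = np.array([[0.3, 0.2], [0.1, 0.4]])
pz_given_y = np.array([[0.7, 0.3], [0.2, 0.8]])  # rows: y, columns: z
p = pxy[:, :, None] * pz_given_y[None, :, :]

print("I(X;Z|Y) for constructed table:", conditional_mi(p, cond_axis=1))
print("stratum odds ratios:", [odds_ratio(p[:, :, z]) for z in range(2)])
print("odds ratio after compression:", odds_ratio(p.sum(axis=2)))
```

In the first table both stratum odds ratios exceed 1 (roughly 2.08 and 1.23), yet the compressed table gives an odds ratio of about 0.75, so the direction of association reverses. Both conditional mutual informations are positive there, so neither conditional independence holds. In the constructed table I(X;Z|Y) is zero up to rounding, and the odds ratio is the same before and after compression.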
Keywords/Search Tags: The compression of contingency table, Simpson’s paradox, Interaction, Mutual information, Information entropy