
The Theory Of Correspondence Analysis And Its Application In The Simplification Of Sampling Questionnaires

Posted on: 2005-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Li
GTID: 2120360125450817
Subject: Computational Mathematics
Abstract/Summary:
In our society, certain departments sometimes need particular information: for example, schools want to know the ideological standards of their students, and the press needs to know the popularity of certain columns. Such problems are usually solved by means of a questionnaire. The designer of a questionnaire tries to include as many questions as possible so as not to lose useful information. In practice, however, some questions in a questionnaire may be related to other questions and can be replaced by them. On the other hand, completed questionnaires may be similar to one another because of the respondents' environments and social positions. Therefore, how to choose the questions and the samples is especially important. What we need to do is choose questions that are both general and simple, so that the people surveyed are willing to answer them. This can be regarded as a feasible way to increase both the response rate and the quality of the questionnaire. Likewise, choosing good questionnaires (samples) can achieve the same result. In this paper, we successfully solve the problem by means of the method of variable selection in correspondence analysis proposed by L. Xia and Y. Yang [8, 12]. Variable (question) selection and individual (questionnaire) selection are dual concepts according to the theory of correspondence analysis in multivariate statistics; therefore this paper begins with individual (questionnaire) selection.

Here we take the $n \times m$ matrix of original data, $X = (x_{ij})_{n \times m}$, as the starting point. Let

$$f_i = \sum_{j=1}^{m} x_{ij} \quad (i = 1, 2, \ldots, n), \qquad g_j = \sum_{i=1}^{n} x_{ij} \quad (j = 1, 2, \ldots, m),$$

$$f = (f_1, f_2, \ldots, f_n)^T, \quad g = (g_1, g_2, \ldots, g_m)^T, \quad F = \mathrm{diag}(f_1, f_2, \ldots, f_n), \quad G = \mathrm{diag}(g_1, g_2, \ldots, g_m),$$

and let $y = (y_1, y_2, \ldots, y_n)^T$, whose components are regarded as the scores of the individuals, and $A = (a_1, a_2, \ldots, a_m)^T$, whose components are regarded as the scores of the variables.
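As a small numerical illustration of these definitions, the Python/NumPy sketch below computes the margins $f$, $g$ and the matrices $F$, $G$ from a data matrix. It also assumes, as is standard in correspondence analysis, that each variable score is the $x_{ij}$-weighted average of the individual scores, $a_j = \sum_i x_{ij} y_i / g_j$ (i.e. $A = G^{-1} X^T y$), and checks that under this assumption $A^T G A = y^T X G^{-1} X^T y$, the quantity the abstract goes on to maximize. The helper names `margins` and `variable_scores`, the 4 × 3 table and the score vector are hypothetical, chosen only for illustration; every row and column total is assumed to be positive.

```python
import numpy as np

def margins(X):
    """Return row sums f, column sums g and the diagonal matrices F, G.

    X is an n x m nonnegative data matrix: rows are individuals
    (questionnaires), columns are variables (questions).
    """
    X = np.asarray(X, dtype=float)
    f = X.sum(axis=1)   # f_i = sum_j x_ij
    g = X.sum(axis=0)   # g_j = sum_i x_ij
    return f, g, np.diag(f), np.diag(g)

def variable_scores(X, y):
    """Hypothetical helper: A = G^{-1} X^T y, i.e. a_j is the
    x_ij-weighted average of the individual scores y_i."""
    X = np.asarray(X, dtype=float)
    g = X.sum(axis=0)
    return (X.T @ y) / g

# Made-up 4 x 3 table (4 individuals, 3 variables) for illustration only.
X = np.array([[2, 1, 0],
              [1, 3, 1],
              [0, 1, 4],
              [3, 0, 1]])
f, g, F, G = margins(X)          # f = [3, 5, 5, 4], g = [6, 5, 6]
y = np.array([0.5, -0.2, 0.1, 0.3])
A = variable_scores(X, y)
lhs = A @ G @ A                                  # A^T G A
rhs = y @ X @ np.linalg.inv(G) @ X.T @ y         # y^T X G^{-1} X^T y
print(f, g, np.isclose(lhs, rhs))
```

Dividing by the vector `g` rather than multiplying by `np.linalg.inv(G)` applies $G^{-1}$ without forming a dense inverse; the explicit inverse is used only in the check.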
First, we compute the individual scores under the condition that the weighted square sum $y^T F y$ equals 1 and the weighted square sum of the variable scores,

$$A^T G A = y^T X G^{-1} X^T y \qquad (A = G^{-1} X^T y),$$

reaches its maximum. By Lagrange's multiplier rule, this leads to the generalized eigenvalue problem

$$X G^{-1} X^T y = \lambda F y.$$

Solving it gives the eigenvalues and eigenvectors for the individual scores. Among them we take the $p$ largest positive eigenvalues $1 > \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ (the trivial eigenvalue 1, whose eigenvector is constant, is excluded), with corresponding eigenvectors $y_1, y_2, \ldots, y_p$. This yields $p$ groups of variable and individual scores:

$$E = (a_1, a_2, \ldots, a_p)_{m \times p}, \qquad Y = (y_1, y_2, \ldots, y_p)_{n \times p},$$

where $a_k = G^{-1} X^T y_k$. Writing $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, we get

$$X^T F^{-1} X E = G E D, \qquad E = G^{-1} X^T Y,$$

and, by the dual rule, $Y = F^{-1} X E D^{-1}$.

In practice, the number of individuals is much larger than the number of variables. To reduce the amount of computation, we therefore first compute the variable scores from the $m \times m$ problem above and then the individual scores.

The three criteria for variable selection, explained in detail in the third part of this paper, are derived from the following viewpoint: the $n$ individuals are described quite well by the original $m$ variables, and the scores $Y$ obtained from the $m$ variables approximately describe the configuration of the $n$ individuals in $p$-dimensional Euclidean space. Our aim is to select $l$ (
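The variable-first computation described above can be sketched in Python/NumPy as follows. This is an illustrative sketch, not the thesis's own implementation: the function name `correspondence_scores` and the test table are hypothetical, and reducing $X^T F^{-1} X a = \lambda G a$ to the symmetric matrix $G^{-1/2} X^T F^{-1} X G^{-1/2}$ (solved with `numpy.linalg.eigh`) is a standard technique chosen here, not one stated in the abstract. It assumes all row and column totals are positive, the table is connected (so the trivial eigenvalue 1 is simple), $p \le m - 1$, and $\lambda_p > 0$. Columns are scaled so that $y_k^T F y_k = 1$, matching the constraint in the text.

```python
import numpy as np

def correspondence_scores(X, p):
    """Variable-first computation of p groups of correspondence-analysis scores.

    Solves X^T F^{-1} X a = lambda G a (an m x m problem), drops the trivial
    eigenvalue 1, then recovers the individual scores by the dual rule
    Y = F^{-1} X E D^{-1}. Columns are scaled so that y_k^T F y_k = 1,
    which makes a_k^T G a_k = lambda_k.
    """
    X = np.asarray(X, dtype=float)
    f = X.sum(axis=1)                       # row sums (diagonal of F)
    g = X.sum(axis=0)                       # column sums (diagonal of G)
    g_isqrt = 1.0 / np.sqrt(g)              # diagonal of G^{-1/2}
    # Symmetric matrix B = G^{-1/2} X^T F^{-1} X G^{-1/2}; B u = lambda u
    # is equivalent to the generalized problem with a = G^{-1/2} u.
    B = (X.T / f) @ X
    B = g_isqrt[:, None] * B * g_isqrt[None, :]
    lam, U = np.linalg.eigh(B)              # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]           # re-sort in descending order
    lam, U = lam[order], U[:, order]
    lam, U = lam[1:p + 1], U[:, 1:p + 1]    # skip the trivial eigenvalue 1
    E = g_isqrt[:, None] * U * np.sqrt(lam) # a_k = G^{-1/2} u_k * sqrt(lambda_k)
    Y = (X @ E) / f[:, None] / lam          # dual rule Y = F^{-1} X E D^{-1}
    return lam, E, Y

# Usage on a made-up 4 x 3 table (4 individuals, 3 variables).
X = np.array([[2, 1, 0],
              [1, 3, 1],
              [0, 1, 4],
              [3, 0, 1]], dtype=float)
lam, E, Y = correspondence_scores(X, p=2)
F, G, D = np.diag(X.sum(axis=1)), np.diag(X.sum(axis=0)), np.diag(lam)
print(lam)
print(np.allclose(X.T @ np.linalg.inv(F) @ X @ E, G @ E @ D))  # X^T F^{-1} X E = G E D
print(np.allclose(E, np.linalg.inv(G) @ X.T @ Y))              # E = G^{-1} X^T Y
print(np.allclose(Y.T @ F @ Y, np.eye(2)))                      # Y^T F Y = I
```

Only an $m \times m$ matrix is decomposed, which is the saving the text points to when $n \gg m$. SciPy's `scipy.linalg.eigh(a, b)` could instead solve the generalized problem directly.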
Keywords/Search Tags: Correspondence