
Analysis Methods For Data Including Constant Group

Posted on: 2011-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: X H Yuan
Full Text: PDF
GTID: 2154360308970096
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: When analyzing medical measurement data, the statistical method should be chosen according to the data type. In practice one may encounter data in which every value in one group is the same constant, usually zero; we call such a group a constant group. Apart from conventional methods, we have not seen any other approach to analyzing such data. For quantitative data, the constant group has zero variance, so the variances remain heterogeneous even after variable transformation, and the conventional approach is therefore limited to non-parametric methods. This suggests an alternative: if the values of the constant group do not vary from sample to sample, the constant group can be regarded as a known population and excluded from the analysis, and the non-constant groups can then be compared with it using the confidence interval approach. It remains to verify whether this method is more reasonable and effective than the conventional method.

Objective: For data that include one constant group, to use Monte Carlo simulation to examine whether excluding the constant group is more reasonable and effective than including it.

Methods:
1. Construction of three data types. Quantitative data are drawn from a normal distribution model and must satisfy independence and homogeneity of variance across the assumed populations. Qualitative data are drawn from a binomial model. Ordinal data are based on the uniform distribution and are then generated according to the corresponding predetermined parameters.
2. Determination of the type I error α and power 1-β. When the K samples come from the same population, the groups are analyzed and the proportion of simulations with P<0.05 among all simulations is taken as α.
When the K samples come from at least two different populations, the proportion of simulations with P<0.05 among all simulations is taken as 1-β.
3. Setting parameters. Quantitative data: the constant group, which has a standard deviation of zero, is built first and defined as μ1=C. For the type I error, using the parameters and sample sizes in Table 1, we simulate three groups of normal random variables from a single population, calculate the type I error with the constant group excluded and with it included, and observe how different values of the constant group affect α. For power, using the parameters and sample sizes in Table 1, we simulate three groups of normal random variables from different populations with homogeneous variances, calculate 1-β with the constant group excluded and with it included, and observe how different values of the constant group affect 1-β. Qualitative data: the constant group is built first and defined as π1=C. For the type I error, using the parameters and sample sizes in Table 2, we simulate three groups of binomial random variables from a single population, calculate the type I error with the constant group excluded and with it included, and observe how different probabilities of the constant group affect α. For power, using the parameters and sample sizes in Table 2, we simulate three groups of binomial random variables from different populations, calculate 1-β with the constant group excluded and with it included, and observe how different probabilities of the constant group affect 1-β. Ordinal data: the constant group is built first, and three groups of uniformly distributed random variables are simulated. Ordinal data with the same distribution are then generated according to the parameters and sample sizes in Table 3, and the type I error is calculated with the constant group excluded and with it included.
Ordinal data with different distributions are generated according to the parameters and sample sizes in Table 3, and the 1-β with the constant group excluded is compared with the 1-β with it included.
4. Verification with examples. We select instances from recently published medical papers, calculate the test statistics and P values with the constant group excluded and with it included, and, when the constant group is excluded, use the confidence interval approach to compare the non-constant groups with the constant group.
5. Software and implementation. According to the parameters of each data type, we use the rannor, ranbin and ranuni functions of SAS 9.1.3, with 1000 simulation runs, to generate normal, binomial and uniformly distributed random variables, and select appropriate statistical methods to analyze the data.

Results: For quantitative data, whatever the value of the constant group, the type I error of one-way ANOVA with the constant group excluded does not change with the means or standard deviations (sample size 10: α=0.053; sample size 30: α=0.065). It is lower than the type I error with the constant group included, and increasingly so as the absolute difference between the means of the constant group and the non-constant groups grows. When the means of the constant group and the non-constant groups are equal, the type I error with the constant group included stays the same regardless of the standard deviation (sample size 10: α=0.066; sample size 30: α=0.093). In addition, when the mean and sample size of the non-constant groups are fixed, the greater the standard deviation, the smaller the α with the constant group included. If the means of the non-constant groups are μ2=2, μ3=3, μ4=3, then when the value of the constant group is less than 2 or greater than 3, the 1-β with the constant group included (99.0%~100.0%) is greater than with it excluded (55.3%~93.9%).
When the value of the constant group lies in the range [2, 3], the 1-β with the constant group included is generally greater than with it excluded. In addition, the greater the absolute difference between the means of the constant group and the non-constant groups, the more the 1-β with the constant group included exceeds that with it excluded.

For qualitative data, when the probability of the constant group is 0 or 1, the α with the constant group excluded (probability of the non-constant groups 0.3: α=0.046; 0.6: α=0.053) is less than with it included (0.3: α=0.987~1.000; 0.6: α=1.000). When the probability of the constant group equals that of the non-constant groups, the α with the constant group excluded (0.3: α=0.046; 0.6: α=0.053) is greater than with it included (0.3: α=0.021; 0.6: α=0.024); however, the type I error with the constant group excluded is closer to the significance level of 0.05. When the probability of the constant group is greater or less than that of the non-constant groups, the α with the constant group included is generally greater than with it excluded, increasingly so as the absolute difference between the two probabilities grows. When the probability of the constant group is 0 or 1, the 1-β with the constant group excluded is less than with it included. When the probability of the constant group lies outside the range of the non-constant groups, the 1-β with the constant group excluded is less than with it included, particularly as it moves further beyond that range. When the probability of the constant group lies inside the range of the non-constant groups, the 1-β with the constant group excluded is greater than with it included.
When the probability of the constant group equals the maximum or minimum probability of the non-constant groups, it cannot be determined which 1-β is higher, but the difference is not significant.

For ordinal data, whatever the value of the constant group, the type I error of the Kruskal-Wallis test with the constant group excluded (sample size 10: α=0.043; sample size 30: α=0.049) is less than with it included (sample size 10: α=0.108~1.000; sample size 30: α=0.363~1.000), in particular when the constant group takes the maximum or minimum level of the ordinal data. Likewise, whatever the value of the constant group, the 1-β with the constant group excluded (sample size 10: 1-β=56.3%; sample size 30: 1-β=97.4%) is less than with it included (sample size 10: 1-β=60.1%~100.0%; sample size 30: 1-β=97.9%~100.0%), but when the constant group takes a middle level of the ordinal data, the difference in 1-β between the two approaches is not obvious.

Conclusion: Regardless of data type, including the constant group clearly inflates the type I error compared with excluding it, particularly when the constant is 0 (for qualitative data, also 100%). There is one exception: when the probability of the constant group equals that of the non-constant groups, the α with the constant group excluded is greater than with it included, but the type I error with the constant group excluded is closer to the significance level of 0.05. In terms of statistical theory, it is inappropriate to include a known constant parameter in statistical testing. Therefore, when the data contain a constant group, it is appropriate to exclude it. The non-constant groups can then be compared with the constant group using the confidence interval approach, with the significance level adjusted.
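
The qualitative-data comparison described in the abstract can be sketched in a few lines. The thesis itself used SAS 9.1.3; the Python below is only an illustration under several assumptions that are not stated in the abstract: the test is taken to be a Pearson chi-square test on a k x 2 table, the sample size of 30 per group and the seed are hypothetical (Table 2 is not reproduced here), the constant group is represented as a fixed count of successes, and the confidence interval is a Wilson interval with a Bonferroni-adjusted level. The function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1, 2, 3 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("this sketch supports df = 1, 2, 3 only")


def pearson_p(successes, n):
    """P value of the Pearson chi-square test for k groups of n subjects each."""
    k = len(successes)
    total = sum(successes)
    if total == 0 or total == k * n:
        return 1.0  # every outcome identical: nothing to test
    p_hat = total / (k * n)
    stat = sum((s - n * p_hat) ** 2 for s in successes) / (n * p_hat * (1 - p_hat))
    return chi2_sf(stat, k - 1)


def rejection_rates(pi_nonconstant, constant_successes, n=30, n_sim=1000,
                    alpha=0.05, seed=1):
    """Proportion of runs with P < alpha, constant group excluded vs included."""
    rng = random.Random(seed)
    excluded = included = 0
    for _ in range(n_sim):
        # Binomial draws for the non-constant groups (assumed sample size n).
        groups = [sum(rng.random() < p for _ in range(n)) for p in pi_nonconstant]
        excluded += pearson_p(groups, n) < alpha
        included += pearson_p([constant_successes] + groups, n) < alpha
    return excluded / n_sim, included / n_sim


def wilson_interval(successes, n, alpha):
    """Wilson confidence interval for one proportion at level 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Three non-constant groups from one population (pi = 0.3), constant group all zeros.
excl, incl = rejection_rates([0.3, 0.3, 0.3], constant_successes=0)
print(f"constant group excluded: {excl:.3f}, included: {incl:.3f}")

# Compare one non-constant group with the constant 0, Bonferroni-adjusted for 3 groups.
low, high = wilson_interval(9, 30, alpha=0.05 / 3)
print(f"adjusted interval: ({low:.3f}, {high:.3f}); contains 0: {low <= 0 <= high}")
```

With the constant group excluded, the rejection rate should stay near the nominal 0.05; with an all-zero constant group included, nearly every run rejects. The exact figures depend on the assumed sample size and test, so they need not match the values reported above.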
Keywords/Search Tags: quantitative data, qualitative data, ordinal data, hypothesis testing, type I error, power