
Development Of The General Module Of The System Of Quality Of Life Instruments For Cancer Patients (V2.0) And Estimation Of Its Minimal Clinically Important Difference

Posted on: 2016-03-26  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Z Yang
GTID: 1224330482451545  Subject: Epidemiology and Health Statistics
Abstract/Summary:
[Background] With economic change and the transformation of medical models, quality of life (QOL), an integrated and patient-oriented indicator, has received increasing attention in cancer treatment and in decisions about which treatment to choose. Research on QOL in oncology has likewise expanded and become a mainstream topic. The critical issue in assessing quality of life is obtaining a suitable instrument. Thanks to the work of many researchers, a variety of health-related QOL scales for cancer are now available. However, the existing instruments for specific types of cancer have several problems: (1) They were developed by different research groups, so they lack a systematic, consistent framework. (2) Scales developed abroad were originally intended for English-speaking patients and do not reflect Chinese cultural background. For example, Taoism and traditional medicine emphasize good temper and high spirits, and good appetite and sleep are highly valued in daily life, where food culture is very important. Most QOL instruments in other languages do not reflect this cultural dependence, so Chinese-specific QOL instruments are needed.
(3) Most instruments were developed on the basis of classical test theory (CTT), which is easy to understand and apply but has several shortcomings: ① its statistics are sample-dependent and subject to large sampling errors; ② the ability and difficulty parameters are not on the same scale, so they cannot be matched; ③ test scores depend on the particular test, which makes scores difficult to compare; ④ the parallel-test assumption is hard to satisfy, and the generalizability of test results is hard to guarantee.

To solve these problems, our team has been developing a Chinese QOL instrument system called QLICP (Quality of Life Instruments for Cancer Patients). The first version, QLICPs (V1.0), includes a general module (QLICP-GM), which can be used with all kinds of cancer patients, and six modules for specific cancers: lung cancer (QLICP-LU), breast cancer (QLICP-BR), head and neck cancer (QLICP-HN), colorectal cancer (QLICP-CR), gastric cancer (QLICP-GA), and cervical cancer (QLICP-CE), each used only for the relevant disease. Clinical use revealed some problems with these instruments; for example, the construct needs further refinement and some items need modification. New versions of the QLICPs are therefore necessary, as are instruments for other cancers such as liver cancer, ovarian cancer, and leukemia. Moreover, although we used some advanced statistical methods such as structural equation modeling, our development methodology was based mainly on CTT, which is simple and easy to understand. Because of the obvious defects of CTT, two important modern test theories, item response theory (IRT) and generalizability theory (GT), have been developed and are widely used in educational and psychological testing.
Considering the many advantages of these two modern test theories and their potential in the development of quality of life scales, it is necessary to use them to guide the development of the new versions of the QLICPs.

Another key issue in measuring and assessing QOL in medicine is how to interpret scale scores. Many researchers have worked on this problem, which led to the concept of the minimal clinically important difference (MCID). The MCID, also called clinically significant change or clinically meaningful change, is a difference in QOL score that is large enough to have implications for the patient's treatment or care. With the MCID, it becomes easier to interpret changes in QOL measures and to understand the meaning of scores in clinical trials. Establishing the MCID for a specific QOL instrument can therefore not only speed up the application of QOL measures but also provide a useful methodology for establishing MCIDs for other instruments used in medicine.

In summary, with the support of the National Natural Science Foundation of China, we have started to develop the second version of the system of quality of life instruments for cancer patients, QLICPs (V2.0), with the general module QLICP-GM (V2.0) as the core instrument. This research combines IRT and GT with CTT to analyze and evaluate the QLICP-GM (V2.0) and estimates the MCID of its scores.

[Aims]
1. To use the two modern test theories (IRT and GT) together with CTT to develop the General Module of the Quality of Life Instruments for Cancer Patients, QLICP-GM (V2.0), which is the foundation of the whole QLICPs system.
2. To combine IRT and GT with CTT to evaluate the psychometric properties of the QLICP-GM (reliability, validity, responsiveness, etc.) in clinical field tests with large samples.
3. 
To compare IRT, GT, and CTT and identify their respective advantages and disadvantages in developing and validating the general module of the Quality of Life Instruments for Cancer Patients, so as to provide a methodological reference for future research on developing general scales for other types of disease.
4. To estimate the minimal clinically important difference (MCID) of the QLICP-GM.

[Methods]
1. Subjects. Inpatients with one of six cancers (bladder cancer, brain cancer, esophageal cancer, liver cancer, leukemia, and nasopharyngeal carcinoma) who were treated at Yunnan Tumor Hospital or the Affiliated Hospital of Guangdong Medical College during 2009-2014 were recruited. Patients were included if they had been diagnosed with one of these cancers and were able to read, understand, and answer the questionnaires. Those who could not read because of low literacy or could not complete the questionnaires because of disease-related deterioration were excluded.

2. Development and validation of the scale. Two working groups, a nominal group and a focus group, were organized. Items were selected using the programmed decision method, which comprised nominal group and focus group discussions, in-depth interviews with patients and clinicians, pilot tests, and pre-tests. First, the focus group discussed and confirmed the structure of the instrument, which included 4 domains: physical, psychological, social, and common symptoms/side-effects. After reviewing well-known QOL instruments such as the SF-36, FACT-G (Functional Assessment of Cancer Therapy-General), and QLQ-C30, and considering elements of Chinese culture, the nominal group proposed candidate items under each facet within the domains, producing an item pool.
Focus group discussions, in-depth interviews with patients and professionals, and a pilot test were then used to refine and select items. During the in-depth interviews and importance scoring, new items could be added if patients or professionals considered them very important. The refined items formed the preliminary scale, which was then administered in a pre-test. Four classical statistical procedures (variation analysis, correlation analysis, factor analysis, and Cronbach's α) and IRT were used to re-screen the items on the basis of the pre-test data. The items retained formed the QLICP-GM (V2.0).

The formal QLICP-GM (V2.0) was then administered on a large scale to patients with the cancers listed above in order to study its validity, reliability, and responsiveness. The investigators were doctors, nurses, and medical postgraduate students. They explained the aims of the study and the instrument to the patients and obtained informed consent from those who met the inclusion criteria and agreed to participate. Participants completed the questionnaire by themselves on admission to hospital. Each patient was assessed a second time 1-2 days after hospitalization so that test-retest reliability could be calculated. All patients available at the third scheduled assessment completed the measures at discharge so that responsiveness could be evaluated. The investigators checked the answers immediately each time to ensure completeness; if missing values were found, the questionnaire was returned to the patient to fill in the missing items. Because no gold standard is available, the Chinese version of the QLQ-C30 was administered at the same time to evaluate criterion-related validity. After the survey, the raw scores of the items, domains/facets, and overall scale were calculated.
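The scoring rules given in this abstract for the QLICP-GM (items scored 1 to 5, negatively stated items reversed, domain scores summed, then converted linearly to 0-100 with SS = (RS - Min) × 100 / R) can be illustrated with a short Python sketch. The item names, the choice of which item is negatively stated, and the reading of Min and R as "number of items × lowest score" and "number of items × (highest - lowest)" are assumptions made for illustration; they are not taken from the instrument itself.

```python
def reverse_item(score, low=1, high=5):
    """Reverse a negatively stated item on a low..high scale (1<->5, 2<->4)."""
    return low + high - score


def domain_raw_score(responses, negative_items=()):
    """Sum the item scores of one domain, reversing negatively stated items.

    responses: dict mapping item name -> answer in 1..5
    negative_items: names of negatively stated items (hypothetical here)
    """
    total = 0
    for item, answer in responses.items():
        if not 1 <= answer <= 5:
            raise ValueError(f"{item}: answer {answer} is outside 1..5")
        total += reverse_item(answer) if item in negative_items else answer
    return total


def standardized_score(raw, n_items, low=1, high=5):
    """Linear conversion to 0-100: SS = (RS - Min) * 100 / R."""
    minimum = n_items * low               # assumed meaning of Min
    score_range = n_items * (high - low)  # assumed meaning of R
    return (raw - minimum) * 100 / score_range


# Hypothetical three-item domain; "X3" is treated as negatively stated.
answers = {"X1": 4, "X2": 5, "X3": 2}
raw = domain_raw_score(answers, negative_items={"X3"})
ss = standardized_score(raw, n_items=len(answers))
print(raw, round(ss, 2))
```

With this conversion, a respondent at the lowest possible raw score maps to 0 and one at the highest maps to 100, which is what makes domains with different numbers of items comparable.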
Each item of the QLICP-GM is rated on a five-level scale: not at all, a little bit, somewhat, quite a bit, and very much. Positively stated items are scored directly from 1 to 5, and negatively stated items are reverse-scored. Each domain score is the sum of its item scores, and the overall scale score is the sum of the domain scores. For comparison, all domain/facet scores were converted linearly to a 0-100 scale using the formula SS = (RS - Min) × 100 / R, where SS, RS, Min, and R are the standardized score, raw score, minimum score, and range of scores, respectively. Finally, reliability, validity, and responsiveness were evaluated with the relevant statistical analyses, and the feasibility of the scale was also assessed.

3. Item response theory. In IRT, the item characteristic function (ICF) and item characteristic curve (ICC) were used to describe each item in detail, and the information function was used to reflect measurement error and reliability. Because all items use the same five-level response scale, Samejima's graded response model (a two-parameter logistic model) was used to analyze the items of each domain. The difficulty and discrimination of each item were estimated, and its category probability curves and item characteristic curve were drawn. Items were then evaluated and selected, and the marginal reliability was computed.

4. Generalizability theory. In addition to the CTT analysis, we applied generalizability theory to investigate the score dependability of the QLICP-GM. G theory addresses the dependability of measurements and allows multiple sources of variance, including interactions, to be estimated simultaneously through G studies and D studies. A G study quantifies the amount of variance associated with each of the facets (factors) under examination.
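As a rough illustration of what such an analysis estimates, the sketch below computes variance components for a crossed person-by-item layout from two-way ANOVA mean squares, and then the relative (G) and absolute (φ) coefficients for a chosen number of items. This is the textbook expected-mean-squares approach, not necessarily the procedure or software used in the study; the function names and the score matrix are hypothetical.

```python
def g_study_p_by_i(scores):
    """Variance components for a crossed person x item design.

    scores: list of rows (one per person), each with one value per item,
            with no missing values.
    Returns (var_person, var_item, var_residual). The residual confounds
    the person-by-item interaction with random error. Negative estimates
    are truncated to 0, a common convention.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    var_res = max(ms_res, 0.0)
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    return var_p, var_i, var_res


def d_study_coefficients(var_p, var_i, var_res, n_items):
    """Relative (G) and absolute (phi) coefficients for n_items items."""
    relative_error = var_res / n_items
    absolute_error = (var_i + var_res) / n_items
    return var_p / (var_p + relative_error), var_p / (var_p + absolute_error)


# Hypothetical responses: 5 persons x 4 items on a 1..5 scale.
data = [
    [3, 4, 2, 5],
    [2, 2, 1, 3],
    [5, 4, 4, 5],
    [1, 3, 2, 2],
    [4, 5, 3, 4],
]
var_p, var_i, var_res = g_study_p_by_i(data)
total = var_p + var_i + var_res
shares = [round(100 * v / total, 1) for v in (var_p, var_i, var_res)]
g_coef, phi = d_study_coefficients(var_p, var_i, var_res, n_items=4)
print(shares, round(g_coef, 3), round(phi, 3))
```

Calling `d_study_coefficients` with different values of `n_items` shows how the coefficients change with the number of items, which is the kind of question a D study answers.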
A D study indicates which protocols are optimal for a particular measurement situation by generating generalizability (G) coefficients, which can be interpreted as reliability coefficients across the facets of the study. In our research, G studies and D studies were performed to estimate the variance components and dependability coefficients in a one-facet person-by-item design (p × i design). We defined patients' quality of life as the object of measurement and items as the single facet of measurement error. Because every person responds to all items, the design is a one-facet crossed design. For the G study, a universe of admissible observations, consisting of the object of measurement and the measurement error facets, was defined, and the variance components were estimated separately for the physical, psychological, and social function domains. For the D study, a universe of generalization, which represents the measurement conditions over which a researcher is willing to generalize, was defined; the variance components associated with it were estimated, and the G coefficients and φ coefficients were computed for each domain.

5. Estimation of the MCID. The minimal clinically important difference (MCID) for the QLICP-GM was estimated by combining distribution-based and anchor-based methods. For the anchor-based methods, items Q29 (How would you rate your overall health during the past week?) and Q30 (How would you rate your overall quality of life during the past week?) of the EORTC QLQ-C30 were each used as an anchor. Patients whose score on the anchor item differed by at least one level between pre-treatment and post-treatment were selected to compute the MCID.
The corresponding score differences after treatment (mean for normally distributed data or median for non-normally distributed data) were taken as the MCIDs for the two anchor items, respectively. For the distribution-based methods, the effect size (ES) and the standard error of measurement (SEM) were used to compute the MCIDs. The score changes corresponding to ES = 0.5 were taken as the MCIDs.

6. Statistical software. Data were managed with Excel and FoxPro, and statistical analyses were conducted with SPSS 19.0, AMOS 19.0, and MULTILOG 7.0.

[Results]
1. Item selection. Based on the pre-test data from 190 patients, the four classical statistical procedures (variation analysis, correlation analysis, factor analysis, and Cronbach's α), IRT, and two focus group discussions, 32 items were selected from the 41 items of the preliminary scale to form the QLICP-GM (V2.0), which comprises 4 domains and 10 facets. Eight items were deleted and 2 items were combined.

2. Psychometric properties based on CTT. The formal test sample comprised 711 patients with the six cancers.
(1) Reliability. The test-retest correlation coefficients (r) and intraclass correlation coefficients (ICC) for all domains/facets of the QLICP-GM were higher than 0.8, except for the common symptoms/side-effects domain (0.66-0.71), demonstrating good test-retest reliability. Cronbach's α coefficients for all domains and facets were higher than 0.6; at the domain level, they were higher than 0.7 except for the social domain.
(2) Validity. ① Following the WHO definition of QOL and the programmed decision procedures, we developed the QLICP-GM through multiple rounds of focus group discussion, in-depth interviews, and pre-testing, which reduced the final version to 32 items and ensured good content validity.
② The item-domain correlation analyses showed strong correlations between items and their own domains for all 4 domains (physical, psychological, social, and common symptoms/side-effects), indicating item convergent validity, and weak correlations between items and other domains, indicating discriminant validity. Structural equation modeling showed that the structure of the QLICP-GM can be grouped into 4 domains and 10 facets with acceptable goodness of fit, generally reflecting the conceptual framework, although the fit was not ideal. ③ Correlation coefficients between the domain scores of the QLICP-GM and the QLQ-C30 showed that, overall, correlations between the same or similar domains were higher than those between different domains, implying good criterion-related validity.
(3) Responsiveness. Paired t-tests were used to compare mean scores between the pre-treatment and post-treatment assessments, together with the standardized response mean (SRM), an important responsiveness indicator for which values of 0.20, 0.50, and 0.80 have been proposed to represent small, moderate, and large responsiveness, respectively. Responsiveness differed across cancers. For bladder cancer, both the overall scale and all domains showed moderate to large responsiveness. For brain cancer, the physical and common symptoms/side-effects domains showed moderate responsiveness, while for liver cancer the physical and social function domains showed moderate to large responsiveness. Responsiveness for esophageal cancer, leukemia, and nasopharyngeal carcinoma was not ideal, possibly because different treatments have different effects.

3. 
Outcomes based on GT. The G study showed that, for the four domains of the QLICP-GM, the largest source of variation was the person-by-item interaction (63.49%-69.99%), followed by persons (10.00%-26.22%), with items accounting for the smallest share (6.02%-22.97%). Because the object of measurement is patients' QOL, it is appropriate that persons and the person-by-item interaction together accounted for more than 80% of the variation. For the social domain, however, the proportion of variation due to items was higher than 20%, implying that the items of this domain need improvement. The D study showed that, for the three domains other than the social domain, the estimated G coefficients (Eρ²) and indexes of dependability (φ) were higher than 0.70 under the current design, which is reasonable and consistent with the CTT results. Moreover, the G coefficients (Eρ²) and indexes of dependability (φ) increased as the number of items increased, which provides a basis for deciding the suitable number of items for each domain.

4. Outcomes based on IRT. IRT analysis was conducted separately for the four domains of the conceptual framework (physical function, psychological function, social function, and common symptoms/side-effects), each of which basically met the unidimensionality requirement, the key assumption of IRT. According to the literature, items with an average information of more than 0.74 are considered excellent, items with an average information of 0.47 to 0.74 are considered good, and items with an average information below 0.47 are considered poor. The items were evaluated on the basis of their information and characteristics.
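To make the item-level quantities concrete, the following Python sketch evaluates a graded response model with two-parameter logistic boundary curves for one five-category item: category probabilities, item information at a given θ, and the excellent/good/poor rating using the 0.74 and 0.47 cut-offs quoted in this abstract. The discrimination and threshold values are hypothetical, and averaging information over an evenly spaced θ grid from -3 to 3 is an assumption, since the abstract does not say how the average information was computed. Parameter estimation itself, which the study carried out in MULTILOG, is not shown.

```python
import math


def boundary_prob(theta, a, b):
    """P(response at or above a category boundary), two-parameter logistic."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def cumulative_probs(theta, a, thresholds):
    """Boundary probabilities padded with 1 (lowest) and 0 (past highest)."""
    return [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]


def category_probs(theta, a, thresholds):
    """Probability of each response category; thresholds must be increasing."""
    cum = cumulative_probs(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = cumulative_probs(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        slope = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += slope * slope / p
    return info


def rate_item(average_information):
    """Rating with the cut-offs quoted in the abstract (0.74 and 0.47)."""
    if average_information > 0.74:
        return "excellent"
    if average_information >= 0.47:
        return "good"
    return "poor"


# Hypothetical item: discrimination a and four increasing thresholds.
a, thresholds = 1.2, [-2.0, -0.5, 0.8, 2.0]
probs = category_probs(0.0, a, thresholds)
grid = [x / 10 for x in range(-30, 31)]  # assumed theta grid
avg_info = sum(item_information(t, a, thresholds) for t in grid) / len(grid)
print([round(p, 3) for p in probs], round(avg_info, 3), rate_item(avg_info))
```

Evaluating `item_information` at individual θ values rather than averaging it shows where along the trait scale an item is most informative.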
The IRT results showed that, overall, the difficulty and discrimination parameters of the QLICP-GM items were reasonable: discrimination ranged from 1.08 to 1.43 and difficulty from -5.98 to 2.90, and the difficulty thresholds increased monotonically from level 1 to level 4 with no reversed thresholds, except for item GSO7. The average item information was generally excellent or good for the physical domain and the common symptoms/side-effects domain, and acceptable for the psychological and social function domains. Because IRT analysis usually needs a larger sample to give reliable results, further studies with larger samples are needed to investigate item information. However, some items with low overall information had higher information at particular values of θ, for example GPH6 and GPH7 at θ = 0, showing that these items have specific value at those trait levels.

5. Estimation of MCIDs. For the anchor-based methods, items Q29 and Q30 of the EORTC QLQ-C30 were each used as an anchor. Based on Q29 and Q30 and the criterion of at least one level of difference, the MCIDs for the four domains (physical function, psychological function, social function, and common symptoms/side-effects) and the total score were 9.38, 8.33, 6.25, 7.14, and 7.03, respectively. For the distribution-based methods, the MCIDs for the four domains and the total score were 8, 8.00, 7.09, 8.99, and 6.12, respectively, based on the effect size (ES), and 4.37, 4.36, 5.11, 10.03, and 3.76, respectively, based on the standard error of measurement (SEM). Compared with the anchor-based MCIDs, those based on ES were similar, whereas those based on SEM tended to be smaller.

[Conclusions]
1. 
The QLICP-GM (V2.0) was developed and validated using established instrument development methodology, and its psychometric evaluation showed good validity, reliability, and responsiveness on the whole.
2. Both IRT and GT can be applied well to the development and validation of the general module of the Quality of Life Instruments for Cancer Patients, QLICP-GM, with results similar to those based on CTT. IRT and GT can evaluate the QLICP-GM comprehensively and have great development potential and good application prospects. Because each theory has its own advantages and disadvantages, combining the three theories is expected to yield better outcomes in scale development.
3. The MCIDs based on ES were close to those from the anchor-based methods with Q29 or Q30 as the anchor. It is suggested that the MCIDs of the QLICP-GM be estimated by anchor-based methods with Q29 or Q30 as the anchor.
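The three kinds of MCID estimate compared in this abstract can be sketched in Python as follows. Several details are assumptions rather than statements from the study: the ES-based value is taken as 0.5 times the sample standard deviation of baseline scores, the SEM-based value as one SEM computed as SD × sqrt(1 - reliability), and the anchor-based value as the mean (or median) absolute score change among patients whose anchor item changed by at least one level. All data are invented.

```python
import math
import statistics


def mcid_effect_size(baseline_scores, effect_size=0.5):
    """Score change corresponding to a given effect size (default 0.5)."""
    return effect_size * statistics.stdev(baseline_scores)


def mcid_sem(baseline_scores, reliability):
    """One standard error of measurement: SD * sqrt(1 - reliability)."""
    return statistics.stdev(baseline_scores) * math.sqrt(1.0 - reliability)


def mcid_anchor(pre, post, anchor_pre, anchor_post, use_median=False):
    """Mean or median absolute change among patients whose anchor item
    changed by at least one level between the two assessments."""
    changes = [
        abs(after - before)
        for before, after, a0, a1 in zip(pre, post, anchor_pre, anchor_post)
        if abs(a1 - a0) >= 1
    ]
    if not changes:
        raise ValueError("no patient changed by at least one anchor level")
    return statistics.median(changes) if use_median else statistics.mean(changes)


# Hypothetical standardized (0-100) scores and anchor ratings.
baseline = [50.0, 60.0, 70.0, 80.0, 90.0]
pre = [50.0, 60.0, 70.0, 80.0]
post = [60.0, 60.0, 75.0, 70.0]
anchor_pre = [3, 4, 4, 5]
anchor_post = [4, 4, 5, 4]

mcid_es = mcid_effect_size(baseline)
mcid_by_sem = mcid_sem(baseline, reliability=0.84)
mcid_mean = mcid_anchor(pre, post, anchor_pre, anchor_post)
mcid_median = mcid_anchor(pre, post, anchor_pre, anchor_post, use_median=True)
print(round(mcid_es, 2), round(mcid_by_sem, 2), round(mcid_mean, 2), mcid_median)
```

Because the SEM shrinks as reliability rises, SEM-based values for a reliable scale will often fall below ES-based ones, in line with the pattern reported here.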
Keywords: Cancer, Quality of Life, General Module, Scale, Item Response Theory, Generalizability Theory, Minimal Clinically Important Difference