Aim: The aim of this study was to critically appraise the evidence on the measurement properties of the Patient Reported Outcomes Measurement Information System (PROMIS) physical function measures, used in clinical samples of patients with spinal disorders.
Background: The PROMIS system was established to improve the quality and comparability of health outcome measures. For the physical function domain, an item bank was developed which can be administered using short forms and computerized adaptive test (CAT) applications. The measurement properties of these physical function measures in clinical settings with patients with spinal disorders have not been evaluated in a systematic way.
Methods: A systematic literature search was performed in the PubMed, EMBASE, CINAHL and PEDro databases to identify studies evaluating the measurement properties of PROMIS physical function scales in patients with spinal disorders. Available evidence on the reliability, validity and responsiveness of the included measurement instruments was rated according to published quality criteria.
Results: The search identified five studies; one on the PF item bank, three on PF CAT applications and one study on the 10-item PF short form. Internal consistency … was rated as. Structural validity …. Responsiveness…. Floor and ceiling effects… Time for administration of the CAT was …
Conclusions:
Implications of key findings:
Keywords: PROMIS, physical function, spinal disorders, measurement properties
1. Introduction (max 1000 words, currently …, do not use subheadings)
Musculoskeletal disorders have a high prevalence and cause a high burden of disability worldwide, accounting for 21.3% of the total Years Lived with Disability (YLD)1. According to the Global Burden of Disease Study 2010, the two conditions with the highest proportion of YLD in 2010 were both spinal disorders (SD): low back pain (49.6%) and neck pain (20.1%)2.
Physical therapists use several interventions in patients with back pain, such as exercise therapy, spinal manipulation or massage, aiming to reduce pain and improve physical function (PF)3. To improve communication and patient management in patients with SD, patient-reported outcomes (PROs) are used by physical therapists and other caregivers in clinical practice4. For measuring PF, a wide variety of PROs have been developed for different measurement purposes, e.g. diagnostic, prognostic or evaluative. However, clinicians find it challenging to select the right PROs because of the limited evidence on measurement properties such as reliability, validity and responsiveness, and because of feasibility aspects5,6.
In 2004, the Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated by the National Institutes of Health (NIH). The goal of the PROMIS initiative is to improve the measurement quality and comparability of health outcome measures and to reduce the burden for respondents. This is realized by building and validating item banks for measuring specified symptoms and health status domains7,8. The PROMIS system consists of a collection of item banks. An item bank is a series of questions or items that all measure the same domain, e.g. pain interference or physical function, independent of disease9. The items in an item bank are all calibrated on the same scale, using Item Response Theory (IRT) modelling. In this way, more precise measurements can be obtained. IRT-based item banks also enable the use of Computerized Adaptive Testing (CAT)10. A CAT uses an algorithm that selects items from the item bank based on the person’s response to the previous question and the estimated latent trait or domain level for that person. The algorithm stops asking questions when a pre-defined precision is reached; in this way, the number of questions can be reduced to 4 to 7 items10.
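The adaptive procedure described above can be illustrated with a minimal sketch. It is not the PROMIS implementation: it assumes a simplified two-parameter logistic model with dichotomous items, invented item parameters, a standard normal prior, and hypothetical function names (run_cat, estimate) and thresholds (se_target, max_items). It only shows the general loop of selecting the most informative remaining item at the current estimate, re-estimating the trait level, and stopping once a chosen precision is reached.

```python
import math

# Illustrative sketch only: 2PL model, made-up item parameters, hypothetical names.

def prob(theta, a, b):
    """Probability of endorsing an item with discrimination a and location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [g / 10.0 for g in range(-40, 41)]  # trait levels from -4.0 to 4.0

def estimate(responses):
    """Posterior mean and SD of theta on a grid, with a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for (a, b), x in responses:
            p = prob(t, a, b)
            w *= p if x else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.3, max_items=12):
    """Ask items until the posterior SD drops to se_target or max_items is reached."""
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    while remaining and len(responses) < max_items and se > se_target:
        # Select the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda ab: information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = estimate(responses)
    return theta, se, len(responses)

# Hypothetical bank of 25 items and a simulated respondent who endorses
# every item located below a trait level of 1.0.
bank = [(1.5, b / 4.0) for b in range(-12, 13)]
theta, se, n_items = run_cat(bank, lambda item: item[1] < 1.0)
print(theta, se, n_items)
```

With dichotomous items of moderate discrimination, as assumed here, more items are needed to reach a given precision than with well-targeted polytomous items, so the item counts from this sketch should not be compared with those reported for PROMIS CATs.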
A preliminary PROMIS item bank for PF was developed using IRT and evaluated through CAT simulation; it showed improved measurement of PF and increased efficiency when using CAT11. The PROMIS PF item bank was further developed, which resulted in a final bank consisting of 124 items covering central (i.e. spinal), upper extremity and lower extremity functions and activities of daily living12. The validity of the PF item bank was examined in diverse clinical samples13. Fixed-length short forms and CATs for the PROMIS PF item bank have been developed, and their measurement properties were examined in several clinical application studies.
However, the evidence on the measurement properties of the PROMIS PF item bank, short forms and CAT applications, used in clinical samples of patients with SD, has not been reviewed in a systematic way.
2. Aim
The aim of this study is to critically appraise the evidence on the measurement properties of the PROMIS physical function item bank, short forms and CAT applications in patients with spinal disorders.
3. Methods (700 words, currently ….)
3.1 Design
The study design is a systematic review of measurement properties. We performed the review according to the protocol for systematic reviews of measurement properties by C.B. Terwee (November 2011), downloaded from http://www.cosmin.nl/downloads.html. In accordance with this protocol, the review was reported following the PRISMA statement14.
3.2 Information sources
We searched the following electronic databases on February 24th, 2017: PubMed, EMBASE, CINAHL and PEDro. Additionally, a hand search was performed on the website www.nihpromis.com/science/PubsDomain/Physical_function.aspx, where the scientific publications on PROMIS measurements in the domain of PF are presented.
3.3 Search strategy
The search consisted of blocks of search terms related to the following aspects: (1) construct of interest (physical functioning): no search terms were included; instead, studies on the construct physical functioning were selected by hand from the search results. (2) Target population (spinal disorders): no search terms were included; studies on the target population were also selected by hand from the search results. (3) Type of instrument (full item bank, short form or CAT using the PROMIS Physical Function item bank): a combination of the search terms “promis” and “patient reported outcomes measurement information system” was used. Studies on the PROMIS Physical Function item bank, short forms or CAT were selected by hand from the search results. (4) Measurement properties: in PubMed, a validated sensitive search filter for studies on measurement properties of measurement instruments was used15. Translated versions of this filter were used in EMBASE and CINAHL. The full search strategies can be found in Appendix 1.
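The way the PubMed blocks are combined (instrument terms joined with OR, then AND the measurement property filter, then NOT the exclusion filter, as in the steps listed in Appendix 1) can be sketched as follows. The helper names any_of and build_query are hypothetical, and the two filters are represented by placeholder strings rather than their full text.

```python
def any_of(terms):
    """Join alternative search terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(instrument_terms, property_filter, exclusion_filter):
    """Instrument block AND measurement property filter, NOT exclusion filter."""
    instrument_block = any_of(instrument_terms)
    return f"({instrument_block} AND ({property_filter})) NOT ({exclusion_filter})"


# Instrument terms as listed in Appendix 1.
INSTRUMENT_TERMS = [
    "Promis[Tiab]",
    "Patient reported outcomes measurement information system[Tiab]",
]

query = build_query(
    INSTRUMENT_TERMS,
    property_filter="SENSITIVE_SEARCH_FILTER",  # placeholder for the full filter text
    exclusion_filter="EXCLUSION_FILTER",        # placeholder for the full filter text
)
print(query)
```

Keeping the filters as separate arguments mirrors the block structure of the search, so that a translated filter for another database could be swapped in without touching the instrument terms.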
3.4 Selection criteria
We used the following inclusion criteria: (1) the study population consisted of adult (18 years and older) patients with spinal disorders (back and neck pain), including radicular pain/hernia; (2) the main purpose of the study was the evaluation of measurement properties of the PROMIS PF item bank, PF short form or PF CAT in a clinical sample; (3) the article was an original research report; (4) the article was published in English, German or Dutch; (5) the article was a full-text article published in a peer-reviewed journal. No restrictions concerning the year of publication were used.
Exclusion criteria were: (1) spinal cord injury, spine trauma/fractures, cancer, neurological disorders (e.g. multiple sclerosis), rheumatologic disorders, infection and pelvic floor pain; (2) studies on the initial development of the PF item bank (establishing face and content validity in a general population), other constructs (e.g. pain or pain interference), composite measurement instruments measuring several constructs such as the PROMIS-29 and the NIH minimal dataset, and studies on the upper extremity and lower extremity CAT; (3) reviews, intervention studies, case reports, abstracts, editorials and dissertations.
3.5 Study selection
One of the reviewers (EJH) screened the search results on title and abstract to identify potentially relevant articles. For abstracts that fulfilled the inclusion criteria, full-text articles were retrieved. Reference lists of retrieved articles were manually screened to identify additional relevant articles. After reading the full-text articles, a final decision on the inclusion of articles was made.
3.6 Data extraction
Data extraction was performed by two independent reviewers (EJH, CK), using a standardized extraction form based on the COSMIN checklist 4-point scale16. The data extracted from the included articles comprised general characteristics of the instruments, characteristics of the study populations, results for the measurement properties and evidence on the interpretability of the measures.
3.7 Measurement properties
The measurement properties that were assessed were internal consistency, structural validity, hypothesis testing for construct validity and responsiveness. We used the definitions of these measurement properties according to the COSMIN taxonomy17 (see Table 1).
3.8 Quality assessment of studies
Two reviewers (EJH, CK) independently assessed the methodological quality of the included studies, using the COSMIN checklist 2.0 for PROMs (Mokkink, L. B., de Vet, H. C. W., Prinsen, C. A. C., Patrick, D. L., Alonso, J., Bouter, L. M., & Terwee, C. B. (2017). COSMIN risk of bias checklist for assessing the methodological quality of studies on the measurement properties of Patient-Reported Outcome Measures. Submitted.). This is an updated version of the COSMIN checklist with a 4-point scale16,18. The COSMIN checklist consists of boxes with quality criteria for each measurement property. Each item was rated “excellent”, “good”, “fair” or “poor”, and the overall rating per measurement property was determined using the “worst score counts” algorithm, i.e. the lowest item rating in a box determines the overall rating. Disagreements between the two reviewers were resolved by consensus.
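The rule that the lowest item rating determines the overall rating of a box can be written down in a few lines. This is a minimal sketch; the function name worst_score_counts and the example ratings are illustrative and not taken from the included studies.

```python
# Ratings ordered from lowest to highest methodological quality.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the overall rating of a box: the lowest rating among its items."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical box with four items: a single "fair" item caps the overall rating.
overall = worst_score_counts(["excellent", "good", "fair", "excellent"])
print(overall)
```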
3.9 Quality assessment of measurement properties
The quality of the measurement properties of the instruments was determined using the criteria for measurement properties described by Prinsen et al.19 (modified from Terwee et al.20). Only the measurement properties that were assessed in this review are presented: internal consistency, construct validity and responsiveness. The possible ratings for a measurement property are “positive”, “indeterminate” or “negative” (see Table 2).
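As an illustration of how the Table 2 criteria for hypothesis testing translate into a rating, the sketch below applies the 75% rule to a list of hypothesis outcomes. It is a simplification under stated assumptions: the function name is hypothetical, each hypothesis is reduced to confirmed or not confirmed, an empty list stands for "nothing reported", and reviewer judgment (for example on whether a comparator measures a related construct) is not modelled.

```python
def rate_hypothesis_testing(outcomes):
    """Rate hypothesis testing as '+', '?' or '-'.

    outcomes: list of booleans, True if a result was in accordance
    with the a priori hypothesis. An empty list means that no relevant
    correlations or group differences were reported.
    """
    if not outcomes:
        return "?"  # indeterminate: nothing to evaluate
    share_confirmed = sum(outcomes) / len(outcomes)
    # Positive when at least 75% of the results match the hypotheses.
    return "+" if share_confirmed >= 0.75 else "-"


# Hypothetical examples, not data from the included studies.
print(rate_hypothesis_testing([True, True, True, False]))  # 3 of 4 confirmed
print(rate_hypothesis_testing([True, False]))              # 1 of 2 confirmed
print(rate_hypothesis_testing([]))                         # nothing reported
```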
3.10 Best evidence synthesis
A best evidence synthesis was performed by combining the quality ratings of the measurement properties with the methodological quality of the studies. The quality of evidence was determined as described in the consensus-based guideline by Prinsen et al.19 (see Table 3).
4. Results
4.1 Study selection/number of studies screened assessed/included
The search strategy resulted in 664 unique records. After screening on title and abstract, 20 records remained for full-text assessment. Reference checking did not reveal any additional articles. After full-text assessment, 15 articles were excluded, so five articles remained for inclusion in the review (see Figure 1).
4.2 Study characteristics
In the five included studies, three ways of assessing the PROMIS PF domain were evaluated for their measurement properties in patients with spinal disorders: the PROMIS PF item bank21, the PROMIS PF item bank administered by means of CAT13,22,23 and the PROMIS PF 10-item short form24. The mean age of the study populations varied from 53 to 63 years, and overall slightly more females than males were included. The disease characteristics of the populations were back and neck pain21, back or leg pain13,22, spine pain or disability23 and lumbar radicular pain (where patients were referred for lumbar transforaminal epidural steroid injection)24. All studies were conducted in academic clinical settings in the USA and all measures were administered in English. The characteristics of the study populations are summarized in Table 4.
4.3 Methodological quality
Ratings for methodological quality of the studies are presented in Table 5. The PROMIS PF item bank was assessed in one study, in which a Rasch model (IRT method) was applied21. In this study, internal consistency and structural validity were both rated “excellent”.
The PROMIS PF CAT was assessed in three studies13,22,23. In one study a Rasch model was applied22. In this study, internal consistency, structural validity and hypotheses testing for construct validity were all rated “excellent”. In the two other studies on the PF CAT, no IRT methods were applied13,23. In one of these, only hypothesis testing for construct validity was examined, which was rated “excellent”23. In the other, only responsiveness was examined13, which was rated “fair” because of limited information on the comparator instruments and a lack of clearly stated a priori hypotheses.
The PROMIS PF 10-item short form was assessed in one study, in which no IRT methods were applied24. Hypotheses testing for construct validity was rated “fair”, because the PF short form was compared to the (23-point) Roland Morris Disability Index (RMDI), the European Quality of Life scale 5D questionnaire (EQ-5D) and the Numeric Rating Scale for Pain (NRS pain), not all of which, in our view, measure (solely) the construct of physical function. In this study responsiveness was rated “fair”, because the magnitude of the expected change (before and after intervention) was not formulated in the hypothesis.
4.4 Results of individual studies (measurement properties)
The measurement properties that were assessed in the included studies are presented in Table 6. None of the included studies assessed reliability in terms of test-retest reliability, inter-rater reliability or intra-rater reliability. Measurement error in terms of absolute measures was not assessed in the included studies either. No studies that assessed content validity were included in our search. Furthermore, none of the included studies assessed cross-cultural validity or criterion validity.
Internal consistency. For the PROMIS PF item bank, internal consistency was rated positive in one study21, based on adequate evidence for unidimensionality and a person reliability of 0.99 and item reliability of 1.00. For the PROMIS PF CAT application, internal consistency was rated positive in one study22, based on evidence for unidimensionality and a demonstrated person reliability of 0.95 and item reliability of 0.95.
Structural validity was rated positive in one study on the PROMIS PF item bank, where the unexplained variance was 2.9%, indicating unidimensionality in measuring PF21. In one study on the PROMIS CAT, structural validity was also rated positive, with unexplained variance in the residuals of the first dimension of 2.6%22.
Hypotheses testing (for construct validity) was rated positive for the PROMIS PF CAT because, in one study22, a correlation was found with the Short Form-36 Physical Function Domain (SF-36 PFD) (r = 0.81) and with the Oswestry Disability Index (ODI) (r = –0.81), which was in accordance with the a priori hypotheses. In one other study23, which was also rated positive, a correlation was found with the ODI (r = 0.76 – 0.85) for back pain patients and a strong correlation with the Neck Disability Index (NDI) (r = 0.83 – 0.87) for neck pain patients. For the PROMIS PF 10-item short form, an indeterminate rating was given for hypothesis testing for construct validity, because a correlation of 0.7 was found with the Roland Morris Disability Index (RMDI), correlations of 0.5-0.6 with the European Quality of Life Scale 5D Questionnaire (EQ-5D), and a lower correlation (0.35 – 0.50) with the Numerical Rating Scale for pain (NRS pain), which measures an unrelated construct24.
Responsiveness for the PROMIS PF CAT was rated indeterminate, because a comparison was made solely with a general health anchor and no correlation was calculated13. Responsiveness for the PROMIS PF 10-item short form was rated indeterminate, because the changes were correlated with changes in the RMDI and the EQ-5D at 3 months but not at 6 months, which was not in accordance with the a priori hypotheses24.
4.5 Best Evidence Synthesis
A summary of ratings for methodological quality and measurement properties is presented in Table 7. In the best evidence synthesis, we combined the results from the studies on the PROMIS PF item bank, the PROMIS CAT and the PROMIS 10-item short form. The results from the best evidence synthesis are described per measurement property.
Internal consistency
Structural validity
Moderate evidence was found for a positive rating of structural validity, based on consistent findings in one study of good quality21 and one study of fair quality22.
Hypothesis testing
Moderate evidence was found for a positive rating of hypothesis testing, based on consistent findings in two studies of fair quality22,23 and one study of poor quality24.
Responsiveness
Low evidence was found for an indeterminate rating of responsiveness, based on one study of fair quality13. Very low evidence was found for a positive rating of responsiveness, based on one study of poor quality24.
5. Discussion (1200 words, no subheadings, but separate paragraphs)
5.1 Summary of evidence/statement of principal findings
– No studies found on (test-retest) reliability and measurement error, nor on cross-cultural validation
– Studies so far only in the USA, in English and in academic settings (no primary care, no physical therapy)
5.2 Strengths & limitations
Strengths:
– COSMIN protocol
– Search of 4 databases, no time limit
– 2 independent reviewers (search, title/abstract selection, full-text selection, quality appraisal)
–
Limitations
– Possibly missed studies due to ..?
– Limited number of studies found -> limited conclusions
– Content validity was not examined in this review
– Possible drawback of excluding composite measurement instruments that include the PF construct (PROMIS-29 and NIH minimal dataset)
–
In the best evidence synthesis, we combined the results from the studies on the PROMIS PF item bank, the PROMIS CAT and the PROMIS 10-item short form. All studies measured the same construct in comparable populations and comparable settings. However, administering the item bank as a whole, as a CAT or as a short form is not the same, so the decision to conduct a single best evidence synthesis is open to debate.
5.3 Link with the literature on the subject, findings in relation to the findings of other studies/reviews
– No other reviews on PROMIS yet
– Review by Oude Voshaar on PF in RA
5.4 Relevance for clinical practice
PROMIS is a new form of PROMs with promising, improved measurement properties (at least in development studies). This makes better measurement possible in both research and clinical settings. The use of CAT applications in particular provides better precision, no missing items, low floor/ceiling effects and a low respondent burden owing to the short administration time.
Underline the importance of research in clinical populations.
5.5 Recommendation for future research
The PF item bank is still under development (currently version 2.0). Translations and validation studies for other countries are under way. The measurement properties of these versions will also need to be examined in clinical samples, for PF in diverse populations. CAT applications in particular are of interest. This applies not only to academic settings but also to the primary care population (since physical therapists make frequent use of PROMs and the patient burden is also very high there).
It is important to pay attention to good descriptions of the IRT methods used, missing items, and the appropriate comparison measurements (for construct validity and responsiveness). We recommend using the COSMIN V2.0 checklist and the guideline by Prinsen et al. when designing studies.
6. Conclusion (300 words)
Table 1 Overview of measurement properties and definitions
Measurement property | Definition according to the COSMIN taxonomy17
Internal consistency | The degree of interrelatedness among the items
Structural validity | The degree to which the scores of a measurement instrument are an adequate reflection of the dimensionality of the construct to be measured
Hypotheses testing | The degree to which the scores of a measurement instrument are consistent with hypotheses based on the assumption that the measurement instrument validly measures the construct to be measured
Responsiveness | The ability of a measurement instrument to detect change over time in the construct to be measured
Table 2 Criteria for measurement properties (based on Prinsen et al.19, modified from Terwee et al.20)
Measurement property | Rating* | Criteria
Internal consistency | + | At least limited evidence for unidimensionality or positive structural validity AND Cronbach’s alpha(s) ≥ 0.70 and ≤ 0.95
Internal consistency | ? | Not all information for ‘+’ reported OR conflicting evidence for unidimensionality or structural validity OR evidence for lack of unidimensionality or negative structural validity
Internal consistency | – | Criteria for ‘+’ not met
Structural validity | + | Rasch/IRT: At least limited evidence for unidimensionality or positive structural validity AND no evidence for violation of local independence: Rasch: standardized item-person fit residuals between -2.5 and 2.5; OR IRT: residual correlations among the items after controlling for the dominant factor < 0.20 OR Q3’s < 0.37 AND no evidence for violation of monotonicity: adequate looking graphs OR item scalability > 0.30 AND adequate model fit: Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardized values > -2 and < 2; OR IRT: G2 > 0.01. Optional additional evidence: Adequate targeting; Rasch: adequate person-item threshold distribution; IRT: adequate threshold range. No important DIF for relevant subject characteristics (such as age, gender, education), McFadden’s R2 < 0.02
Structural validity | ? | IRT: model fit not reported
Structural validity | – | Criteria for (+) not met
Hypothesis testing | + | At least 75% of the results are in accordance with the hypotheses
Hypothesis testing | ? | No correlations with instrument(s) measuring related construct(s) AND no differences between relevant groups reported
Hypothesis testing | – | Criteria for (+) not met
Responsiveness | + | At least 75% of the results are in accordance with the hypotheses
Responsiveness | ? | No correlations with changes in instrument(s) measuring related construct(s) AND no differences between changes in relevant groups reported
Responsiveness | – | Criteria for (+) not met
DIF = differential item functioning, IRT = item response theory
*rating: + = positive rating, ? = indeterminate rating, – = negative rating
Table 3 Quality of evidence, based on Prinsen et al.19
Quality rating | Criteria
High | Consistent findings in multiple studies of at least good quality OR in one study of excellent quality AND a total sample size of ≥ 100 patients
Moderate | Conflicting findings in multiple studies of at least good quality OR consistent findings in multiple studies of at least fair quality OR one study of good quality AND a sample size of ≥ 50 patients
Low | Conflicting findings in multiple studies of at least fair quality OR one study of fair quality AND a total sample size of ≥ 30 patients
Very low | Only studies of poor quality OR a total sample size of < 30 patients
Unknown | No studies
Figure 1 Flowchart search and selection
Table 4 Characteristics of included study populations
Instrument | Study | N | Mean age (years) | Gender (% female) | Disease characteristics | Setting | Country | Language | Sampling | Response rate (% missing)
PROMIS PF item bank | Hung et al.21 | 438 | 53 | 51 | Back and neck pain | University clinic | USA | English | Consecutive | 0.4
PROMIS PF CAT | Brodke et al.22 | 1607 | 54 | 52 | Back or leg pain | University clinic | USA | English | Consecutive | 23
PROMIS PF CAT | Papuga et al.23 | 283 | 55 | 48 | Spine pain or disability | Academic hospital | USA | English | Consecutive | 0.4
PROMIS PF CAT | Schalet et al.13 | 218 | Median age group 55-59 | 56 | Back or leg pain | University spine center | USA | English | Convenience? | 21
PROMIS PF 10-item short form | Shahgholi et al.24 | 199 | 63 | 51 | Lumbar radicular pain | University spine center | USA | English | Consecutive | ?
CAT = Computerized Adaptive Test, PF = Physical Function, PROMIS = Patient Reported Outcomes Measurement Information System
Table 5 Methodological quality of studies
Instrument | Study | IRT used | Reliability: Internal consistency | Construct validity: Structural validity | Construct validity: Hypothesis testing | Responsiveness
PROMIS PF item bank | Hung et al.21 | Yes | Excellent | Excellent |  | 
PROMIS PF CAT | Brodke et al.22 | Yes | Excellent | Excellent | Excellent | 
PROMIS PF CAT | Papuga et al.23 | No |  |  | Excellent | 
PROMIS PF CAT | Schalet et al.13 | No |  |  |  | Fair
PROMIS PF 10-item short form | Shahgholi et al.24 | No |  |  | Fair | Fair
CAT = Computerized Adaptive Test, PF = Physical Function, PROMIS = Patient Reported Outcomes Measurement Information System
Table 6 Measurement properties of individual studies
Instrument | Study | Reliability: Internal consistency | Construct validity: Structural validity | Construct validity: Hypothesis testing | Responsiveness | Floor and ceiling effects
PROMIS PF item bank | Hung et al.21 | Item reliability 1.00; Person reliability 0.99 | Unidimensionality: 2.9% unexplained variance |  |  | Floor effect 0.2%; Ceiling effect 1.7%
PROMIS PF CAT | Brodke et al.22 | Item reliability 0.99; Person reliability 0.95 | Unidimensionality: 2.6% unexplained variance | Correlation SF-36 PFD: 0.81; Correlation ODI: – 0.81 |  | Floor effect 3.86%; Ceiling effect 0.81%
PROMIS PF CAT | Papuga et al.23 |  |  | Correlation ODI: 0.76 – 0.85 (back pain); Correlation NDI: 0.83 – 0.87 (neck pain) |  | 
PROMIS PF CAT | Schalet et al.13 |  |  |  | Compared to “general health anchor” but no correlation calculated | 
PROMIS PF 10-item short form | Shahgholi et al.24 |  |  | Correlation RMDI: 0.7; Correlation EQ-5D: 0.5-0.6; Correlation NRS pain: 0.35 – 0.5 | Correlation with change in RMDI: baseline 0.5, 3 months 0.55, 6 months < 0.1; Correlation with change in EQ-5D: baseline 0.45, 3 months 0.5, 6 months – 0.1; Correlation with change in NRS pain: baseline 0.51, 3 months 0.52, 6 months 0.35 | 
CAT = Computerized Adaptive Test, EQ-5D = European Quality of Life scale 5D questionnaire, NDI = Neck Disability Index, NRS pain = Numerical Rating Scale for pain, ODI = Oswestry Disability Index, PF = Physical Function, PROMIS = Patient Reported Outcomes Measurement Information System, RMDI = Roland Morris Disability Index, SF-36 PFD = Short Form-36 Physical Function Domain
Table 7 Summary of ratings for methodological quality and measurement properties
Instrument | Study | Internal consistency M | Internal consistency Q | Structural validity M | Structural validity Q | Hypothesis testing M | Hypothesis testing Q | Responsiveness M | Responsiveness Q
PROMIS PF item bank | Hung et al.21 | excellent | + | excellent | + |  |  |  | 
PROMIS PF CAT | Brodke et al.22 | excellent | + | excellent | + | excellent | + |  | 
PROMIS PF CAT | Papuga et al.23 |  |  |  |  | excellent | + |  | 
PROMIS PF CAT | Schalet et al.13 |  |  |  |  |  |  | fair | ?
PROMIS PF 10-item short form | Shahgholi et al.24 |  |  |  |  | fair | ? | fair | ?
M = methodological quality of the study: “excellent”, “good”, “fair” and “poor”. Q = Quality criteria for measurement property; + = positive rating, ? = indeterminate rating, – = negative rating
7. References
1. March L, Smith EU, Hoy DG, et al. Burden of disability due to musculoskeletal (MSK) disorders. Best Pract Res Clin Rheumatol 2014; 28(3): 353-66.
2. Vos T, Flaxman AD, Naghavi M, et al. Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990-2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet 2012; 380(9859): 2163-96.
3. Chou R, Deyo R, Friedly J, et al. Nonpharmacologic Therapies for Low Back Pain: A Systematic Review for an American College of Physicians Clinical Practice Guideline. Annals of internal medicine 2017.
4. Snyder CF, Aaronson NK, Choucair AK, et al. Implementing patient-reported outcomes assessment in clinical practice: a review of the options and considerations. Qual Life Res 2012; 21(8): 1305-14.
5. Grotle M, Brox JI, Vollestad NK. Functional status and disability questionnaires: what do they assess? A systematic review of back-specific outcome questionnaires. Spine (Phila Pa 1976) 2005; 30(1): 130-40.
6. Schellingerhout JM, Verhagen AP, Heymans MW, Koes BW, de Vet HC, Terwee CB. Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review. Qual Life Res 2012; 21(4): 659-70.
7. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Medical care 2007; 45(5): S3-11.
8. Cella D, Riley W, Stone A, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol 2010; 63(11): 1179-94.
9. Riley WT, Rothrock N, Bruce B, et al. Patient-reported outcomes measurement information system (PROMIS) domain names and definitions revisions: further evaluation of content validity in IRT-derived item banks. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation 2010; 19(9): 1311-21.
10. Reeve BB, Hays RD, Bjorner JB, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical care 2007; 45(5 Suppl 1): S22-31.
11. Rose M, Bjorner JB, Becker J, Fries JF, Ware JE. Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). Journal of Clinical Epidemiology 2008; 61(1): 17-33.
12. Rose M, Bjorner JB, Gandek B, Bruce B, Fries JF, Ware Jr JE. The PROMIS Physical Function item bank was calibrated to a standardized metric and shown to improve measurement efficiency. Journal of Clinical Epidemiology 2014; 67(5): 516-26.
13. Schalet BD, Hays RD, Jensen SE, Beaumont JL, Fries JF, Cella D. Validity of PROMIS physical function measured in diverse clinical samples. J Clin Epidemiol 2016; 73: 112-8.
14. Moher D, Liberati A, Tetzlaff J, Altman DG, Group P. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol 2009; 62(10): 1006-12.
15. Terwee CB, Jansma EP, Riphagen, II, de Vet HC. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res 2009; 18(8): 1115-23.
16. Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res 2012; 21(4): 651-7.
17. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010; 63(7): 737-45.
18. Mokkink LB, Terwee CB, Knol DL, et al. Protocol of the COSMIN study: COnsensus-based Standards for the selection of health Measurement INstruments. BMC Med Res Methodol 2006; 6: 2.
19. Prinsen CA, Vohra S, Rose MR, et al. How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” – a practical guideline. Trials 2016; 17(1): 449.
20. Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007; 60(1): 34-42.
21. Hung M, Hon SD, Franklin JD, et al. Psychometric properties of the PROMIS physical function item bank in patients with spinal disorders. Spine (Phila Pa 1976) 2014; 39(2): 158-63.
22. Brodke DS, Goz V, Voss MW, Lawrence BD, Spiker WR, Man H. PROMIS(R) PF CAT Outperforms the ODI and SF-36 Physical Function Domain in Spine Patients. Spine (Phila Pa 1976) 2016.
23. Papuga MO, Mesfin A, Molinari R, Rubery PT. Correlation of PROMIS Physical Function and Pain CAT Instruments With Oswestry Disability Index and Neck Disability Index in Spine Patients. Spine (Phila Pa 1976) 2016; 41(14): 1153-9.
24. Shahgholi L, Yost KJ, Carter RE, et al. Correlation of the Patient Reported Outcomes Measurement Information System with legacy outcomes measures in assessment of response to lumbar transforaminal epidural steroid injections. AJNR American journal of neuroradiology 2015; 36(3): 594-9.
Appendix 1 Search strategy
PubMed search on 24 February 2017
Step | Search terms | Results
#7 | #5 NOT #6 | 602
#6 | Exclusion filter** | 3,547,347
#5 | #3 AND #4 | 620
#4 | Sensitive search filter* | 7,974,931
#3 | #1 OR #2 | 729
#2 | Patient reported outcomes measurement information system[Tiab] | 384
#1 | Promis[Tiab] | 679
*Sensitive search filter
(instrumentation[sh] OR methods[sh] OR Validation Studies[pt] OR Comparative Study[pt] OR “psychometrics”[MeSH] OR psychometr*[tiab] OR clinimetr*[tw] OR clinometr*[tw] OR “outcome assessment (health care)”[MeSH] OR outcome assessment[tiab] OR outcome measure*[tw] OR “observer variation”[MeSH] OR observer variation[tiab] OR “Health Status Indicators”[Mesh] OR “reproducibility of results”[MeSH] OR reproducib*[tiab] OR “discriminant analysis”[MeSH] OR reliab*[tiab] OR unreliab*[tiab] OR valid*[tiab] OR coefficient[tiab] OR homogeneity[tiab] OR homogeneous[tiab] OR “internal consistency”[tiab] OR (cronbach*[tiab] AND (alpha[tiab] OR alphas[tiab])) OR (item[tiab] AND (correlation*[tiab] OR selection*[tiab] OR reduction*[tiab])) OR agreement[tiab] OR precision[tiab] OR imprecision[tiab] OR “precise values”[tiab] OR test–retest[tiab] OR (test[tiab] AND retest[tiab]) OR (reliab*[tiab] AND (test[tiab] OR retest[tiab])) OR stability[tiab] OR interrater[tiab] OR inter-rater[tiab] OR intrarater[tiab] OR intra-rater[tiab] OR intertester[tiab] OR inter-tester[tiab] OR intratester[tiab] OR intra-tester[tiab] OR interobserver[tiab] OR inter-observer[tiab] OR intraobserver[tiab] OR intra-observer[tiab] OR intertechnician[tiab] OR inter-technician[tiab] OR intratechnician[tiab] OR intra-technician[tiab] OR interexaminer[tiab] OR inter-examiner[tiab] OR intraexaminer[tiab] OR intra-examiner[tiab] OR interassay[tiab] OR inter-assay[tiab] OR intraassay[tiab] OR intra-assay[tiab] OR interindividual[tiab] OR inter-individual[tiab] OR intraindividual[tiab] OR intra-individual[tiab] OR interparticipant[tiab] OR inter-participant[tiab] OR intraparticipant[tiab] OR intra-participant[tiab] OR kappa[tiab] OR kappa’s[tiab] OR kappas[tiab] OR repeatab*[tiab] OR ((replicab*[tiab] OR repeated[tiab]) AND (measure[tiab] OR measures[tiab] OR findings[tiab] OR result[tiab] OR results[tiab] OR test[tiab] OR tests[tiab])) OR generaliza*[tiab] OR generalisa*[tiab] OR concordance[tiab] OR 
(intraclass[tiab] AND correlation*[tiab]) OR discriminative[tiab] OR “known group”[tiab] OR factor analysis[tiab] OR factor analyses[tiab] OR dimension*[tiab] OR subscale*[tiab] OR (multitrait[tiab] AND scaling[tiab] AND (analysis[tiab] OR analyses[tiab])) OR item discriminant[tiab] OR interscale correlation*[tiab] OR error[tiab] OR errors[tiab] OR “individual variability”[tiab] OR (variability[tiab] AND (analysis[tiab] OR values[tiab])) OR (uncertainty[tiab] AND (measurement[tiab] OR measuring[tiab])) OR “standard error of measurement”[tiab] OR sensitiv*[tiab] OR responsive*[tiab] OR ((minimal[tiab] OR minimally[tiab] OR clinical[tiab] OR clinically[tiab]) AND (important[tiab] OR significant[tiab] OR detectable[tiab]) AND (change[tiab] OR difference[tiab])) OR (small*[tiab] AND (real[tiab] OR detectable[tiab]) AND (change[tiab] OR difference[tiab])) OR meaningful change[tiab] OR “ceiling effect”[tiab] OR “floor effect”[tiab] OR “Item response model”[tiab] OR IRT[tiab] OR Rasch[tiab] OR “Differential item functioning”[tiab] OR DIF[tiab] OR “computer adaptive testing”[tiab] OR “item bank”[tiab] OR “cross-cultural equivalence”[tiab])
**Exclusion filter
(“addresses”[Publication Type] OR “biography”[Publication Type] OR “case reports”[Publication Type] OR “comment”[Publication Type] OR “directory”[Publication Type] OR “editorial”[Publication Type] OR “festschrift”[Publication Type] OR “interview”[Publication Type] OR “lectures”[Publication Type] OR “legal cases”[Publication Type] OR “legislation”[Publication Type] OR “letter”[Publication Type] OR “news”[Publication Type] OR “newspaper article”[Publication Type] OR “patient education handout”[Publication Type] OR “popular works”[Publication Type] OR “congresses”[Publication Type] OR “consensus development conference”[Publication Type] OR “consensus development conference, nih”[Publication Type] OR “practice guideline”[Publication Type]) NOT (“animals”[MeSH Terms] NOT “humans”[MeSH Terms])
Embase search on 24 February 2017
Step | Search terms | Results
#5 | #4 AND (‘article’/it OR ‘article in press’/it) | 169
#4 | #3 AND [embase]/lim NOT [medline]/lim | 708
#3 | #1 AND #2 | 1,029
#2 | Search filter* | 4,725,310
#1 | promis:ab,ti OR ‘patient reported outcomes measurement information system’:ab,ti | 1,358
*Search filter
‘intermethod comparison’/exp OR ‘data collection method’/exp OR ‘validation study’/exp OR ‘feasibility study’/exp OR ‘pilot study’/exp OR ‘psychometry’/exp OR ‘reproducibility’/exp OR reproducib*:ab,ti OR ‘audit’:ab,ti OR psychometr*:ab,ti OR clinimetr*:ab,ti OR clinometr*:ab,ti OR ‘observer variation’/exp OR ‘observer variation’:ab,ti OR ‘discriminant analysis’/exp OR ‘validity’/exp OR reliab*:ab,ti OR valid*:ab,ti OR ‘coefficient’:ab,ti OR ‘internal consistency’:ab,ti OR (cronbach*:ab,ti AND (‘alpha’:ab,ti OR ‘alphas’:ab,ti)) OR ‘item correlation’:ab,ti OR ‘item correlations’:ab,ti OR ‘item selection’:ab,ti OR ‘item selections’:ab,ti OR ‘item reduction’:ab,ti OR ‘item reductions’:ab,ti OR ‘agreement’:ab,ti OR ‘precision’:ab,ti OR ‘imprecision’:ab,ti OR ‘precise values’:ab,ti OR ‘test-retest’:ab,ti OR (‘test’:ab,ti AND ‘retest’:ab,ti) OR (reliab*:ab,ti AND (‘test’:ab,ti OR ‘retest’:ab,ti)) OR ‘stability’:ab,ti OR ‘interrater’:ab,ti OR ‘inter-rater’:ab,ti OR ‘intrarater’:ab,ti OR ‘intra-rater’:ab,ti OR ‘intertester’:ab,ti OR ‘inter-tester’:ab,ti OR ‘intratester’:ab,ti OR ‘intra- tester’:ab,ti OR ‘interobeserver’:ab,ti OR ‘inter-observer’:ab,ti OR ‘intraobserver’:ab,ti OR ‘intra- observer’:ab,ti OR ‘intertechnician’:ab,ti OR ‘inter-technician’:ab,ti OR ‘intratechnician’:ab,ti OR ‘intra- technician’:ab,ti OR ‘interexaminer’:ab,ti OR ‘inter-examiner’:ab,ti OR ‘intraexaminer’:ab,ti OR ‘intra- examiner’:ab,ti OR ‘interassay’:ab,ti OR ‘inter-assay’:ab,ti OR ‘intraassay’:ab,ti OR ‘intra-assay’:ab,ti OR ‘interindividual’:ab,ti OR ‘inter-individual’:ab,ti OR ‘intraindividual’:ab,ti OR ‘intra-individual’:ab,ti OR ‘interparticipant’:ab,ti OR ‘inter-participant’:ab,ti OR ‘intraparticipant’:ab,ti OR ‘intra- participant’:ab,ti OR ‘kappa’:ab,ti OR ‘kappas’:ab,ti OR ‘coefficient of variation’:ab,ti OR repeatab*:ab,ti OR (replicab*:ab,ti OR ‘repeated’:ab,ti AND (‘measure’:ab,ti OR ‘measures’:ab,ti OR ‘findings’:ab,ti OR ‘result’:ab,ti OR ‘results’:ab,ti OR ‘test’:ab,ti OR 
‘tests’:ab,ti)) OR generaliza*:ab,ti OR generalisa*:ab,ti OR ‘concordance’:ab,ti OR (‘intraclass’:ab,ti AND correlation*:ab,ti) OR ‘discriminative’:ab,ti OR ‘known group’:ab,ti OR ‘factor analysis’:ab,ti OR ‘factor analyses’:ab,ti OR ‘factor structure’:ab,ti OR ‘factor structures’:ab,ti OR ‘dimensionality’:ab,ti OR subscale*:ab,ti OR ‘multitrait scaling analysis’:ab,ti OR ‘multitrait scaling analyses’:ab,ti OR ‘item discriminant’:ab,ti OR ‘interscale correlation’:ab,ti OR ‘interscale correlations’:ab,ti OR (‘error’:ab,ti OR ‘errors’:ab,ti AND (measure*:ab,ti OR correlat*:ab,ti OR evaluat*:ab,ti OR ‘accuracy’:ab,ti OR ‘accurate’:ab,ti OR ‘precision’:ab,ti OR ‘mean’:ab,ti)) OR ‘individual variability’:ab,ti OR ‘interval variability’:ab,ti OR ‘rate variability’:ab,ti OR ‘variability analysis’:ab,ti OR (‘uncertainty’:ab,ti AND (‘measurement’:ab,ti OR ‘measuring’:ab,ti)) OR ‘standard error of measurement’:ab,ti OR sensitiv*:ab,ti OR responsive*:ab,ti OR (‘limit’:ab,ti AND ‘detection’:ab,ti) OR ‘minimal detectable concentration’:ab,ti OR interpretab*:ab,ti OR (small*:ab,ti AND (‘real’:ab,ti OR ‘detectable’:ab,ti) AND (‘change’:ab,ti OR ‘difference’:ab,ti)) OR ‘meaningful change’:ab,ti OR ‘minimal important change’:ab,ti OR ‘minimal important difference’:ab,ti OR ‘minimally important change’:ab,ti OR ‘minimally important difference’:ab,ti OR ‘minimal detectable change’:ab,ti OR ‘minimal detectable difference’:ab,ti OR ‘minimally detectable change’:ab,ti OR ‘minimally detectable difference’:ab,ti OR ‘minimal real change’:ab,ti OR ‘minimal real difference’:ab,ti OR ‘minimally real change’:ab,ti OR ‘minimally real difference’:ab,ti OR ‘ceiling effect’:ab,ti OR ‘floor effect’:ab,ti OR ‘item response model’:ab,ti OR ‘irt’:ab,ti OR ‘rasch’:ab,ti OR ‘differential item functioning’:ab,ti OR ‘dif’:ab,ti OR ‘computer adaptive testing’:ab,ti OR ‘item bank’:ab,ti OR ‘cross-cultural equivalence’:ab,ti
CINAHL search on 24 February 2017
Step | Search terms | Results
S5 | S1 AND S4 | 159
S4 | S2 OR S3 | 191
S3 | TI patient reported outcomes measurement information system OR AB patient reported outcomes measurement information system | 122
S2 | TI promis OR AB promis | 163
S1 | Search filter | 260,915
Search filter:
(MH “Psychometrics”) or ( TI psychometr* or AB psychometr* ) or ( TI clinimetr* or AB clinimetr* ) or ( TI clinometr* OR AB clinometr* ) or (MH “Outcome Assessment”) or ( TI outcome assessment or AB outcome assessment ) or ( TI outcome measure* or AB outcome measure* ) or (MH “Health Status Indicators”) or (MH “Reproducibility of Results”) or (MH “Discriminant Analysis”) or ( ( TI reproducib* or AB reproducib* ) or ( TI reliab* or AB reliab* ) or ( TI unreliab* or AB unreliab* ) ) or ( ( TI valid* or AB valid* ) or ( TI coefficient or AB coefficient ) or ( TI homogeneity or AB homogeneity ) ) or ( TI homogeneous or AB homogeneous ) or ( TI “coefficient of variation” or AB “coefficient of variation” ) or ( TI “internal consistency” or AB “internal consistency” ) or (MH “Internal Consistency+”) or (MH “Reliability+”) or (MH “Measurement Error+”) or (MH “Content Validity+”) or “hypothesis testing” or “structural validity” or “cross-cultural validity” or (MH “Criterion-Related Validity+”) or “responsiveness” or “interpretability” or ( TI reliab* or AB reliab* ) and ( (TI test or AB test) OR (TI retest or AB retest) ) or ( TI stability or AB stability ) or ( TI interrater or AB interrater ) or ( TI inter-rater or AB inter-rater ) or ( TI intrarater or AB intrarater ) or ( TI intra-rater or AB intrarater ) or ( TI intertester or AB intertester) or (TI inter-tester or AB inter-tester) or ( TI intratester or AB intratester) or ( TI intra-tester or AB intra-tester) or ( TI interobserver or AB interobserver) or (TI inter-observer or AB inter-observer ) or ( TI intraobserver or AB intraobserver) or ( TI intra-observer or AB intra-observer) or ( TI intertechnician or AB intertechnician) or (TI inter-technician or AB inter-technician) or ( TI intratechnician or AB intratechnician ) or ( TI intra-technician or AB intra-technician ) or ( TI interexaminer or AB interexaminer ) or (TI inter-examiner or AB inter-examiner) or (TI intraexaminer or AB
intraexaminer ) OR (TI intra-examiner or AB intra-examiner ) or (TI intra-examiner or AB intraexaminer ) or (TI interassay or AB interassay ) or ( TI inter-assay or AB inter-assay ) or ( TI intraassay or AB intraassay) or ( TI intra-assay or AB intra-assay ) or (TI interindividual or AB interindividual) or (TI inter-individual or AB inter-individual) OR (TI intraindividual or AB intraindividual) or (TI intra-individual or AB intra-individual) or (TI interparticipant or AB interparticipant) or (TI inter-participant or AB inter-participant ) or (TI intraparticipant or AB intraparticipant) or (TI intra-participant or AB intra-participant ) or (TI kappa or AB kappa) or (TI kappa’s or AB kappa’s ) or (TI kappas or AB kappas) or (TI repeatab* or AB repeatab*) or ( TI responsive* or AB responsive* ) or ( TI interpretab* or AB interpretab* )
PEDro search on 24 February 2017
Step | Search terms | Results
S2 | Abstract & Title; “patient reported outcomes measurement system” | 2
S1 | Abstract & Title; “promis” | 4