Abstract
Introduction: The Western Canada Waiting List Project (WCWL), a federally funded partnership of 19 organizations, was created to develop tools for managing waiting lists. The WCWL panel on hip and knee replacement surgery was 1 of 5 panels constituted under this project.
Methods: The panel developed and tested a collection of standardized clinical criteria for setting priorities among patients awaiting hip and knee replacement. The criteria were applied to 405 patients in 4 provinces. Regression analysis was used to determine the set of criteria weights that collectively best predicted clinicians’ overall urgency ratings. Inter-rater and test-retest reliability were assessed from 6 videotaped patient interviews, scored by orthopedic surgeons, related professionals and general practitioners.
Results: The priority criteria accounted for over two-thirds of the observed variance in overall urgency ratings (adjusted R² = 0.676). The panel modified the criteria and weights based on the empirical findings and on clinical judgement. The reliability of the priority criteria for the hip and knee replacement tool was among the strongest of the 5 instruments developed in the WCWL project.
Conclusions: The panel considered the criteria easy to use and reasonably reflective of expert surgical judgement regarding clinical urgency for hip and knee replacement. Further development and testing of the tool appears warranted.
Waiting lists for health care services are a constant source of public distress and political anxiety. Equally troublesome is the prevalent impression that waiting lists may not be fair.1,2 A recent report concluded that the management of waiting lists across Canada is, in general, “chaotic,” “non-standardized, capriciously organized, poorly monitored, and…in grave need of retooling.”3 As such, the authors conclude, it is “impossible to…rationally manage the patients on those lists.”3 And impossible, therefore, to guarantee fairness.
Most broad classification systems currently used to categorize patients according to urgency for hip and knee replacement are highly subjective and inadequate to assess and compare urgency and case-mix of patients on waiting lists.
Priority criteria
In response to the need for better management of waiting lists, an increasing number of clinicians and health authorities are adopting point-count measures for assessing patients’ relative clinical urgency or priority.4 Similar point-count measures are used in many medical settings to assess severity of illness and risk of adverse events (e.g., Apgar score, APACHE score). Such measures function statistically as additive or linear models.5
Similarly, priority criteria estimate severity of illness as an indicator of urgency, although additional considerations are often included, such as whether patients’ illnesses are interfering with their daily living. The principal functions of priority criteria are: (1) to guide decisions about the relative urgency and order of surgery and (2) to develop case-mix descriptions. These descriptions can be used to assess and compare waiting lists across regions and over time.
A recent review of initial experience with priority criteria in several countries endorsed this approach, citing benefits such as greater transparency, an equitable system and provision of service led by clinical need and in the control of clinicians.6
In initial work on the clinical validity of priority criteria in New Zealand, regression analysis was used to generate weights for sets of priority criteria, based on a comparison with overall clinical judgement.7,8 However, the number of patients included in most analyses was relatively small. Recently, the WCWL used a multifaceted approach to develop priority criteria for general surgery, using a larger patient sample to assess clinical validity.9 The WCWL also assessed the reliability of the general surgery priority criteria (i.e., the extent to which raters arrive at the same or similar ratings using the criteria when evaluating the same or similar patients). Otherwise, reliability work with priority criteria appears to have been limited. One study found good agreement between the priority scores assigned by general practitioners and specialists to patients with hip or knee arthritis or cataracts.10
The Western Canada Waiting List Project
The WCWL was established with a grant from Health Canada’s Health Transition Fund to address some of the problems in waiting list management identified in the report of McDonald and colleagues.3 In particular, the project focused on developing, testing and refining clinician-scored priority criteria capable of assessing and comparing the relative urgency of surgery for patients on waiting lists.
The WCWL is a collaborative undertaking by 7 regional health authorities, 4 medical associations, 4 provincial ministries of health and 4 health research centres. Clinical panels have been constituted to address each of 5 specialty areas. This article describes the experience of the hip and knee replacement panel.
Materials and methods
The hip and knee replacement panel comprised 7 academic and community orthopedic surgeons, 3 family physicians, a geriatrician, a physical therapist and a rheumatologist drawn from the 4 western Canadian provinces. The panel was co-chaired by the authors and was in place from October 1999 to June 2000, with work extended beyond this period under the guidance of the WCWL Project Steering Committee.
The literature on major joint replacement and associated outcomes was reviewed. At the initial meeting, the panel elected to adopt the New Zealand major joint replacement criteria as a starting point. These were incorporated into a priority criteria form used by 9 panel members and 8 designated colleagues to score a series of consecutive patients in their practices (Table 1). Data collection was initiated in December 1999.
Participating clinicians assigned each patient to the appropriate level on each criterion (e.g., mild pain, moderate functional limitations) and rated the overall urgency for each patient on a 10-cm visual analogue scale (VAS). The latter rating served as the dependent variable in standard linear regression analyses, which were used to determine the statistically optimal set of weights on each criterion to best predict (or to correlate with) overall urgency. The regression analyses were constrained to retain all predictor variables (criteria) regardless of cross-correlations among criteria.* Analyses were carried out for the total sample of patients, as well as for a subgroup with primary hip or knee replacement.
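The weighting step described above can be illustrated with a minimal sketch. It is not the panel's actual analysis: the data are synthetic, the criterion levels are treated as numeric scores (the original coding of response categories is not stated here), and the function name `fit_criterion_weights` and the "true" weights are hypothetical. All criteria are kept in the model, mirroring the constraint that no predictor was dropped.

```python
import numpy as np

def fit_criterion_weights(levels, urgency):
    """Ordinary least squares of urgency on all criterion levels.

    levels  : (n_patients, n_criteria) array of criterion levels
    urgency : (n_patients,) array of overall urgency ratings
    Returns (intercept, weights, r_squared). Every criterion is retained;
    no variable selection is performed.
    """
    design = np.column_stack([np.ones(len(levels)), levels])
    coef, *_ = np.linalg.lstsq(design, urgency, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((urgency - fitted) ** 2)
    ss_tot = np.sum((urgency - urgency.mean()) ** 2)
    return coef[0], coef[1:], 1.0 - ss_res / ss_tot

# Synthetic illustration only: these are NOT patient data from the study.
rng = np.random.default_rng(0)
n_patients, n_criteria = 405, 7
levels = rng.integers(0, 4, size=(n_patients, n_criteria)).astype(float)
true_weights = np.array([1.0, 0.8, 0.5, 0.6, 0.3, 0.9, 0.7])  # invented values
urgency = levels @ true_weights + rng.normal(0.0, 1.0, n_patients)

intercept, weights, r2 = fit_criterion_weights(levels, urgency)
print(np.round(weights, 2), round(r2, 3))
```

The fitted coefficients play the role of the criterion weights, and the R² indicates how much of the variance in the overall urgency rating the criteria account for.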
After interim analysis of 156 completed forms in January 2000, panellists made several changes to the form (Table 1, Jan. 29, 2000, revision). Data collection continued until May 2000 using the revised form.
Inter-rater reliability was assessed from 6 patient interviews that were conducted by one of the authors (G.A.) or a colleague, and videotaped for later showing to panellists and other clinicians. The interviews incorporated physical examination and radiographic evidence. These cases were scored independently by panel members and their colleagues (including 14 orthopedic surgeons) in June 2000. Inter-rater reliability was assessed according to the intra-class correlation coefficient (ICC). In addition, panel members discussed their scoring of 2 of the videotaped interviews in a qualitative assessment of the criteria.
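For readers unfamiliar with the ICC, the following sketch shows one way it can be computed from a table of ratings (one row per videotaped case, one column per rater). The text does not state which form of the ICC was used; the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, often labelled ICC(2,1). The function name `icc_2_1` and the ratings are illustrative and are not data from this study.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: (n_cases, n_raters) array with no missing values.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    # Sums of squares from a two-way layout without replication
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)  # between cases
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Illustrative ratings: 6 cases scored by 4 raters (not study data).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example_ratings), 2))
```

Because this form penalizes systematic differences between raters as well as random disagreement, raters who rank cases similarly but use different parts of the scale will still yield a low value.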
Initial validity and reliability results were reviewed in June 2000, resulting in minor modifications to the criteria and regression weights to improve clinical utility and face validity (Table 1, June 26, 2000, revision). Scores were apportioned among items so that the total maximum achievable score, summing across the most severe response category for each criterion, was 100 points.
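The apportioning of points to a 100-point maximum can be sketched as follows. This is an illustration under stated assumptions, not the panel's procedure: the raw maximum weights are invented, the function name `scale_to_100` is hypothetical, and a largest-remainder rule is used here simply to make the rounded integer points sum to exactly 100.

```python
def scale_to_100(max_raw_points):
    """Rescale per-criterion maximum points to integers summing to 100.

    max_raw_points: dict mapping criterion name -> positive raw weight for
    the most severe response category.
    """
    total = sum(max_raw_points.values())
    exact = {name: 100.0 * pts / total for name, pts in max_raw_points.items()}
    scaled = {name: int(value) for name, value in exact.items()}  # round down
    # Hand the points lost to rounding to the largest fractional remainders.
    leftover = 100 - sum(scaled.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - scaled[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        scaled[name] += 1
    return scaled

# Invented raw weights for illustration; see Table 1 for the actual scores.
raw = {
    "pain on motion": 1.9,
    "pain at rest": 1.6,
    "ability to walk": 0.9,
    "other functional limitations": 1.7,
    "abnormal findings": 1.1,
    "potential for progression": 1.8,
    "ability to work or give care": 1.5,
}
points = scale_to_100(raw)
print(points, sum(points.values()))
```

A patient rated at the most severe level on every criterion would then receive the full 100 points, and less severe levels within each criterion would receive a share of that criterion's maximum.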
Further reliability data were collected in December 2000 from 14 raters (8 orthopedic surgeons). Test-retest comparisons were made with the June reliability ratings, based on responses from 11 raters (6 orthopedic surgeons). In addition, the videotaped cases were rated independently by 11 general practitioners in order to gain insight into the utility of the priority criteria tool for referring clinicians.
Results
Panellists agreed that the form was easy to use and that the criteria provided an accurate reflection of how surgeons view the relative urgency of their patients for hip or knee replacement.
Table 1 provides a summary of criteria and score development. Based on the January 2000 interim analysis of 156 completed forms, the panellists deleted 1 item (multiple joint disease), added “potential for progression of disease” and combined 2 questions (range of motion and abnormal orthopedic findings). The number of response categories was reduced from 6 to 4 for questions on pain.†
The new item, “potential for progression of disease,” was added primarily to cover patients whose previous joint replacement had failed and needed surgical revision. Such patients score relatively low on pain and functional disability yet are considered to be relatively urgent due to worsening joint disease if the prosthesis is not revised in a timely manner. Participating clinicians considered these patients to be systematically “under-prioritized” by the initial criteria. Subsequent experience showed that adding the “potential for progression” criterion succeeded in “levelling the playing field” for all patients.
A total of 444 priority criteria forms were submitted, of which 405 contained complete and usable data. Optimal weights were inferred from regression analysis as indicated in Table 1. The R² was 0.681 (adjusted R² = 0.676), indicating that the priority criteria accounted for approximately two-thirds of the statistical variance in clinicians’ global urgency ratings. The most powerful variables in predicting urgency rating were: (1) the combination of 2 pain items (at motion and at rest), (2) potential for progression, (3) ability to work or look after dependants and (4) functional limitations other than walking. In subgroup analyses, the adjusted R² was 0.706 for primary hip or knee replacement. The sub-sample size was insufficient for regression analysis on patients undergoing revisions.
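The relation between the unadjusted and adjusted coefficients of determination can be made explicit with the standard formula, sketched below. The number of predictors in the fitted model is not stated in the text, so the value of 7 used here (one per criterion) is an assumption; a model that coded each response category separately would have more predictors and a slightly larger adjustment.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment of R-squared for the number of predictors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# 405 usable forms and R-squared of 0.681 are taken from the text;
# n_predictors=7 is an assumption made for illustration.
print(adjusted_r_squared(0.681, 405, 7))
```

With several hundred patients and a handful of predictors, the adjustment is small, which is why the two reported values differ only in the third decimal place.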
Univariate Pearson r correlations are presented in Table 2. The highest correlations among predictors occurred between the “abnormal findings” and “potential for progression” variables (0.75) and between the “functional limitations” and social role (“ability to work, give care to dependants, live independently”) criteria (0.58). The relatively high correlation between the latter pair of variables is what would be expected with consistent scoring.
Reliability results for initial ratings of the videotaped patient interviews are depicted in the first column of Table 3. In the June assessment, the VAS urgency ratings for the 6 patients had an excellent ICC value of 0.82 (0.85 for surgeons). One of the 7 criteria items had excellent reliability (ICC > 0.75); 5 had fair to good reliability; and 1 (potential for progression of disease) had poor reliability. Similar reliability was observed for the mixed group of clinicians and for the subgroup of orthopedic surgeons.
Based on the results of the pilot testing analysis and initial reliability testing, panellists made minor changes in the empirically derived weights and in the content of the instrument (Table 1). They also arrived at a series of recommendations concerning the further testing and use of the criteria.
Criteria scores should be compared to scores from other assessment tools (e.g., WOMAC [Western Ontario and McMaster Universities Osteoarthritis Index]).
Priority criteria forms should also be tested with general practitioners.
A set of operational definitions and instructions should be prepared to accompany the criteria prior to implementation.
They also recommended that patients should be reassessed with the priority criteria at some point during long waiting periods.
Further empirical work was undertaken to assess the reliability of the revised tool. The inter-rater reliability findings from December 2000 were similar to those of June 2000, with “potential for progression of disease” continuing to have low reliability. Test-retest reliability was assessed, based on input from 11 raters (6 orthopedic surgeons), who used the priority criteria to score the same cases at 2 points in time. Relatively high intra-rater consistency in scoring was observed over the 5- to 6-month interval; 3 criteria had ICC values in the excellent range and 3 in the fair to good range. The visual analogue rating of urgency had an excellent test-retest ICC value of 0.90 or greater.
Reliability was also studied with 11 general practitioners as raters (Table 3). ICC values were quite comparable to those for orthopedic surgeons as well as raters from related clinical fields. The only item that had lower reliability for general practitioners was item 3 (ability to walk without significant pain).
Discussion
Perhaps the most important finding emerging from this project is that orthopedic surgeons and other clinicians from the 4 western provinces in Canada accepted and endorsed the ability of clinical priority criteria to reflect global expert judgements of urgency. Based on discussions at panel meetings, participants considered the criteria to have good face validity and to be easy to use. As such, the experiences reported here add to the international literature concerning physicians’ acceptance of the validity and utility of clinical priority criteria. The R² values from regression analyses obtained for the total sample and for the patients who underwent primary replacement (0.676 and 0.706, respectively) are well within the range observed in other WCWL criteria and in New Zealand criteria.
Like other WCWL panels, the hip and knee replacement panellists wished to incorporate within the purview of the criteria all patients awaiting hip or knee replacement surgery, including both primary replacements and revisions. Panellists’ initial experience with the criteria revealed that patients in need of revision did not score sufficient points to reflect surgeons’ judgements of urgency. Addition of a “potential for progression” criterion appeared to correct this situation.
The WCWL hip and knee replacement criteria are designed to be completed by health professionals, unlike other assessment forms used in this field, notably the WOMAC, which is designed to be completed by patients. WOMAC scores were collected on a subset of patients during the WCWL project, and further analyses incorporating this tool will be reported in a separate paper. Such a comparison was proposed by the panel, as noted above.
The reliability of the priority criteria for the hip and knee replacement tool was among the strongest of the 5 instruments developed in the WCWL project. This was observed for ratings by a mixed group of clinicians, by orthopedic surgeons and by general practitioners. The reliability results suggest that clinicians using the instrument can achieve good inter-rater agreement and good intra-rater stability in scoring over time. The creation and use of 6 videotaped interviews of actual patients provided an excellent source of standard material to assess reliability. No special effort was made to standardize the rating process, such as by providing examples of patients conforming to “mild pain,” or to provide specific definitions of the various levels within each criterion. As such, the observed results represent a “worst case” scenario, which can almost certainly be improved upon with practice and clarification of terms.
It has not yet been demonstrated in any definitive way that the weighted scores will actually rank patients in the appropriate order of priority, based on clinical urgency. Ideally, such demonstration would follow patients over time and compare health outcomes (e.g., reduction in pain) of patients who wait varying lengths of time. When such studies are performed, measures similar to the priority criteria described in this article would be suitable for capturing outcomes.
A number of operational challenges can be foreseen with the use of priority criteria for scheduling of surgery. For example, patients with relatively minor (but still significant) arthritis will always score lower than patients with more symptomatic, serious conditions. As new, high-scoring patients are seen, low-scoring patients will never reach the top of the list. This problem could be addressed by adding points to the scores of patients simply for time spent waiting. However, this could lead to a different problem, with patients having less severe conditions regularly “bumping” patients with more severe conditions. It was for this reason that all WCWL panels decided against incorporating time for waiting into the criteria.
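The trade-off described above can be illustrated with a small sketch. The patients, scores and the `points_per_month` parameter are hypothetical; the WCWL panels decided against adding waiting-time points, so this is an illustration of the concern rather than a feature of the tool.

```python
def rank_patients(patients, points_per_month=0.0):
    """Order patient ids by priority score plus optional waiting-time points."""
    def total(patient):
        return patient["score"] + points_per_month * patient["months_waiting"]
    return [p["id"] for p in sorted(patients, key=total, reverse=True)]

# Hypothetical waiting list: A has a milder condition but a long wait.
waiting_list = [
    {"id": "A", "score": 35, "months_waiting": 18},
    {"id": "B", "score": 70, "months_waiting": 1},
    {"id": "C", "score": 60, "months_waiting": 3},
]

print(rank_patients(waiting_list))                      # ['B', 'C', 'A']
print(rank_patients(waiting_list, points_per_month=3))  # ['A', 'B', 'C']
```

Without waiting-time points, patient A stays at the bottom of the list for as long as higher-scoring patients keep arriving; with them, A eventually moves ahead of patients whose conditions are more severe.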
Another issue, raised regularly during the project, concerned the possibility that patients and clinicians would “game the system” by virtue of knowing how the point system works. These concerns need to be addressed through careful monitoring, use of standard raters or other techniques. However, the current chaotic and unregulated system can in most areas be easily gamed, without the possibility of audit.
It is hoped that further development will lead to an instrument that can be widely used for prioritization and case-mix description of patients on waiting lists for hip and knee replacement. It is imperative that such an instrument be developed to permit assessment and accountability — and, ultimately, fairness — in the context of orthopedic waiting lists. Moreover, criteria such as those described in this article could potentially be used more broadly within orthopedics to include assessments of urgency of patients with conditions other than major joint arthritis. Such service-wide use of orthopedic criteria has been attempted in New Zealand, although the results of these efforts have not been published.
Acknowledgements
Members of the Steering Committee of the Western Canada Waiting List Project are as follows: Dr. Tom Noseworthy, Department of Public Health Sciences, University of Alberta, Edmonton (Chair); Dr. Morris L. Barer, Centre for Health Services and Policy Research, and Professor, Department of Health Care and Epidemiology, University of British Columbia, Vancouver; Dr. Charlyn Black, Manitoba Centre for Health Policy and Evaluation, and Department of Community Health Sciences, University of Manitoba, Winnipeg; Ms. Lauren Donnelly, Acute and Emergency Services Branch, Saskatchewan Health, Regina; Dr. Isra Levy, Health Programs, Canadian Medical Association, Ottawa; Mr. Steven Lewis, Access Consulting Ltd., Saskatoon; Dr. Sam Sheps, Department of Health Care and Epidemiology, University of British Columbia, Vancouver; Dr. Mark C. Taylor, Department of Surgery, University of Manitoba; Mr. Laurence Thompson, Health Services Utilization and Research Commission, Saskatoon; Mr. Darrell Thomson, Economics and Policy Analysis, British Columbia Medical Association, Vancouver; Ms. Barbara Young, Clinical Evaluation Services, Calgary Regional Health Authority, Calgary; Mr. John McGurran, Western Canada Waiting List Project and Department of Public Health Sciences, University of Toronto, Toronto.
The Western Canada Waiting List Project was supported by a financial contribution from the Health Transition Fund (Health Canada) as Project NA489. The views expressed herein do not necessarily represent the official policy of federal, provincial or territorial governments.
We are indebted to the 19 partner organizations for their ongoing support throughout the project: British Columbia Medical Association; Capital Health Region (Victoria); Vancouver/ Richmond Health Board; British Columbia Ministry of Health; University of British Columbia, Centre for Health Services and Policy Research; Alberta Medical Association; Capital Health Authority (Edmonton); Calgary Regional Health Authority; Alberta Health and Wellness; University of Alberta, Department of Public Health Sciences; Saskatchewan Medical Association; Regina Health District; Saskatoon District Health; Saskatchewan Health; Health Services Utilization and Research Commission; Winnipeg Regional Health Authority; Manitoba Health; Manitoba Centre for Health Policy and Evaluation; Canadian Medical Association.
We acknowledge the members of the panel who contributed to the development of the hip and knee replacement surgery priority criteria tool: Dr. Ted Findlay, Dr. Donald Garbuz, Dr. Robert Glasgow, Ms. Karin Greaves, Dr. David Hedden, Dr. Mary Hurlburt, Dr. Bill Johnston, Dr. Stewart McMillan, Dr. Jack Reilly, Dr. Anne Sclater, Dr. Kenneth Skeith and Dr. Lowell van Zuiden. We also thank colleagues of the panel members who participated in the pilot testing and reliability work in Winnipeg, Regina, Saskatoon, Calgary, Edmonton and Vancouver.
We would like to recognize the data collection and analytical contributions of Dr. N.G.N. Prasad, Dr. Barbara Conner-Spady, Ms. Elaine Dunn, Ms. Helen Roman-Smith and Ms. Anne-Marie Pedersen.
Footnotes
* In standard regression, predictor variables are often dropped if they are significantly correlated with other, more highly predictive variables. However, panellists wished to retain all criteria to ensure adequate face validity, even where significant correlations did exist among criteria. Thus, for example, it would probably be unacceptable from a clinical point of view to remove explicit consideration of the extent of pain from the questionnaire, even if this factor tended to correlate with (or to be “captured by”) scores on other items.
† Appropriate adjustments were made in the final regression analyses to accommodate these changes. For Items 1 and 2, the categories “Mild, slight or occasional” and “Mild–moderate” were combined into “Mild,” “Moderate” and “Moderate–severe” were combined into “Moderate.” Questions 5 and 6 were combined; the more severe score of the 2 questions was entered. “Potential for progression of disease” was added as a new item; data for cases scored before this change were imputed using a linear regression model based on information obtained on this item in subsequent cases.
Competing interests: None declared.
Accepted January 8, 2003.