Abstract
Background: New approaches are needed to ensure that surgical trainees attain competence in a timely way. Traditional solutions have focused on the years spent in surgical training. We sought to compare the performance of surgeons who graduated from 3-year versus 4-year medical schools, using multisource feedback data.
Methods: We used data from the College of Physicians and Surgeons of Alberta’s Physician Achievement Review program to determine curricular outcomes. Data for each surgeon included assessments from 25 patients, 8 medical colleagues and 8 non-physician coworkers (e.g., nurses), and a self-assessment. We used these data to compare 72 surgeons from a 3-year school with matched graduates from 4-year schools. The instruments were assessed for evidence of validity and reliability. We compared the groups using 1-way analysis of covariance and multivariate analysis of covariance, with years since graduation as a covariate, and a Cohen d effect size calculation to assess the magnitude of any differences.
Results: Data for 216 surgeons indicated that there was evidence for instrument validity and reliability. No significant differences were found based on the length of the undergraduate program for any of the questionnaires or factors within the questionnaires.
Conclusion: Reconsideration might be given to the time spent in medical school before surgical training if training in the specialty and career years are to be maximized. This assumes that students are able to make informed career decisions based on clerkship and other experiences in a 3-year setting.
Leaders and funders of medical education in both Canada and the United States are concerned about the length of time needed to graduate physicians from medical school and prepare them for competent independent medical and surgical practice.1,2 This is particularly problematic for surgery, which has a long period of training after medical school. Surgical trainees are increasingly expected to extend their core training through subspecialization,3,4 and there is concern that reduced work hours in training programs will compromise the attainment of surgical skills.5,6
The Royal College of Physicians and Surgeons of Canada (RCPSC)7 requires that attention be paid to core competencies in addition to surgical skills. Addressing these competencies has been challenging for some programs8 and requires creative approaches. These approaches have included simulation9 and assessment with the objective structured clinical examination,10 and a move away from time-based training to competency-based training.11 Time spent in residency is precious and must be used to its maximum advantage.12
Efforts to ensure surgical competency have traditionally focused on the years spent in surgical training. Little attention has been paid to the time before surgical residency, namely medical school, and whether it can be shortened to maximize the time available for development of surgical skills. During the 1970s, almost one-quarter of the medical schools in the United States established 3-year medical school programs in response to federal legislation providing financial incentives.2 These programs were later discontinued, despite the lack of evidence that students were at a disadvantage.2 By contrast, Canada established 3-year programs at both the University of Calgary and McMaster University. The University of Calgary curriculum is 131 weeks compared with the University of Alberta curriculum of 144 weeks. At the University of Calgary, the first 2 years combine basic and clinical science in a clinical presentation curriculum, and the final year is a traditional clinical clerkship. At the University of Alberta, the first 2 years focus on organ systems, the third year provides rotations through the standard 6 specialties and the fourth-year rotations provide a deeper understanding of subspecialties.12 Other Canadian 4-year medical schools have curricula similar to that of the University of Alberta. This 30-year natural experiment with graduates of 3-year medical programs appears to have produced physicians who are competent,1,12,13 although the objective data come from performance data for family physician–general practitioners12,13 and not surgeons.
The purpose of this study was to compare the performance of practising surgeons in Alberta who graduated from the University of Calgary (a 3-year school) with matched samples from the University of Alberta (a 4-year school) and graduates from other 4-year Canadian medical schools. We used data from a regulatory authority that assesses a broad range of competencies related to clinical skill, communication, professionalism and management skill. In Alberta, each surgeon must participate in the Physician Achievement Review (PAR) program, a province-wide multisource evaluation performed every 5 years. This evaluation is required by the College of Physicians and Surgeons of Alberta (CPSA), the regulatory licensing authority, and consists of questionnaire data from 25 patients, 8 medical colleagues and 8 non-physician coworkers (e.g., nurses, physical therapists), as well as self-assessment data. All of the surgical subspecialties are assessed using 1 set of surveys. Participation in PAR every 5 years, along with participation in the maintenance of certification program of the RCPSC, is required for continued licensure. When the questionnaires were developed almost 10 years ago, they showed evidence for validity and reliability.14,15 Since the program became mandatory, 645 surgeons have participated in the program at least once, 383 of whom have participated on 2 occasions.
This study addressed the following questions.
1. What is the current evidence for the reliability and validity of the PAR questionnaires used for the surgical specialties based on data from the surgeons’ most recent assessment?
2. What are the factors within each of the questionnaires used for the surgical specialists?
3. Are there differences in mean aggregate scores between schools (University of Calgary, University of Alberta and other Canadian 4-year medical schools) based on the whole assessment and on the factors within questionnaires?
Methods
Pivotal Research Inc., the administrator of the CPSA’s PAR program, under the direction of the CPSA, provided an anonymous data set for this study. The data set consisted of a sample of 550 Canadian surgeons who completed the PAR program. Each University of Calgary graduate in the data set (n = 72) was matched to graduates from the same year (or as close as possible) who graduated from the University of Alberta and other Canadian medical schools. Where there was more than 1 possible match, the match was made to a surgeon in the same subspecialty (e.g., ophthalmology). Graduates of McMaster University were excluded because McMaster also has a 3-year curriculum. Graduates of international medical schools were excluded because their undergraduate education is likely to be more variable than that of graduates of schools accredited by the Liaison Committee on Medical Education.
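The matching logic can be illustrated with a short sketch. This is a hypothetical reconstruction, not the procedure used by Pivotal Research Inc.; the data frame and its columns (school, grad_year, subspecialty) are assumed for illustration only.

```python
import pandas as pd

def match_graduates(df: pd.DataFrame, index_school: str, pool_school: str) -> list:
    """For each graduate of the index school, pick an unmatched graduate of the
    pool school with the closest graduation year, preferring the same
    subspecialty when more than one candidate is equally close."""
    pool = df[df["school"] == pool_school].copy()
    matches = []
    for _, surgeon in df[df["school"] == index_school].iterrows():
        if pool.empty:
            break
        # Rank candidates by closeness of graduation year.
        pool["year_gap"] = (pool["grad_year"] - surgeon["grad_year"]).abs()
        candidates = pool[pool["year_gap"] == pool["year_gap"].min()]
        # Prefer a candidate in the same subspecialty (e.g., ophthalmology).
        same_spec = candidates[candidates["subspecialty"] == surgeon["subspecialty"]]
        chosen = (same_spec if not same_spec.empty else candidates).index[0]
        matches.append((surgeon.name, chosen))
        pool = pool.drop(index=chosen)  # match without replacement
    return matches

# Example usage: pairs = match_graduates(df, "UofC", "UofA")
```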
Data for each physician included assessments from 25 patients, 8 medical colleagues and 8 nonphysician coworkers, and a self-assessment. Although the questionnaires were not based on the CanMEDS competencies, the items reflect many of the CanMEDS roles, namely Medical Expert, Scholar, Professional, Communicator, Collaborator, Manager and Health Advocate. The items described observable behaviours and captured the public and professional expectations of a surgeon. Copies of the questionnaires are available online (www.par-program.org). The assessments use a 5-point Likert scale, and all questionnaire forms provide the respondent with an “unable to assess” option. The data set included a limited amount of sociodemographic information (year and school of graduation; urban, regional or rural location; specialty). The data were from each surgeon’s most recent PAR experience (collected between 2002 and 2009).
The data were analyzed in a number of ways. First, because the instruments were developed and assessed almost 10 years ago, a current psychometric assessment was deemed appropriate to ensure that the instruments and their items continued to provide evidence of validity and reliability. This is important both for this study and for the future use of the instruments: other regulatory jurisdictions have adopted or are considering adopting the instruments, so they should be assessed regularly. Descriptive calculations were done for each item on each questionnaire. These analyses enabled an examination of the range, mean and standard deviation (SD) for each item and identified items that were not functioning well (i.e., those with a high percentage of “unable to assess” responses). A reliability analysis was completed by calculating the Cronbach α for each survey and each factor to determine the internal consistency reliability of the instruments and scales. A generalizability study (Ep²) for each survey was conducted to establish the reliability of the data for each surgeon who was assessed; this indicates whether the combination of items and raters achieves an appropriate level of reliability (generally Ep² of at least 0.70). These data informed question 1. A confirmatory factor analysis of each questionnaire determined the factor structure of each instrument (research question 2). Last, a 1-way analysis of covariance (ANCOVA) was used to evaluate differences between schools for each instrument, using the aggregate mean questionnaire score as the dependent variable and years since graduation as the covariate (to control for potential confounding by years since graduation). A multivariate analysis of covariance (MANCOVA) was used to evaluate differences between medical schools on the aggregate mean factor scores for each questionnaire, again with years since graduation as the covariate. Significance was evaluated using the F statistics from the ANCOVA and MANCOVA, with the significance level set at 0.05. An effect size calculation (Cohen d) was used to determine the magnitude of differences (research question 3).
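For readers who wish to reproduce this style of analysis, the minimal sketch below illustrates the building blocks named above: Cronbach α, a generalizability coefficient, the Cohen d and a 1-way ANCOVA. It uses Python with statsmodels; the paper does not state which software was used, so this is an assumption, as are the synthetic data frame and its column names. The generalizability formula shown assumes a simple one-facet person × rater design.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Internal consistency: alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

def g_coefficient(var_person: float, var_error: float, n_raters: int) -> float:
    """One-facet generalizability coefficient: Ep2 = var_p / (var_p + var_error / n_raters)."""
    return var_person / (var_person + var_error / n_raters)

def cohen_d(a: np.ndarray, b: np.ndarray) -> float:
    """Effect size: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

# Synthetic stand-in for the matched data set of 216 surgeons (72 x 3 schools).
rng = np.random.default_rng(0)
surgeons = pd.DataFrame({
    "mean_score": rng.normal(4.4, 0.2, 216),           # aggregate questionnaire mean
    "school": np.tile(["UofC", "UofA", "Other"], 72),  # 3-year vs. 4-year schools
    "years_since_grad": rng.integers(5, 35, 216),      # covariate
})

# One-way ANCOVA: school effect on the aggregate score, controlling for
# years since graduation; the F test for C(school) is evaluated at 0.05.
model = smf.ols("mean_score ~ C(school) + years_since_grad", data=surgeons).fit()
print(anova_lm(model, typ=2))
```

A MANCOVA on the factor scores can be run analogously with statsmodels’ MANOVA.from_formula, listing the factor scores on the left-hand side of the formula.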
Results
Participant matching produced a data set of 216 surgeons (72 × 3). As shown in Table 1, the groups were about equal in terms of response rates on the questionnaires. There were more men in the University of Alberta cohort (86.1%) than in the cohorts from the University of Calgary (80.6%) and other medical schools (73.6%). Although the numbers of surgeons practising in urban centres were similar, more University of Alberta graduates practised in regional centres than graduates from the other 2 groups.
The medical colleague data indicated that the mean score for all items was greater than 4.0 out of 5. A few items had a higher percentage of “unable to assess” responses; these were mostly items that were difficult to observe directly (e.g., medical record quality, professional development involvement, stress management, areas beyond scope, critical evaluation of medical literature and contribution to quality improvement). The self-assessment questionnaire, written in the first person, had items identical to those on the medical colleague questionnaire. Surgeons’ self-assessment scores were lower than the medical colleague scores on all items, with means ranging from 3.58 for the item on contribution to quality improvement activities to 4.30 for respecting the rights of patients. The Cronbach α for both the medical colleague and self-assessment questionnaires was high, at greater than 0.9. The generalizability study yielded a coefficient of Ep² = 0.61. A 4-factor solution emerged from the analysis, explaining almost 75% of the variance. The items aligned into 4 broad areas (factors): communication and professionalism, medical expert, scholar and manager. These data are provided in Table 2.
Mean scores from nonphysician coworkers ranged from 4.36 to 4.66 across items. Three items had high “unable to assess” rates; these related to written information about prescriptions and hospital orders, as well as to accessibility for communication about mutual patients. The internal consistency reliability was 0.955 for the whole scale. The generalizability coefficient was Ep² = 0.70. A 2-factor solution (oral communication and professionalism, and written communication) emerged, explaining 72% of the variance. These data are provided in Table 3.
Mean item scores on the patient questionnaire ranged from 4.34 to 4.78. Patients were able to assess most items. The internal consistency reliability for the overall questionnaire was greater than 0.98. The generalizability study yielded a coefficient of Ep² = 0.81. A 4-factor solution emerged from the analysis, accounting for 77% of the variance. The 4 factors were communication, manager, follow-up and management. These data are provided in Table 4.
The comparison of the aggregate mean scores and mean factor scores showed that there were no differences by school for any of the assessments or factors within the questionnaires (Table 5).
Discussion
This study affirms that the multisource feedback instruments developed for the PAR program to assess surgeons are still viable after nearly 10 years of use. There are relatively few items that medical colleagues, nonphysician coworkers and patients are unable to assess. The items that have the most “unable to assess” responses are ones that are difficult to observe and likely should be revised or deleted. The instruments and their scales are reliable, as shown by the Cronbach α analysis. The generalizability study indicates that all 3 instruments reached stability, but the medical colleague instrument was less reliable than the nonphysician coworker and patient instruments.
The factor analysis indicates that the items continue to correlate in ways that reflect CanMEDS roles. Surgeons participating in the program receive feedback about their oral and written communication from medical colleagues, nonphysician coworkers and patients. They also receive feedback to varying degrees about the Medical Expert and Scholar roles from medical colleagues, manager skills from medical colleagues and patients, and professionalism from medical colleagues and nonphysician coworkers.
The instruments have been adopted for use in Nova Scotia and are being considered for use in other Canadian jurisdictions. In Canada, regulatory authorities are increasing their expectations of physicians. Multisource feedback is a relatively inexpensive assessment method designed to provide physicians and surgeons with formative feedback to guide development in areas not captured by institutional surgical audits. In the United Kingdom, the National Health Service is working toward including multisource feedback, using questionnaire data from colleagues and patients, as part of its revalidation procedures in all specialties, including surgery.16,17 An instrument such as the one developed in Alberta may be suitable for an international environment. It may also be a helpful way of enhancing behaviours related to professionalism, communication and collaboration.
We did not find any differences in surgeon performance by school of graduation, either in the assessment as a whole or in the factors within the questionnaires. This suggests an equivalency of performance between graduates of the University of Calgary and those of 4-year medical schools. Detecting differences is difficult, as the clerkship experience, residency program, postfellowship training, practice norms, patient expectations and advances in science all shape performance over time.
Limitations
There are limitations to our study. Our sample of 72 surgeons per group was small; we were limited by the number of University of Calgary graduates who trained in surgery and entered practice in Alberta. Those 72 surgeons provided the base for matching with graduates of the University of Alberta and of other Canadian medical schools. There were insufficient numbers of McMaster University graduates (from the other 3-year medical school) practising as surgeons in Alberta to add them to the comparison. Random assignment of students to curricula would have been a preferable design for isolating the effect of curriculum on performance, but such a design is not attainable with data from a naturalistic environment; instead, we matched the 3 groups of physicians and used years since graduation as a covariate to control for the potential impact of years of practice on the dependent variables. The physicians in the study had been in practice for a mean of 19 years. We must also note that medical school curricula evolve, which can make comparisons difficult.
Conclusion
The present study is unique. Once physicians get beyond certification and board examinations, few data exist to make long-term comparisons. These data provide a comparative quantitative analysis showing that graduating competent physicians in a shorter time is possible and that the shorter time does not appear to have long-term implications for surgical practice, based on multisource feedback data for surgeons. Response rates from each source for each group were comparable and high because participation in the PAR program is mandatory. The study shows that, if the duration of medical school programs could be adjusted, graduates could train in surgery and be ready for practice 1 year earlier without any detectable differences in competency on our assessment tool. However, this assumes that students are able to make informed career decisions sufficiently early to plan clerkship electives and effectively assess their career choices.
Acknowledgements
The authors thank the College of Physicians and Surgeons of Alberta, particularly John Swiniarski, Trevor Theman and Bryan Ward (deceased) who enabled this study to be conducted, as well as Steven Dennis from Pivotal Research Inc.
Footnotes
Competing interests: None declared.
Contributors: J. Lockyer, C. Violato and B. Wright designed the study. J. Lockyer acquired the data, which were analyzed by J. Lockyer, C. Violato, H. Fidler and R. Chan. J. Lockyer, C. Violato and H. Fidler wrote the article, which was reviewed by J. Lockyer, C. Violato, B. Wright and R. Chan. All authors approved the article that was submitted for publication.
Accepted March 24, 2011.