Presented orally at the 52nd Annual Meeting of the Society for Surgery of the Alimentary Tract (SSAT), Chicago, IL, 10 May 2011.
Dr. Jennifer F. Tseng (Boston, MA): I applaud you for the monumental effort to get at the causes of our most dreaded event after pancreatectomy. I have two questions and a comment. First, you used self-reported data, which is subject to recall and selection bias. How did you control for this? Second, the patients who lived and those who died were drawn from different populations and thus are not directly comparable. Did you, or could you, perform this analysis in a true population, and were the results the same? Finally, I would like to clarify the use of risk scores such as our SOAR score (which you refer to as Charlson-based in your manuscript draft) and ACS-NSQIP. Risk calculators are designed to identify, preoperatively or prospectively, individuals who fall into higher-risk groups. Note that the highest-risk group in our SOAR score, customized at a high-volume center with 2% perioperative mortality, would have an average mortality on the order of 5%, compared with 0.5% in the lowest group. These scores are not designed to confirm known mortality post hoc, but to identify patients at greater risk. A better test of a score would be: how often did these known mortalities fall into the highest-risk group? Those patients identified as high risk might have benefited from an individualized preoperative strategy, decreasing their risk of perioperative death.
Dr. Charles M. Vollmer, Jr.: Thank you, Jennifer, for your discerning review. No doubt the method we employed is subject to the biases you point out. The mechanism for data acquisition simply could not accommodate a fully objective approach. I believe these are honorable, trustworthy surgeons who agreed to participate in this sometimes uncomfortable “soul-searching” process for the ultimate benefit of their, and others’, patients. One element of balance was the fact that I served as a final objective arbiter in determining the root cause of death for each patient after scrutinizing the facts presented.
For your second question, I’d like to clarify that the comparison group of 1,177 patients is not drawn from an entirely different population. This did not represent some sort of unrelated historical control cohort; I am sorry if I gave you that impression. Rather, it was the full experience during that time frame of the 2 institutions, out of the 15 involved, that had complete annotation of data. This represents roughly a 10% sample (albeit not a completely random one) of the overall cases (1,177 of 11,500), deaths (23 of 218), and surgeons involved (4 of 36). As you can imagine, it was practically infeasible to accrue the necessary depth of data for so many index cases spread across databases of varying sophistication. However, because deaths were so infrequent in any given practice, the surgeons were able to concentrate on the data for these rare events in a manageable fashion. You are certainly correct that the “best” process would be a complete head-to-head comparison of the mortalities with all cases performed, or at least that a “better” process would be a truly random sample of the index cases. We feel that the comparison group we used was the best option we had to discern differences in characteristics between mortalities and nonmortalities. Finally, we are comfortable that the demographics and outcomes of the comparison group are comparable to current benchmarks in the literature for pancreatic resection surgery.
In response to your important final remark … I continue to struggle conceptually with the fact that these prediction scores forecast the chance of death to be only in the single digits for what is in fact a “pure” cohort. What I mean is, each of these patients actually experienced the outcome of death. You are entirely correct: we show these data in a post hoc fashion, as these systems have been applied to broad populations. These scores are not necessarily intended as audit tools (with the possible exception of POSSUM), but rather for pre-event decision making. So, to your point that perhaps we will maximize their value by relying on risk stratification, we do have some data to share. For instance, among all comers, ASA class did not differ significantly between the mortality and nonmortality cohorts. However, if you drill down to ASA class IV, there is indeed a roughly fourfold increase in the odds of death (odds ratio, 3.95). Similarly, we segregated the Charlson score by risk group. Mortalities were far more likely to fall in either the high-risk (8.7% vs. 0.4%) or the intermediate-risk (73.9% vs. 44.2%) stratum. Furthermore, the odds ratio for death among patients deemed high risk was 22.38. So, using these prediction tools, we can conclude that, by and large, the patients in the mortality cohort were of higher preoperative risk. You make reference to applying an individualized preoperative strategy for such patients. Perhaps, given the findings of this study, one such strategy (that of declining an operation) might be given more serious consideration in patients with dubious constitution.
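As a rough illustration of how an odds ratio such as the one quoted for the Charlson high-risk stratum relates to stratum percentages, the minimal sketch below applies only the standard definition (odds = p / (1 − p)) to the rounded figures of 8.7% of deaths versus 0.4% of the comparison group. The function names are hypothetical. Because the underlying counts are not given in this discussion, the result (about 23.7) only approximates the reported 22.38.

```python
def odds(p: float) -> float:
    """Convert a proportion strictly between 0 and 1 into odds, p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds of being in a stratum among deaths divided by the same odds among survivors."""
    return odds(p_cases) / odds(p_controls)


# Charlson high-risk stratum, using the rounded percentages quoted in the discussion
# (illustrative only; the exact counts behind the reported OR are not available here).
high_risk_or = odds_ratio(0.087, 0.004)
print(f"Approximate OR from rounded percentages: {high_risk_or:.1f}")
```

Because perioperative death is a rare outcome, an odds ratio of this kind roughly approximates a relative risk, which is why an odds ratio of 3.95 can reasonably be described as an approximately fourfold increase.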