

Calculating Power by Bootstrap, with an Application to Cluster-randomized Trials

Kleinman and Huang, eGEMs 2017


Calderwood, Kleinman, Huang, et al., Medical Care 2017

Background: Surgical site infection (SSI) rates are publicly reported as quality metrics and increasingly used to determine financial reimbursement.

Objective: To evaluate the volume-outcome relationship as well as the year-to-year stability of performance rankings following coronary artery bypass graft (CABG) surgery and hip arthroplasty.

Research Design: We performed a retrospective cohort study of Medicare beneficiaries who underwent CABG surgery or hip arthroplasty at US hospitals from 2005 to 2011, with outcomes analyzed through March 2012. Nationally validated claims-based surveillance methods were used to assess for SSI within 90 days of surgery. The relationship between procedure volume and SSI rate was assessed using logistic regression and generalized additive modeling. Year-to-year stability of SSI rates was evaluated using logistic regression to assess hospitals’ movement in and out of performance rankings linked to financial penalties.

Results: Case-mix adjusted SSI risk based on claims was highest in hospitals performing <50 CABG/year and <200 hip arthroplasty/year compared with hospitals performing ≥200 procedures/year. At the same time, hospitals in the worst quartile in a given year based on claims had a low probability of remaining in that quartile the following year. This probability increased with volume, and when using 2 years’ experience, but the highest probabilities were only 0.59 for CABG (95% confidence interval, 0.52–0.66) and 0.48 for hip arthroplasty (95% confidence interval, 0.42–0.55).

Conclusions: Aggregate SSI risk is highest in hospitals with low annual procedure volumes, yet these hospitals are currently excluded from quality reporting. Even for higher volume hospitals, year-to-year random variation makes past experience an unreliable estimator of current performance.


Vaz, Kleinman, Kawai et al., ICHE 2015

Background: Policymakers may wish to align healthcare payment and quality of care while minimizing unintended consequences, particularly for safety net hospitals.

Objective: To determine whether the 2008 Centers for Medicare and Medicaid Services Hospital-Acquired Conditions policy had a differential impact on targeted healthcare-associated infection rates in safety net compared with non–safety net hospitals.

Design: Interrupted time-series design.

Setting and participants: Nonfederal acute care hospitals that reported central line–associated bloodstream infection and ventilator associated pneumonia rates to the Centers for Disease Control and Prevention’s National Health Safety Network from July 1, 2007, through December 31, 2013.

Results: We did not observe changes in the slope of targeted infection rates in the postpolicy period compared with the prepolicy period for either safety net (postpolicy vs prepolicy ratio, 0.96 [95% CI, 0.84–1.09]) or non–safety net (0.99 [0.90–1.10]) hospitals. Controlling for prepolicy secular trends, we did not detect differences in an immediate change at the time of the policy between safety net and non–safety net hospitals (P for 2-way interaction, .87).

Conclusions: The Centers for Medicare and Medicaid Services Hospital-Acquired Conditions policy did not have an impact, either positive or negative, on already declining rates of central line–associated bloodstream infection in safety net or non–safety net hospitals. Continued evaluations of the broad impact of payment policies on safety net hospitals will remain important as the use of financial incentives and penalties continues to expand in the United States.


Lauer, Kleinman, Reich, PLoS ONE 2015

The frequency of cluster-randomized trials (CRTs) in peer-reviewed literature has increased exponentially over the past two decades. CRTs are a valuable tool for studying interventions that cannot be effectively implemented or randomized at the individual level. However, some aspects of the design and analysis of data from CRTs are more complex than those for individually randomized controlled trials. One of the key components to designing a successful CRT is calculating the proper sample size (i.e. number of clusters) needed to attain an acceptable level of statistical power. In order to do this, a researcher must make assumptions about the value of several variables, including a fixed mean cluster size. In practice, cluster size can often vary dramatically. Few studies account for the effect of cluster size variation when assessing the statistical power for a given trial. We conducted a simulation study to investigate how the statistical power of CRTs changes with variable cluster sizes. In general, we observed that increases in cluster size variability lead to a decrease in power.
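The kind of simulation described above can be sketched as follows. This is not the authors' code; it is a minimal Monte Carlo sketch under stated assumptions: outcomes follow a random-intercept model with total variance 1 split by the intraclass correlation (ICC), cluster sizes are drawn from a rounded gamma distribution with a chosen mean and coefficient of variation (CV), and each simulated trial is analysed with an unweighted comparison of cluster means using a normal approximation. The function names and parameters (`crt_power`, `cv`, `icc`, `delta`) are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def draw_cluster_size(rng, mean_size, cv):
    """Positive integer cluster size with approximately the given mean and CV.

    The rounded gamma distribution is an illustrative assumption; cv=0 gives
    every cluster exactly mean_size members.
    """
    if cv == 0:
        return mean_size
    shape = 1 / cv**2          # gamma shape yielding the requested CV
    scale = mean_size * cv**2  # gamma scale yielding the requested mean
    return max(1, round(rng.gammavariate(shape, scale)))


def simulate_cluster_mean(rng, size, shift, sigma_b, sigma_w):
    """Mean outcome of one simulated cluster under a random-intercept model."""
    u = rng.gauss(0, sigma_b)  # cluster-level effect shared by all members
    return statistics.fmean([shift + u + rng.gauss(0, sigma_w) for _ in range(size)])


def crt_power(clusters_per_arm, mean_size, cv, icc, delta,
              n_sims=1000, alpha=0.05, seed=None):
    """Monte Carlo power of a two-arm CRT analysed by comparing cluster means."""
    rng = random.Random(seed)
    # Split a total outcome variance of 1 into between- and within-cluster parts
    sigma_b, sigma_w = icc ** 0.5, (1 - icc) ** 0.5
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    k = clusters_per_arm
    rejections = 0
    for _ in range(n_sims):
        treated = [simulate_cluster_mean(rng, draw_cluster_size(rng, mean_size, cv),
                                         delta, sigma_b, sigma_w) for _ in range(k)]
        control = [simulate_cluster_mean(rng, draw_cluster_size(rng, mean_size, cv),
                                         0.0, sigma_b, sigma_w) for _ in range(k)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        se = (statistics.variance(treated) / k + statistics.variance(control) / k) ** 0.5
        if abs(diff) / se > crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Same number of clusters and same mean size; only the size variability changes.
    for cv in (0.0, 0.5, 1.0):
        print(cv, crt_power(15, 20, cv, icc=0.01, delta=0.25, n_sims=500, seed=1))
```

With a small ICC, variable sizes create many small clusters whose means are noisy, so power under this unweighted analysis drops as the CV grows. A size-weighted or mixed-model analysis would typically lose less power. The normal approximation is anti-conservative when there are few clusters.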


Li, Kleinman, and Gillman, JDOHaD 2014

We implemented six confounding adjustment methods: (1) covariate-adjusted regression, (2) propensity score (PS) regression, (3) PS stratification, (4) PS matching with two calipers, (5) inverse probability weighting and (6) doubly robust estimation to examine the associations between the body mass index (BMI) z-score at 3 years and two separate dichotomous exposure measures: exclusive breastfeeding v. formula only (n=437) and cesarean section v. vaginal delivery (n=1236). Data were drawn from a prospective pre-birth cohort study, Project Viva. The goal is to demonstrate the necessity and usefulness of, and approaches for, using multiple confounding adjustment methods to analyze observational data. Unadjusted (univariate) and covariate-adjusted linear regression associations of breastfeeding with BMI z-score were -0.33 (95% CI -0.53, -0.13) and -0.24 (-0.46, -0.02), respectively. The other approaches resulted in smaller n (204-276) because of poor overlap of covariates, but CIs were of similar width except for inverse probability weighting (75% wider) and PS matching with a wider caliper (76% wider). Point estimates ranged widely, however, from -0.01 to -0.38. For cesarean section, because of better covariate overlap, the covariate-adjusted regression estimate (0.20) was remarkably robust to all adjustment methods, and the widths of the 95% CIs differed less than in the breastfeeding example. Choice of covariate adjustment method can matter. Lack of overlap in covariate structure between exposed and unexposed participants in observational studies can lead to erroneous covariate-adjusted estimates and confidence intervals. We recommend inspecting covariate overlap and using multiple confounding adjustment methods. Similar results bring reassurance. Contradictory results suggest issues with either the data or the analytic method.
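To make one of the six methods concrete, here is a minimal sketch of inverse probability weighting. It is not the authors' implementation. It assumes a single discrete confounder, so the propensity score can be estimated as the exposed proportion within each confounder level (a saturated model standing in for the logistic propensity model usually fitted). It also shows why poor overlap matters: a level in which everyone, or no one, is exposed has a propensity of 1 or 0, and its weights are undefined. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def ipw_mean_difference(records):
    """IPW difference in mean outcome (exposed minus unexposed).

    records: sequence of (confounder_level, exposed, outcome), exposed in {0, 1}.
    The propensity score is the exposed proportion within each confounder
    level, an assumption made here for simplicity.
    """
    records = list(records)
    n_total = defaultdict(int)
    n_exposed = defaultdict(int)
    for x, a, _ in records:
        n_total[x] += 1
        n_exposed[x] += a
    ps = {}
    for x, n in n_total.items():
        p = n_exposed[x] / n
        # Propensity 0 or 1 means no covariate overlap in this level
        if p == 0 or p == 1:
            raise ValueError(f"no covariate overlap in stratum {x!r} (propensity {p})")
        ps[x] = p
    # Normalized weighted means: weight 1/ps if exposed, 1/(1 - ps) if not
    w_sum = [0.0, 0.0]
    wy_sum = [0.0, 0.0]
    for x, a, y in records:
        w = 1 / ps[x] if a else 1 / (1 - ps[x])
        w_sum[a] += w
        wy_sum[a] += w * y
    return wy_sum[1] / w_sum[1] - wy_sum[0] / w_sum[0]


if __name__ == "__main__":
    # Toy data: exposure is more common when x = 1, and x itself raises the outcome.
    data = ([(0, 1, 1.0)] + [(0, 0, 0.0)] * 3
            + [(1, 1, 3.0)] * 3 + [(1, 0, 2.0)])
    print(ipw_mean_difference(data))
```

In the toy data the true exposure effect is 1.0 within each level of `x`. The crude difference in means is 2.0 because exposure is concentrated where `x = 1`, while the weighted estimate recovers 1.0.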


Lee, Kleinman, Soumerai, et al., NEJM 2012

Background: In October 2008, the Centers for Medicare and Medicaid Services (CMS) discontinued additional payments for certain hospital-acquired conditions that were deemed preventable. The effect of this policy on rates of health care–associated infections is unknown.

Methods: Using a quasi-experimental design with interrupted time series with comparison series, we examined changes in trends of two health care–associated infections that were targeted by the CMS policy (central catheter–associated bloodstream infections and catheter-associated urinary tract infections) as compared with an outcome that was not targeted by the policy (ventilator-associated pneumonia). Hospitals participating in the National Healthcare Safety Network and reporting data on at least one health care–associated infection before the onset of the policy were eligible to participate. Data from January 2006 through March 2011 were included. We used regression models to measure the effect of the policy on changes in infection rates, adjusting for baseline trends.

Results: A total of 398 hospitals or health systems contributed 14,817 to 28,339 hospital unit–months, depending on the type of infection. We observed decreasing secular trends for both targeted and nontargeted infections long before the policy was implemented. There were no significant changes in quarterly rates of central catheter–associated bloodstream infections (incidence-rate ratio in the postimplementation vs. preimplementation period, 1.00; P=0.97), catheter-associated urinary tract infections (incidence-rate ratio, 1.03; P=0.08), or ventilator-associated pneumonia (incidence-rate ratio, 0.99; P=0.52) after the policy implementation. Our findings did not differ for hospitals in states without mandatory reporting, nor did they differ according to the quartile of percentage of Medicare admissions or hospital size, type of ownership, or teaching status.

Conclusions: We found no evidence that the 2008 CMS policy to reduce payments for central catheter–associated bloodstream infections and catheter-associated urinary tract infections had any measurable effect on infection rates in U.S. hospitals. (Funded by the Agency for Healthcare Research and Quality.)


Calderwood, Kleinman, Soumerai, et al., ICHE 2014

Background: The Centers for Medicare and Medicaid Services (CMS) implemented a policy in October 2008 to eliminate additional Medicare payment for mediastinitis following coronary artery bypass graft (CABG) surgery.

Objective: To evaluate the impact of this policy on mediastinitis rates, using Medicare claims and National Healthcare Safety Network (NHSN) prospective surveillance data.

Methods: We used an interrupted time series design to compare mediastinitis rates before and after the policy, adjusted for secular trends. Billing rates came from Medicare inpatient claims following 638,761 CABG procedures in 1,234 US hospitals (January 2006-September 2010). Prospective surveillance rates came from 151 NHSN hospitals in 29 states performing 94,739 CABG procedures (January 2007-September 2010). Logistic regression mixed-effects models estimated trends for mediastinitis rates.

Results: We found a sudden drop in coding for index admission mediastinitis at the time of policy implementation (odds ratio, 0.36 [95% confidence interval (CI), 0.23-0.57]) and a decreasing trend in coding for index admission mediastinitis in the postintervention period compared with the preintervention period (ratio of slopes, 0.83 [95% CI, 0.74-0.95]). However, we saw no impact of the policy on infection rates as measured using NHSN data. Our results were not affected by changes in patient risk over time, heterogeneity in hospital demographics, or timing of hospital participation in NHSN.
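The "sudden drop" and "ratio of slopes" above are the two quantities a segmented (interrupted time series) regression reports. As a generic sketch only, assuming a random-intercept logistic model with linear time trends (the authors' exact specification may include further terms, such as patient risk adjustment), for procedure $j$ at hospital $i$:

$$
\operatorname{logit}\Pr(Y_{ij}=1) = \beta_0 + \beta_1 t_{ij} + \beta_2 P_{ij} + \beta_3 (t_{ij}-t_0)\,P_{ij} + b_i, \qquad b_i \sim N(0,\sigma_b^2)
$$

Here $t_{ij}$ is calendar time, $t_0$ is the policy start (October 2008), and $P_{ij}=1$ if $t_{ij}\ge t_0$ and 0 otherwise. Under this parameterization $e^{\beta_2}$ is the odds ratio for the immediate change at $t_0$. The ratio of the post-policy to the pre-policy per-period odds ratio is

$$
\frac{e^{\beta_1+\beta_3}}{e^{\beta_1}} = e^{\beta_3},
$$

so an estimate below 1, such as 0.83, indicates that the trend in the odds of coded mediastinitis turned more downward after the policy than before it.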

Conclusions: The CMS policy of withholding additional Medicare payment for mediastinitis on the basis of claims-based evidence of infection was associated with changes in coding for infections but not with changes in actual infection rates during the first 2 years after policy implementation.


Regnault, Kleinman, Rifas-Shiman, et al., IJE 2014

Background: In children, being taller is associated with higher blood pressure (BP), but few studies have divided height into its components: trunk and leg length. We examined the associations of total height, trunk length and leg length with systolic BP (SBP), diastolic BP (DBP) and pulse pressure (PP) at the early childhood and mid-childhood visits, as well as change between the two visits.

Methods: We obtained five measures of SBP and DBP at the early childhood visit (N = 1153, follow-up rate = 54%) and at the mid-childhood visit (N = 1086, follow-up rate = 51%) in Project Viva, a US cohort study. We measured total height and sitting height (a measure of trunk length that includes head and neck) and calculated leg length as the difference between the two. Using mixed models, we adjusted the cross-sectional analyses for leg length when trunk length was the exposure of interest, and vice versa. We also adjusted for maternal race/ethnicity, child age, sex, overall adiposity and BP measurement conditions.

Results: At the mid-childhood visit, total height was positively associated with SBP [0.34 (0.24; 0.45) mmHg/cm] but not with DBP [0.07 (−0.003; 0.15)]. In models examining trunk and leg length separately, each was positively associated with SBP [0.72 (0.52; 0.92) and 0.33 (0.16; 0.49) respectively]. In a fully adjusted model with both leg and trunk length, only trunk length remained associated with BP. For a given leg length, a 1-cm increment in trunk length was associated with a 0.63-mmHg (0.42; 0.83) higher SBP and a 0.17-mmHg (0.02; 0.31) higher DBP. For a given trunk length, however, the associations of leg length with SBP [0.13 (−0.03; 0.30)] and with DBP [0.002 (−0.11; 0.12)] were null. These patterns were similar at the early childhood visit.

Conclusions: Children with greater trunk lengths have higher BPs, perhaps because of the additional pressure needed to overcome gravity to perfuse the brain.


Huang, Septimus, Kleinman, et al., NEJM 2013

Background: Both targeted decolonization and universal decolonization of patients in intensive care units (ICUs) are candidate strategies to prevent health care–associated infections, particularly those caused by methicillin-resistant Staphylococcus aureus (MRSA).

Methods: We conducted a pragmatic, cluster-randomized trial. Hospitals were randomly assigned to one of three strategies, with all adult ICUs in a given hospital assigned to the same strategy. Group 1 implemented MRSA screening and isolation; group 2, targeted decolonization (i.e., screening, isolation, and decolonization of MRSA carriers); and group 3, universal decolonization (i.e., no screening, and decolonization of all patients). Proportional-hazards models were used to assess differences in infection reductions across the study groups, with clustering according to hospital.

Results: A total of 43 hospitals (including 74 ICUs and 74,256 patients during the intervention period) underwent randomization. In the intervention period versus the baseline period, modeled hazard ratios for MRSA clinical isolates were 0.92 for screening and isolation (crude rate, 3.2 vs. 3.4 isolates per 1000 days), 0.75 for targeted decolonization (3.2 vs. 4.3 isolates per 1000 days), and 0.63 for universal decolonization (2.1 vs. 3.4 isolates per 1000 days) (P=0.01 for test of all groups being equal). In the intervention versus baseline periods, hazard ratios for bloodstream infection with any pathogen in the three groups were 0.99 (crude rate, 4.1 vs. 4.2 infections per 1000 days), 0.78 (3.7 vs. 4.8 infections per 1000 days), and 0.56 (3.6 vs. 6.1 infections per 1000 days), respectively (P<0.001 for test of all groups being equal). Universal decolonization resulted in a significantly greater reduction in the rate of all bloodstream infections than either targeted decolonization or screening and isolation. One bloodstream infection was prevented per 99 patients who underwent decolonization. The reductions in rates of MRSA bloodstream infection were similar to those of all bloodstream infections, but the difference was not significant. Adverse events, which occurred in 7 patients, were mild and related to chlorhexidine.

Conclusions: In routine ICU practice, universal decolonization was more effective than targeted decolonization or screening and isolation in reducing rates of MRSA clinical isolates and bloodstream infection from any pathogen. (Funded by the Agency for Healthcare Research and Quality and the Centers for Disease Control and Prevention; REDUCE MRSA ClinicalTrials.gov number, NCT00980980.)

Power calculation is one of the most important and most neglected areas of planning for trials and other studies. For cluster-randomized trials, designs quickly become so complex that it is difficult to find power calculation software that does what we need it to do. In such cases, we should use Monte Carlo methods to estimate power and sample size. Many statisticians are familiar with simulating data from an assumed model to estimate power or sample size. In this paper we describe how to use bootstrap resampling instead.
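The bootstrap idea can be sketched in a few lines, though this is a sketch under assumptions rather than the procedure from the paper. It assumes existing (pilot or historical) data organized as a list of clusters, each a list of individual outcomes, and a hypothesized intervention effect that shifts every outcome in the intervention arm by a fixed amount. Each replicate resamples whole clusters with replacement, splits them into two arms, applies the effect, and tests the difference in cluster means with a normal approximation (an assumption; a t-test, permutation test, or mixed model is usually preferable with few clusters). Power is the fraction of replicates that reject. The names `bootstrap_power`, `clusters_per_arm`, and `effect` are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def cluster_mean_test(treated, control, alpha=0.05):
    """Two-sided test of equal cluster-level means (normal approximation)."""
    mt = [statistics.fmean(c) for c in treated]
    mc = [statistics.fmean(c) for c in control]
    diff = statistics.fmean(mt) - statistics.fmean(mc)
    se = (statistics.variance(mt) / len(mt) + statistics.variance(mc) / len(mc)) ** 0.5
    if se == 0:
        return diff != 0
    return abs(diff / se) > NormalDist().inv_cdf(1 - alpha / 2)


def bootstrap_power(pilot_clusters, clusters_per_arm, effect,
                    n_boot=1000, alpha=0.05, seed=None):
    """Estimate power by resampling whole clusters from existing data.

    effect: hypothesized additive shift applied to every outcome in the
    intervention arm (an illustrative assumption about how the intervention acts).
    """
    if clusters_per_arm < 2:
        raise ValueError("need at least two clusters per arm")
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_boot):
        # Keep each cluster intact so the observed within-cluster correlation
        # and the observed mix of cluster sizes carry over to the simulated trial.
        draws = [rng.choice(pilot_clusters) for _ in range(2 * clusters_per_arm)]
        # The draws are independent, so splitting them in half is a random allocation.
        treated = [[y + effect for y in c] for c in draws[:clusters_per_arm]]
        control = draws[clusters_per_arm:]
        if cluster_mean_test(treated, control, alpha):
            rejections += 1
    return rejections / n_boot


if __name__ == "__main__":
    # Hypothetical pilot data: 30 clusters of varying size with a shared cluster effect.
    gen = random.Random(0)
    pilot = []
    for _ in range(30):
        u = gen.gauss(0, 1)
        pilot.append([u + gen.gauss(0, 1) for _ in range(gen.randint(10, 30))])
    for shift in (0.0, 0.5, 1.0):
        print(shift, bootstrap_power(pilot, clusters_per_arm=10, effect=shift,
                                     n_boot=500, seed=1))
```

Resampling clusters rather than individuals is the key design choice. It lets the existing data, rather than an assumed ICC and size distribution, determine how variable the cluster means are. With `effect=0.0` the estimate should sit near the nominal alpha, which is a useful check before trusting the power figures.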
