Power analysis is a key component of planning prospective studies such as clinical trials. However, some journals in the biomedical and psychosocial sciences ask for power analysis for data that have already been collected and analysed before accepting manuscripts for publication. In this report, post hoc power analysis for retrospective studies is examined, and we scrutinise how informative such power estimates are for detecting significant effects when they are computed from the same data on which the power analysis is based. Monte Carlo simulation is used to investigate the performance of post hoc power analysis.

Power analysis plays a key role in designing and planning prospective studies. For clinical trials in biomedical and psychosocial research, power analysis provides critical information about the sample sizes needed to detect statistically significant and clinically meaningful differences between treatment groups. Power analysis also informs cost–benefit considerations, so that studies can be conducted with minimal resources without compromising scientific integrity and rigour.

What is interesting is that some journals also ask for power analysis for study data that were already analysed and reported in a manuscript before considering its publication. Although the exact purposes of such requests are not clearly stated, this seems to happen most often when manuscripts include some non-significant results. Because such post hoc power analysis is conceptually flawed, concerns have been raised about the practice over the years.

As most research studies are conducted on a random sample from a study population of interest, results from power analysis become meaningless once the data are collected, because the random component of the study disappears. Power analysis gives the probability, or likelihood, that a statistical test or model will detect, say, hypothesised differences between two populations, such as the t-test comparing mean blood pressure between two groups in a prospective study. Once a particular sample has been selected, the outcomes are no longer random, and power analysis becomes meaningless for that study sample.

Nevertheless, some continue to argue that such power analyses may help provide some indication of whether a hypothesis may still be true.

In this article, we focus on comparing the means between two groups on a continuous outcome, and use Monte Carlo simulation to investigate the performance of post hoc power analysis and to see if such power estimates are informative in terms of indicating power to detect statistically significant differences already observed. We begin our discussion with a brief overview of the concept and analytic evaluation of power analysis within the context of two independent samples, or groups.

Consider two independent samples, or groups, and let $y_{ik}$ denote a continuous outcome of interest from subject $i$ in group $k$ ($1 \le i \le n_k$, $k = 1, 2$). We assume $y_{ik}$ follows a normal distribution with population mean $\mu_k$ and common population variance $\sigma^2$, denoted as

$$y_{ik} \sim N(\mu_k, \sigma^2), \quad 1 \le i \le n_k, \quad k = 1, 2.$$

The most common hypothesis in this setting is whether the population means are equal to each other. In statistical lingo, we state the hypothesis as follows:

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_a: \mu_1 \neq \mu_2, \tag{1}$$

where $H_0$ and $H_a$ are known as the null and alternative hypotheses, respectively. The above is known as a two-sided hypothesis, as no direction of effect is specified in the alternative; in a one-sided, or directional, alternative, one of the means, say $\mu_1$, is hypothesised to be larger than $\mu_2$ under $H_a$. As two-sided alternatives are the most popular in clinical research, we only consider two-sided alternatives in this paper unless stated otherwise. Note also that when testing the hypothesis in equation (1) with data, as in data analysis, we only need $H_0$, without any knowledge about $H_a$. For power analysis, however, the difference between the two means, $\delta = \mu_1 - \mu_2$, must be specified under $H_a$.

The hypothesis in equation (1) is generally tested using the two-sample t-test. Let $\bar{y}_k$ and $s_k^2$ denote the sample mean and sample variance of group $k$, and let $s^2 = \left[(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\right]/(n_1 + n_2 - 2)$ denote the pooled estimate of the common variance. The t statistic is

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s\sqrt{1/n_1 + 1/n_2}},$$

which follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom when $H_0$ is true. Again, because only $H_0$ is used in testing, the distribution of $t$ under the alternative plays no role in data analysis; it does, however, determine the power, that is, the probability that $H_0$ is rejected when $H_a$ in fact is true.
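As a concrete illustration (a minimal sketch, not taken from the original paper; the authors worked in R, and this uses Python with NumPy/SciPy), the pooled-variance t statistic can be computed directly from its formula and checked against `scipy.stats.ttest_ind`:

```python
import numpy as np
from scipy import stats

def two_sample_t(y1, y2):
    """Pooled-variance two-sample t statistic for H0: mu1 = mu2."""
    n1, n2 = len(y1), len(y2)
    # pooled estimate of the common variance sigma^2
    s2 = ((n1 - 1) * np.var(y1, ddof=1) + (n2 - 1) * np.var(y2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(y1) - np.mean(y2)) / np.sqrt(s2 * (1 / n1 + 1 / n2))

# illustrative data: two normal samples with different means (assumed values)
rng = np.random.default_rng(12345)
y1 = rng.normal(1.0, 1.0, size=50)  # group 1: mu1 = 1, sigma = 1
y2 = rng.normal(0.0, 1.0, size=50)  # group 2: mu2 = 0, sigma = 1

t_manual = two_sample_t(y1, y2)
t_scipy, p_value = stats.ttest_ind(y1, y2)  # default equal_var=True is the pooled test
```

The manual computation and the library routine agree, which confirms the formula above.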

If $|t| > t_{\alpha/2,\, n_1+n_2-2}$, we reject $H_0$; the probability of rejecting $H_0$ when it is in fact true, that is, of committing a type I error, is

$$\alpha = \Pr\left(|t| > t_{\alpha/2,\, n_1+n_2-2} \mid H_0\right),$$

where $t_{\alpha/2,\, n_1+n_2-2}$ is the upper $\alpha/2$ percentile of the t distribution with $n_1 + n_2 - 2$ degrees of freedom.

The probability above is evaluated under the null $H_0$ and is controlled at a prespecified level, conventionally $\alpha = 0.05$, regardless of the sample size.

Note that, like the effect size, the difference between the sample means in the t statistic is normalised by the (pooled) sample standard deviation, so the statistic does not depend on the scale of the outcome.

For power analysis, we want to determine the probability to reject the null $H_0$ in favour of the alternative $H_a$. Given the type I error $\alpha$, the sample sizes $n_1$ and $n_2$, and the means and variance specified under $H_0$ and $H_a$, we can calculate the power to reject the null $H_0$:

$$\pi = \Pr\left(|t| > t_{\alpha/2,\, n_1+n_2-2} \mid H_a\right),$$

where, under $H_a$, $t$ follows a noncentral t distribution with $n_1 + n_2 - 2$ degrees of freedom and noncentrality parameter $\lambda = \dfrac{\mu_1 - \mu_2}{\sigma\sqrt{1/n_1 + 1/n_2}}$.
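This power function is straightforward to evaluate with the noncentral t distribution. The sketch below (Python/SciPy rather than the authors' R code; the numerical settings are illustrative assumptions, not the paper's) computes the prospective power for hypothesised means, variance and sample sizes:

```python
import numpy as np
from scipy import stats

def prospective_power(mu1, mu2, sigma, n1, n2, alpha=0.05):
    """Power of the two-sided pooled two-sample t-test under H_a."""
    df = n1 + n2 - 2
    # noncentrality parameter lambda under the alternative
    ncp = (mu1 - mu2) / (sigma * np.sqrt(1 / n1 + 1 / n2))
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(|t| > t_crit) when t ~ noncentral t(df, ncp)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

# textbook case: effect size d = 0.5 with n = 64 per group gives power close to 0.80
power = prospective_power(0.5, 0.0, 1.0, 64, 64)
```

Setting $\mu_1 = \mu_2$ reduces the noncentrality parameter to zero, and the function then returns the type I error $\alpha$, as it should.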

Although similar in appearance to the type I error above, the power is computed under $H_a$ rather than $H_0$. Unlike the sample means $\bar{y}_1$ and $\bar{y}_2$, the population mean difference $\delta = \mu_1 - \mu_2$ is a fixed, albeit unknown, constant; it must be hypothesised, along with $\sigma^2$, before the power can be evaluated.

In practice, we often set power at some prespecified level and then perform power analysis to determine the minimum sample size to achieve the desired level of power. We can use the power function above for this purpose; for example, to find the smallest $n_1$ and $n_2$ to achieve, say, 0.8 power, we can solve for $n_1$ and $n_2$ in the following equation:

$$\pi(n_1, n_2) = \Pr\left(|t| > t_{\alpha/2,\, n_1+n_2-2} \mid H_a\right) = 0.8.$$
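Because the power function is monotone in the group size, the smallest sample size can be found by a simple search. A sketch under assumed settings (Python/SciPy; a common group size $n$ per arm and effect size $d = 0.5$ are illustrative choices, not the paper's):

```python
import numpy as np
from scipy import stats

def prospective_power(mu1, mu2, sigma, n, alpha=0.05):
    """Power of the two-sided pooled t-test with n subjects per group."""
    df = 2 * n - 2
    ncp = (mu1 - mu2) / (sigma * np.sqrt(2 / n))
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

def min_n_per_group(mu1, mu2, sigma, target=0.8, alpha=0.05):
    """Smallest common group size n whose power reaches the target."""
    n = 2
    while prospective_power(mu1, mu2, sigma, n, alpha) < target:
        n += 1
    return n

n_required = min_n_per_group(0.5, 0.0, 1.0)  # effect size d = 0.5
```

For $d = 0.5$ and 0.8 target power at two-sided $\alpha = 0.05$, this search returns the familiar 64 subjects per group.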

Note also that power functions can also be evaluated by replacing the mean difference $\mu_1 - \mu_2$ and standard deviation $\sigma$ with the standardised effect size $d = (\mu_1 - \mu_2)/\sigma$, since power depends on these quantities only through their ratio.

In the preceding section, we discussed power analysis for comparing two population means in prospective studies. To evaluate power, we must specify the difference between the two population means, regardless of whether we want to estimate power for a given sample size or vice versa. Such a difference is study specific and, although it may be suggested by similar studies, should not be determined exclusively by any single study. This is because, unlike the difference between population means, which is a fixed constant, the difference between sample means varies from sample to sample and is therefore a random quantity.

The difference between population parameters and sample-based statistics underscores the problem with post hoc power analysis. Not only is the power analysis performed based on the sample mean difference, but the resulting power estimates are also applied back to the same data to indicate power. Post hoc power analysis thus identifies population-level parameters with sample-specific statistics and makes no conceptual sense. Analytically, such analysis can yield quite different power estimates that are difficult to interpret and can be misleading.

To see this, consider again the problem of testing the hypothesis in equation (1). Post hoc power analysis replaces the population parameters $\mu_1$, $\mu_2$ and $\sigma^2$ in the power function with the sample means $\bar{y}_1$, $\bar{y}_2$ and pooled variance $s^2$ computed from the observed data.

To help see the difference, we express the two power functions side by side as follows:

$$\pi = \Pr\left(|t| > t_{\alpha/2,\, n_1+n_2-2} \,\middle|\, \lambda = \frac{\mu_1 - \mu_2}{\sigma\sqrt{1/n_1 + 1/n_2}}\right), \qquad \widehat{\pi} = \Pr\left(|t| > t_{\alpha/2,\, n_1+n_2-2} \,\middle|\, \widehat{\lambda} = \frac{\bar{y}_1 - \bar{y}_2}{s\sqrt{1/n_1 + 1/n_2}}\right).$$

The prospective power function $\pi$ is determined by the population mean difference and variance, which are fixed constants, whereas the post hoc power function $\widehat{\pi}$ is determined by the sample means and variance, which vary from sample to sample. As a result, post hoc power is itself a random quantity.
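The substitution of sample statistics for population parameters can be made explicit in code. In this sketch (Python/SciPy; an illustrative assumption rather than the authors' R implementation), the post hoc power uses exactly the same rejection-probability formula as prospective power, only fed with the estimated noncentrality:

```python
import numpy as np
from scipy import stats

def power_from_ncp(ncp, df, alpha=0.05):
    """Rejection probability of the two-sided t-test for a given noncentrality."""
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

def posthoc_power(y1, y2, alpha=0.05):
    """Same functional form as prospective power, but with the population
    mean difference and variance replaced by their sample counterparts."""
    n1, n2 = len(y1), len(y2)
    s2 = ((n1 - 1) * np.var(y1, ddof=1) + (n2 - 1) * np.var(y2, ddof=1)) / (n1 + n2 - 2)
    ncp_hat = (np.mean(y1) - np.mean(y2)) / np.sqrt(s2 * (1 / n1 + 1 / n2))
    return power_from_ncp(ncp_hat, n1 + n2 - 2, alpha)

# one simulated data set (assumed settings): the returned value depends on this
# particular sample and would change with every new draw
rng = np.random.default_rng(7)
y1 = rng.normal(0.5, 1.0, 64)
y2 = rng.normal(0.0, 1.0, 64)
ph = posthoc_power(y1, y2)
```

Because `ncp_hat` is a statistic, `posthoc_power` returns a different number for every sample drawn from the same population, which is precisely the randomness discussed above.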

In this section, we use Monte Carlo simulation to compare the prospective and post hoc power functions. In all cases, we use a prespecified two-sided type I error level $\alpha$.

We again assume a normal distribution for the outcome, $y_{ik} \sim N(\mu_k, \sigma^2)$, with $\mu_k$ denoting the (population) mean of group $k$ and $\sigma^2$ the common (population) variance. The population-level parameters are fixed at prespecified values for the simulation study.

For convenience, we assume a common sample size for both groups, that is, $n_1 = n_2 = n$.

Given all these parameters, we can readily evaluate the prospective power function described earlier; this value serves as the true power against which the post hoc power estimates are compared.

With Monte Carlo simulation, we can readily examine the difference between the two power functions. By repeatedly simulating samples from the population distributions, we can look at the variability of the post hoc power function and see how it performs with respect to predicting true power.
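The simulation logic above can be sketched as follows (Python/SciPy; the parameter values $\mu_1 = 0.5$, $\mu_2 = 0$, $\sigma = 1$, $n = 64$ and 1000 replicates are assumptions for illustration, not necessarily the paper's settings):

```python
import numpy as np
from scipy import stats

def power_from_ncp(ncp, df, alpha=0.05):
    """Rejection probability of the two-sided t-test for a given noncentrality."""
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

mu1, mu2, sigma, n = 0.5, 0.0, 1.0, 64  # assumed population settings
df = 2 * n - 2

# true (prospective) power, computed once from the population parameters
true_power = power_from_ncp((mu1 - mu2) / (sigma * np.sqrt(2 / n)), df)

# post hoc power, recomputed on each of 1000 simulated samples
rng = np.random.default_rng(2024)
posthoc = []
for _ in range(1000):
    y1 = rng.normal(mu1, sigma, n)
    y2 = rng.normal(mu2, sigma, n)
    s2 = ((n - 1) * np.var(y1, ddof=1) + (n - 1) * np.var(y2, ddof=1)) / df
    ncp_hat = (np.mean(y1) - np.mean(y2)) / np.sqrt(s2 * 2 / n)
    posthoc.append(power_from_ncp(ncp_hat, df))
posthoc = np.asarray(posthoc)
```

Plotting a histogram of `posthoc` against the single value `true_power` reproduces the kind of comparison described below: the post hoc estimates scatter widely around the true power rather than pinpointing it.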

Shown in the first figure are histograms of the post hoc power estimates, along with the true power, across the Monte Carlo samples.

[Figure: Histograms of post hoc power, along with true power, based on 1000 Monte Carlo samples, with the mean difference: (A) …]

Shown in the second figure are the corresponding histograms of post hoc power, along with the true power, under the remaining simulation settings.

[Figure: Histograms of post hoc power, along with true power, based on 1000 Monte Carlo samples, with the mean difference: (A) …]

Power analysis is an indispensable component of planning clinical research studies. However, when used to indicate power for outcomes already observed, it is not only conceptually flawed but also analytically misleading. Our simulation results show that such power analyses do not indicate the true power for detecting statistical significance: post hoc power estimates are highly variable over the range of practical interest and can be very different from the true power.

In this report, we focus on the relatively simple statistical model for comparing two population means of continuous outcomes. The same considerations and conclusions also apply to non-continuous outcomes and more complex models such as regression. In general, post hoc power analyses do not provide sensible results.

Contributors: All authors participated in the discussion of the statistical issues and worked together to develop this report. AR, RR, SR and XT discussed the problems with the justification of post hoc power analysis and the interpretation of such power analysis results within the contexts of their studies, and how to approach and clarify the issues in clinical research. YZ, RH and XT worked together to develop the formulas for the power functions and the R code, and performed simulation studies to understand the performance of post hoc power analysis. XT drafted the manuscript and all authors helped revise the paper.

Funding: This study was supported by the National Institutes of Health, Navy Bureau of Medicine and Surgery Grant UL1TR001442 of CTSA N1240.

Competing interests: None declared.

Patient consent: Not required.

Provenance and peer review: Commissioned; internally peer reviewed.

Data sharing statement: No additional data available.