
Neuropower Tutorial

  • Getting Started
  • Data input
  • Extract Peaks
  • Fit Model
  • Power Analysis
  • Power table

Getting Started

Pilot data

This toolbox takes as input an unthresholded statistical map from pilot data. There are different sources for pilot data; here are some options.

  • Collecting a pilot dataset. Yes, collecting pilot data is expensive. Yet it is your best option for a power analysis.
  • Open data. If you don't have the resources to collect pilot data, many people share their data. NeuroVault is a good source for statistical maps! Try to find an experiment with a study design comparable to yours and you have a good proxy for pilot data.
  • Previously collected data. You or your colleagues might have data lying around with an experimental setup comparable to your new study. You can use these data. And while you're at it, why not share them online?

Responsible research practices

When sample sizes are too small, true effects can be missed, the magnitudes of statistically significant effects are exaggerated, and significant findings are unlikely to replicate. As such, a power analysis is a crucial step in a powerful and reproducible neuroimaging study. However, power analyses can be misused for questionable research practices. Here's what you shouldn't use our toolbox for.

No Data Peeking

Imagine you collect data from 10 subjects and perform an analysis. There are two possibilities:

  1. You find the significant effects you were hypothesising:
    Woohoo! This means I can stop my experiment, write my paper and publish my findings.
  2. Not everything you expected is significant, but there is a trend:
    You decide to add a few more subjects and then look at the data again. To decide how many more subjects you'll need, you use our toolbox.

THAT IS NOT A GOOD USE OF THIS TOOLBOX. This practice is referred to as data peeking, or conditional stopping. You cannot let the decision to stop or continue depend on the statistical inference: this will inflate your type I error rate. Tal Yarkoni wrote a detailed explanation of why data peeking is so bad.

How to prevent this?

  • Don't perform statistical inference on your pilot data and don't re-use your data.
  • There are ways to correct for conditional stopping. However, while this protects your false positive rate, it decreases your overall power and therefore the reproducibility of your study. We do not recommend this approach.

Do not re-use your pilot data

Even if you don't perform statistical inference on the pilot data, the pilot data should be independent of the final data. Why? Re-using data will still increase your overall type I error rate. You can find more details in Jeanette Mumford's guide for power calculations. We are working on a method that will allow you to re-use your data (still without inference!). We'll present it at OHBM 2016.


Conditional power

It is important to note that both peakwise and clusterwise analyses use a screening threshold (also known as the cluster-forming threshold or the excursion threshold): only voxels above this threshold are considered for significance testing. SPM uses a default screening threshold of p<0.001 (only voxels with p-values smaller than 0.001 are retained for further analysis); FSL uses a default screening threshold of Z>2.3 (only voxels with z-values higher than 2.3 are used). Our measure of power is therefore conditional: it is the average chance of detecting an active peak, computed over all peaks above the screening threshold. All activity below the screening threshold is ignored. As a consequence, the power estimate cannot be reported independently of the screening threshold.
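
As a side note, these two defaults express the same kind of cutoff on different scales. The sketch below converts between them; this is plain standard-normal arithmetic with scipy, not something the toolbox requires.

```python
from scipy.stats import norm

# SPM's default screening threshold, p < 0.001, expressed as a z-value
z_spm = norm.isf(0.001)  # ~3.09

# FSL's default screening threshold, Z > 2.3, expressed as a one-sided p-value
p_fsl = norm.sf(2.3)     # ~0.011

print(f"SPM p<0.001 corresponds to Z > {z_spm:.2f}")
print(f"FSL Z>2.3 corresponds to p < {p_fsl:.3f}")
```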



Data input

Details about your data

In the first screen, under the Data input tab, you specify the following:

  • Data Location: In this tutorial, we use data from NeuroVault as an example. The statistical map is a contrast between reading plain and reversed text, from a paper by Jimura, Cazalis, Stover and Poldrack (2014). Copy the link from the corresponding button in NeuroVault and paste it into the field in the form (a download sketch follows after this list).
    If you want to use your own dataset, you can browse and upload it.

  • Mask Location: Usually you have already masked your data during your analysis. However, if you have a stricter mask, or if you're performing a Region-of-Interest analysis, you can add a mask here.

  • Design specifications: You are asked for a number of basic parameters. If you're using your own data, you probably know these parameters; if you're using data from a published paper, they should be given in the paper.
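
As an aside, the sketch below shows one way to download such a map programmatically. The URL is a placeholder for the link you copy from NeuroVault, and the output filename is hypothetical.

```python
import requests

# Placeholder: replace with the link copied from the map's NeuroVault page
url = "https://neurovault.org/media/images/XXX/some_map.nii.gz"
out = "pilot_map.nii.gz"

response = requests.get(url, timeout=60)
response.raise_for_status()  # fail loudly on a bad link
with open(out, "wb") as f:
    f.write(response.content)
print(f"Saved pilot map to {out}")
```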

Extract Peaks

Clicking the Extract Peaks tab invokes the extraction of the local maxima in the statistical map. This step, while not of particular interest to the user, is crucial for the power calculations. Depending on the size of the dataset, this step might take a while. Once all peaks are extracted, a table is shown with the coordinates, the values and the p-values of each peak.
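
To give an idea of what happens under the hood, here is a rough sketch of local-maximum extraction above a screening threshold, with peak p-values computed from the exponential null described in the Fit Model step. The filename is hypothetical and this is not NeuroPower's exact implementation.

```python
import nibabel as nib
import numpy as np
from scipy import ndimage

img = nib.load("pilot_map.nii.gz")  # hypothetical filename
data = img.get_fdata()
u = 2.3  # screening threshold

# A voxel is a local maximum if it equals the maximum of its
# 3x3x3 neighbourhood and exceeds the screening threshold.
is_peak = (data == ndimage.maximum_filter(data, size=3)) & (data > u)

for i, j, k in np.argwhere(is_peak):
    z = data[i, j, k]
    p = np.exp(-u * (z - u))  # p-value under the exponential null (see Fit Model)
    print(i, j, k, round(float(z), 2), round(float(p), 4))
```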

Fit Model

The Fit Model tab shows the result of the model fit. What we are aiming for is finding the alternative distribution of the peaks. How do we do this?

1. Estimate the mean and the standard deviation of the alternative distribution

Background: We assume that the total distribution of peaks is a mixture of two distributions:

  • Null distribution (green line): following Random Field Theory (Worsley, 2007), we assume that the null peaks follow an exponential distribution with mean u + 1/u for screening threshold u.
  • Alternative distribution (red line): we assume that the active peaks follow a normal distribution with parameters μ and σ. However, because of the screening threshold u, the normal distribution is truncated at u. This slightly changes the form of the distribution, but not its parameters.

There are three unknown parameters in this mixture distribution: μ, σ and π1. μ and σ are the parameters of the alternative distribution; π1 is the weight of the alternative distribution in the mixture. We estimate π1 in a separate step (see step 2), while μ and σ are estimated by maximum likelihood. A sketch of this fit is given below.
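
The following sketch illustrates this maximum likelihood fit with scipy, under the model above. The peak heights are simulated for illustration, and π1 is taken as given (see step 2); this is a minimal sketch, not NeuroPower's exact code.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

u = 2.3     # screening threshold
pi1 = 0.52  # weight of the alternative distribution (estimated in step 2)

# Simulated peak heights for illustration: null peaks are exponential
# above u, active peaks are normal with mean 4.1, truncated at u.
rng = np.random.default_rng(0)
null_peaks = u + rng.exponential(scale=1 / u, size=480)
alt_peaks = rng.normal(4.1, 0.7, size=2000)
peaks = np.concatenate([null_peaks, alt_peaks[alt_peaks > u][:520]])

def neg_loglik(params):
    mu, sigma = params
    f0 = u * np.exp(-u * (peaks - u))                        # exponential null
    f1 = norm.pdf(peaks, mu, sigma) / norm.sf(u, mu, sigma)  # truncated normal
    return -np.sum(np.log((1 - pi1) * f0 + pi1 * f1))

res = minimize(neg_loglik, x0=[3.0, 1.0], bounds=[(u, 10), (0.1, 10)])
mu_hat, sigma_hat = res.x
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```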

In this example: The light blue histogram shows the observed distribution of the peaks. A lot of the peaks lie close to 2.3 (the screening threshold), as expected for null peaks. But you can also see the alternative distribution: a bell-shaped distribution with its mean around 4.1. The estimated null distribution is shown by the green line, the estimated alternative distribution by the red line, and the total distribution by the blue line. δ1 is the standardised mean of the alternative distribution and can be interpreted in units of Cohen's d. In other words, the estimated effect in this dataset is large.

How to evaluate the fit? Our model strives for a good fit of the (estimated) blue line to the total observed distribution (light blue). Here the match is decent, so we assume a good estimation of the alternative distribution.


2. Estimate π1

Background: We want to estimate the weights of the null and the alternative distribution, i.e. the proportion of peaks located in brain regions that show task-related activity. To estimate π1, we use the method of Pounds and Morris (2003). We look at the distribution of uncorrected p-values (light blue histogram in the figure) and assume that it is a mixture of two distributions:

  • Null distribution (green line): The p-values of peaks in non-task-activated brain regions follow a uniform distribution, by definition of a p-value.
  • Alternative distribution (red line): We model the p-values of active peaks as a beta-distribution. Contrary to the null distribution, the alternative distribution is much higher close to 0, because we expect the p-values of active peaks to be small. We use the beta-distribution because of its flexibility to model many shapes of distributions on values between 0 and 1.

In this mixture model, there are two unknown parameters: π1 and a shape parameter of the beta-distribution. With maximum likelihood, we choose the parameters that best describe the observed distribution. A sketch of this fit follows below.
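
As an illustration, the sketch below fits a beta-uniform mixture of this kind by maximum likelihood. The p-values are simulated, and the single-shape-parameter beta density a·p^(a−1) is an assumption consistent with the description above, not necessarily NeuroPower's exact parameterisation.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated uncorrected p-values for illustration:
# ~48% uniform null p-values, ~52% small p-values from active peaks.
rng = np.random.default_rng(1)
pvals = np.concatenate([rng.uniform(size=480), rng.beta(0.2, 1.0, size=520)])
pvals = np.clip(pvals, 1e-12, None)  # guard against numerical underflow

def neg_loglik(params):
    # Mixture density: f(p) = pi0 * 1 + (1 - pi0) * a * p^(a - 1)
    pi0, a = params
    f = pi0 + (1 - pi0) * a * pvals ** (a - 1)
    return -np.sum(np.log(f))

res = minimize(neg_loglik, x0=[0.5, 0.5],
               bounds=[(0.01, 0.99), (0.01, 0.99)])
pi0_hat, a_hat = res.x
print(f"pi1 = {1 - pi0_hat:.2f}")  # estimated proportion of active peaks
```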

In this example: We can clearly see that a lot of p-values are very small, which reflects the presence of activation in the data. But how much exactly?
Our model estimates that 48% of the distribution is flat. This is represented by the green line in the figure; its height is 0.48.
The other 52% of the distribution is assumed to come from peaks in task-activated brain regions. The total distribution as fitted by our model is shown by the red line.

How to evaluate the fit? Our model again strives for a good fit of the (estimated) red line to the total observed distribution (light blue). Some deviations are normal; in this case, for example, the green line does not fit the distribution well between 0.8 and 1. In general, however, the red line fits the data well, so we assume a good estimation of π1.

Power Analysis

Once the model is fitted and you don't see large errors in the model fit, it is time for the power analysis. Clicking the Power Analysis tab shows the power curves. When you hover over the lines, you can see the exact power estimates for certain sample sizes.


Should you want a precise estimate of the sample size needed for a given level of power, you can fill out the form. It is important to choose the multiple comparison procedure (MCP) that you plan to use in your final study. Both SPM and FSL use Random Field Theory (familywise error rate control), although SPM also gives Benjamini-Hochberg (false discovery rate control) results in its default output.

In this example, when we aim for 80% power with Benjamini-Hochberg error rate control, we'll need a sample size of 19 subjects.


If you have a specific number of subjects in mind and want to know the power that comes with it, you can fill in the sample size field in the form. In this example, a sample size of 35 subjects with Random Field Theory error rate control results in 96% power.

If the resulting sample size is larger than 50 (the standard limit of the x-axis), the x-axis will adjust accordingly. A sketch of how such a conditional power curve can be computed is given below.
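
Under the model from the previous step, conditional power for a new sample size n can be sketched as P(T > t_α | T > u), with the mean of the alternative distribution scaling as δ1·√n. The values of δ1, σ and the significance threshold t_α below are placeholders; in the toolbox, t_α is derived from the chosen MCP.

```python
import numpy as np
from scipy.stats import norm

u = 2.3        # screening threshold
sigma = 1.0    # estimated sd of the alternative distribution (placeholder)
delta1 = 0.9   # standardised effect size (Cohen's d, placeholder)
t_alpha = 4.8  # significance threshold; in practice derived from the MCP

def conditional_power(n):
    # P(T > t_alpha | T > u) for T ~ N(delta1 * sqrt(n), sigma), truncated at u
    mu_n = delta1 * np.sqrt(n)
    return norm.sf(t_alpha, mu_n, sigma) / norm.sf(u, mu_n, sigma)

for n in (10, 20, 30, 40, 50):
    print(f"n = {n:2d}: power = {conditional_power(n):.2f}")
```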


Power table

Finally, this table contains the data behind the power curves: an overview of all possible sample sizes and the power they achieve under each of the MCPs.


