Neuropower Tutorial

Getting Started

Pilot data

This toolbox takes an unthresholded statistical map from pilot data as its input. There are different sources of pilot data. Here are some options.

Collecting a pilot dataset. Yes, collecting pilot data is expensive. Yet it is your best option for a power analysis.
Open data. If you don't have the resources to collect pilot data, there are a lot of people sharing their data. NeuroVault is a good source for statistical maps! Try to find an experiment with a study design comparable to yours and you have a good proxy for pilot data.
Previously collected data. You or your colleagues might have data lying around with an experimental setup comparable to your new study. You can use these data. And while you're at it, why not share them online?

Responsible research practices

When sample sizes are too small, true effects can be missed, the magnitude of statistically significant effects is exaggerated, and significant findings are not likely to replicate. As such, a power analysis is a crucial step in a powerful and reproducible neuroimaging study. However, power analyses can be misused for questionable research practices. Here's what you shouldn't use our toolbox for.

No Data Peeking

Imagine you collect 10 subjects and you perform an analysis. You see two possibilities:

You find the significant effects you were hypothesising:
Woohoo! This means I can stop my experiment, write my paper and publish my findings.
Not everything you expected is significant, but there is a trend:
You decide to add a few more subjects and then look at the data again. To decide how many more subjects you'll need, you use our toolbox.

THAT IS NOT A GOOD USE OF THIS TOOLBOX. This practice is referred to as data peeking, or conditional stopping. You cannot let your decision to stop or continue depend on the outcome of the statistical inference, because this inflates your type I error rate. Tal Yarkoni wrote a detailed explanation of why data peeking is so bad.
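The inflation can be illustrated with a small simulation. The sketch below is not part of the toolbox: it assumes a one-sample t-test on normally distributed data with no true effect, a first look at 10 subjects and a second look after adding 10 more. The function name `simulate_peeking` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats


def simulate_peeking(n_first=10, n_extra=10, alpha=0.05, n_sim=5000, seed=0):
    """Type I error rate with a fixed sample size versus one interim look.

    All data are simulated under the null (no true effect), so every
    rejection counted here is a false positive.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_sim, n_first + n_extra))

    # Look 1: test after the first n_first subjects.
    p_first = stats.ttest_1samp(data[:, :n_first], 0.0, axis=1).pvalue
    # Look 2: test again on all subjects.
    p_full = stats.ttest_1samp(data, 0.0, axis=1).pvalue

    # Fixed design: only the final test counts.
    fixed_rate = float(np.mean(p_full < alpha))
    # Peeking: stop and claim an effect if either look is significant.
    peeking_rate = float(np.mean((p_first < alpha) | (p_full < alpha)))
    return fixed_rate, peeking_rate


fixed_rate, peeking_rate = simulate_peeking()
print(f"fixed sample size: {fixed_rate:.3f}")
print(f"one interim look:  {peeking_rate:.3f}")
```

With a fixed sample size the false positive rate should stay close to the nominal 0.05, while the rate with an interim look is higher, because there are two chances to cross the threshold.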

How to prevent this?

Don't perform statistical inference on your pilot data and don't re-use your data.
There are ways to correct for the conditional stopping. However, while this practice protects your false positive rate, it will decrease your overall power and therefore the reproducibility of your study. We do not recommend this approach.

Do not re-use your pilot data

Even if you don't perform statistical inference on the pilot data, the pilot data should be independent of the final data. Why? Re-using data will still increase your overall type I error rate. You can find more details in Jeanette Mumford's guide for power calculations. We are working on a method that will allow you to re-use your data (still without inference!). We'll present it at OHBM 2016.

Conditional power

Both peakwise and clusterwise analyses use a screening threshold (also known as the cluster-forming threshold or the excursion threshold): only voxels above this threshold are considered for significance testing. SPM uses a default screening threshold of p<0.001 (only voxels with p-values smaller than 0.001 are retained for further analysis). FSL uses a default screening threshold of Z>2.3 (only voxels with z-values higher than 2.3 are used). Our measure of power is conditional: it is the average chance of detecting an active peak, computed over all peaks above the screening threshold. All activity below the screening threshold is ignored. As a consequence, the power estimate cannot be reported independently of the screening threshold.
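To compare the two default thresholds, it can help to put them on the same scale. The sketch below assumes a one-sided test on a standard normal (Z) statistic; the helper names `p_to_z` and `z_to_p` are hypothetical and not part of the toolbox, SPM or FSL.

```python
from scipy.stats import norm


def p_to_z(p):
    """One-sided p-value threshold -> equivalent Z threshold."""
    return norm.isf(p)


def z_to_p(z):
    """Z threshold -> equivalent one-sided p-value threshold."""
    return norm.sf(z)


spm_z = p_to_z(0.001)  # default p threshold expressed on the Z scale
fsl_p = z_to_p(2.3)    # default Z threshold expressed on the p scale

print(f"p < 0.001 corresponds to Z > {spm_z:.2f}")  # Z > 3.09
print(f"Z > 2.3 corresponds to p < {fsl_p:.4f}")    # p < 0.0107
```

Under this assumption, p<0.001 is the stricter of the two thresholds, so a power estimate conditional on one threshold is not directly comparable with an estimate conditional on the other.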

Fit Model

The Fit Model tab shows the result of the model fit. The goal is to find the alternative distribution of peaks. How to do this?

1. Estimate the mean and the standard deviation of the alternative distribution

Background: We assume that the total distribution of peaks is a mixture of two different distributions:

Null distribution (green line): following Random Field Theory (Worsley, 2007), we assume that null peaks above the screening threshold u follow an exponential distribution with mean u+1/u.
Alternative distribution (red line): we assume the peaks follow a normal distribution with parameters μ and σ. However, because of the screening threshold u, the normal distribution is truncated at u. This slightly changes the form but not the parameters of the distribution.
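Written out, the two components could look as follows. This is a sketch under assumptions rather than the toolbox's exact formulas: the null density is taken to be an exponential with rate u shifted to start at u (which gives the stated mean u+1/u), and φ and Φ denote the standard normal density and distribution function.

```latex
\begin{align*}
f_0(x \mid u) &= u \, e^{-u (x - u)}, & x \ge u \\
f_1(x \mid \mu, \sigma, u) &= \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right)}{1 - \Phi\!\left(\frac{u - \mu}{\sigma}\right)}, & x \ge u
\end{align*}
```

The denominator of the second density is what the truncation adds: it rescales the normal curve so that it integrates to 1 above u, while μ and σ keep their usual meaning.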

There are three unknown parameters in this mixture distribution: μ, σ and π₁. μ and σ are the parameters of the alternative distribution. π₁ is the weight of the alternative distribution in the mixture; the null distribution has weight 1−π₁. We estimate π₁ in a separate step (see step 2), and μ and σ are estimated using maximum likelihood.
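The maximum likelihood step can be sketched as follows. This is a minimal illustration, not the toolbox's implementation: it assumes the null density is an exponential with rate u above the threshold, treats π₁ as known, and uses synthetic peaks. The function names (`null_pdf`, `alt_pdf`, `fit_alternative`), the starting values and the parameter bounds are all hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def null_pdf(x, u):
    """Assumed null peak density: exponential above u with rate u (mean u + 1/u)."""
    return u * np.exp(-u * (x - u))


def alt_pdf(x, mu, sigma, u):
    """Normal density with parameters mu and sigma, truncated below at u."""
    return stats.norm.pdf(x, mu, sigma) / stats.norm.sf(u, mu, sigma)


def fit_alternative(peaks, pi1, u):
    """Maximum likelihood estimate of (mu, sigma) for a fixed pi1."""
    peaks = np.asarray(peaks, dtype=float)

    def neg_loglik(params):
        mu, sigma = params
        dens = (1 - pi1) * null_pdf(peaks, u) + pi1 * alt_pdf(peaks, mu, sigma, u)
        return -np.sum(np.log(dens))

    # Bounds are illustrative assumptions that keep the optimiser in a sane range.
    res = optimize.minimize(neg_loglik, x0=[u + 1.0, 1.0], method="L-BFGS-B",
                            bounds=[(u, 10.0), (0.1, 5.0)])
    return res.x


# Synthetic peaks for illustration only (not real data).
rng = np.random.default_rng(1)
u, pi1 = 2.3, 0.52
null_peaks = u + rng.exponential(scale=1 / u, size=480)
draws = rng.normal(4.1, 1.0, size=2000)
alt_peaks = draws[draws > u][:520]  # keep only draws above the threshold
peaks = np.concatenate([null_peaks, alt_peaks])

mu_hat, sigma_hat = fit_alternative(peaks, pi1, u)
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```

On these synthetic peaks the estimates should land near the values used to generate them (μ = 4.1, σ = 1), which is a quick check that the likelihood is set up correctly.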

In this example: The light blue histogram shows the observed distribution of the peaks. We see that a lot of the peaks are close to 2.3 (the screening threshold), as is expected for the null peaks. But you can also see the alternative distribution: a bell-shaped distribution with a mean around 4.1. The estimated null distribution is shown by the green line, the estimated alternative distribution by the red line. The total distribution is the blue line. δ₁ refers to the standardised mean of the alternative distribution and can be interpreted in units of Cohen's d. In this dataset, the estimated effect is large.

How to evaluate the fit? Our model aims for a good fit between the (estimated) blue line and the total observed distribution (light blue). Here the match is decent, and as such we assume a good estimation of the alternative distribution.

2. Estimate π₁

Background: We want to estimate the weights of the null and the alternative distribution. This translates to the percentage of the peaks that are located in brain regions that show task-related activity. To estimate π₁, we use the method of Pounds and Morris (2003). We look at the distribution of uncorrected p-values (light blue histogram in the figure). We assume that this distribution is a mixture of two different distributions:

Null distribution (green line): The p-values of peaks that are in non-task-activated brain regions follow a uniform distribution by definition of p-values.
Alternative distribution (red line): We model the p-values of activated peaks as a beta distribution. We assume that, contrary to the null distribution, the alternative distribution is much higher close to 0. This is because we assume that the p-values from active peaks are small. We use the beta distribution because of its flexibility: it can model many shapes of distributions with values between 0 and 1.

In this mixture model, there are two unknown parameters: π₁ and a shape parameter for the beta distribution. With maximum likelihood, we choose the parameters that best describe the observed distribution.
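A minimal sketch of such a fit is shown below. It assumes the alternative component is a Beta(a, 1) distribution with 0 < a < 1, which gives the single shape parameter mentioned above; the toolbox's own estimator may differ in its details. The function name `fit_beta_uniform`, the starting values, the bounds and the synthetic p-values are hypothetical.

```python
import numpy as np
from scipy import optimize


def fit_beta_uniform(pvalues):
    """Maximum likelihood fit of f(p) = (1 - pi1) + pi1 * a * p**(a - 1)."""
    # Guard against p-values of exactly 0, which would give an infinite density.
    p = np.clip(np.asarray(pvalues, dtype=float), 1e-300, 1.0)

    def neg_loglik(params):
        pi1, a = params
        dens = (1 - pi1) + pi1 * a * p ** (a - 1)
        return -np.sum(np.log(dens))

    res = optimize.minimize(neg_loglik, x0=[0.5, 0.5], method="L-BFGS-B",
                            bounds=[(0.01, 0.99), (0.01, 0.99)])
    return res.x


# Synthetic p-values for illustration only (not real data).
rng = np.random.default_rng(2)
null_p = rng.uniform(0.0, 1.0, size=1920)  # peaks without activation
alt_p = rng.beta(0.2, 1.0, size=2080)      # active peaks: mass close to 0
pvalues = np.concatenate([null_p, alt_p])

pi1_hat, a_hat = fit_beta_uniform(pvalues)
print(f"pi1 = {pi1_hat:.2f}, shape a = {a_hat:.2f}")
```

In this parameterisation the flat part of the fitted density has height 1 − π₁, which corresponds to the height of the null line in the p-value histogram.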

In this example: We can clearly see that a lot of p-values have very small values. This reflects the presence of activation in the data. But how much exactly?
Our model estimates that the proportion of the distribution that is flat is 48%. This is represented by the green line in the figure. The height of the green line is 0.48.
The other 52% of the distribution is assumed to be from peaks in task-activated brain regions. The total distribution as fitted by our model is shown by the red line.

How to evaluate the fit? Our model again aims for a good fit between the (estimated) red line and the total observed distribution (light blue). It is normal that there are some deviations. For example, in this case the green line does not fit the distribution well between 0.8 and 1. However, in general, the red line fits the data well and therefore we assume a good estimation of π₁.
