statistics-surfstat - Surface-based mass-univariate analysis with SurfStat

This command performs statistical analysis (e.g. group comparison, correlation) on surface-based features using the General Linear Model (GLM). To that end, the pipeline relies on the Matlab toolbox SurfStat, designed for statistical analyses of univariate and multivariate surface and volumetric data using the GLM [Worsley et al., 2009].

Surface-based measurements are analyzed on the FsAverage surface template (from FreeSurfer).

Currently, this pipeline only handles cortical thickness measurements. These can be generated using the t1-freesurfer pipeline. Other measurements will be handled in the future.

Note

We are aware that the SurfStat toolbox is not maintained anymore. The reasons why we rely on it are: 1) its great flexibility; 2) our profound admiration for the late Keith Worsley.

Prerequisites

You need to process your data with the t1-freesurfer pipeline.

Dependencies

If you only installed the core of Clinica, this pipeline requires Matlab and FreeSurfer to be installed on your computer. You can find how to install these software packages on the installation page.

Running the pipeline

Mandatory Inputs

The pipeline can be run with the following command line:

clinica run statistics-surfstat caps_directory tsv_file design_matrix contrast str_format group_label glm_type
where:

  • caps_directory is the folder containing the results of the t1-freesurfer pipeline and the output of the present command, both in a CAPS hierarchy.
  • tsv_file is a TSV file containing a list of subjects with their sessions and all the covariates and factors in your model (the content of the file is explained in the Example subsection).
  • design_matrix is a string defining the model that fits into the GLM, e.g. 1 + group + sex + age where group, sex and age correspond to the names of columns in the TSV file provided.
  • contrast is a string defining the contrast matrix for the GLM, e.g. group or age.
  • str_format is a string defining the format of each column of the TSV file, e.g. %s %s %s %f.
  • group_label is a string defining the group label for the current analysis which helps you keep track of different analyses.
  • glm_type is a string defining the type of analysis of your model. Choose either group_comparison or correlation.
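
Since design_matrix and contrast must use the names of columns in the TSV file, it can help to check them before launching the pipeline. The following is a minimal sketch, not part of Clinica: the helper names design_terms and missing_columns are hypothetical, and the sketch only handles simple additive formulas such as the one shown above.

```python
import csv
import io


def design_terms(design_matrix):
    """Split a formula such as "1 + group + sex + age" into its column names."""
    terms = [term.strip() for term in design_matrix.split("+")]
    # "1" is the intercept, not a column of the TSV file.
    return [term for term in terms if term and term != "1"]


def missing_columns(tsv_text, design_matrix, contrast):
    """Return the model terms that have no matching column in the TSV header."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    wanted = design_terms(design_matrix) + [contrast]
    return sorted({name for name in wanted if name not in header})


# Hypothetical one-subject TSV content, with real tab separators.
tsv_text = (
    "participant_id\tsession_id\tsex\tgroup\tage\n"
    "sub-CLNC0005\tses-M00\tFemale\tAD\t64.1\n"
)
print(missing_columns(tsv_text, "1 + group + sex + age", "group"))  # []
print(missing_columns(tsv_text, "1 + group + sex + bmi", "group"))  # ['bmi']
```

An empty list means every term of the model was found in the header of the TSV file.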

Tip

Check the Example subsection for further clarification.

Specifying what surface data to use

If you run the help command line clinica run statistics-surfstat -h, you will find two optional flags, described below:

  • --feature_type FEATURE_TYPE selects the feature to analyze. With cortical_thickness (the default value), the pipeline uses the thickness file of each hemisphere for every subject and session listed in the TSV file. These thickness files are generated by the t1-freesurfer pipeline, so make sure you have run it first. The other built-in option is pet_fdg_projection, which supports the as yet unreleased pet_cortical_projection pipeline.
  • --custom_file CUSTOM_FILE lets you specify which file to take from the CAPS/subjects directory. CUSTOM_FILE is a string describing the folder hierarchy leading to the file. For instance, suppose we want to point the pipeline to the cortical thickness files manually. The generic path to these surface data files is:

CAPS/subjects/sub-*/ses-M*/t1/freesurfer-cross-sectional/sub-*_ses-M*/surf/*h.thickness.fwhm*.fsaverage.mgh

(Example : CAPS/subjects/sub-ADNI011S4075/ses-M00/t1/freesurfer-cross-sectional/sub-ADNI011S4075_ses-M00/surf/lh.thickness.fwhm15.fsaverage.mgh)

Note that the file must be in the CAPS/subjects directory, so CUSTOM_FILE must only describe the path starting after the subjects folder. We then just need to replace each * with the correct keyword so that the pipeline can match the right filenames: @subject is the subject, @session the session, @hemi the hemisphere and @fwhm the full width at half maximum. All these variables are already known to the pipeline; you only need to indicate where they appear in the path.

As a result, the CUSTOM_FILE value for cortical thickness is: @subject/@session/t1/freesurfer-cross-sectional/@subject_@session/surf/@hemi.thickness.fwhm@fwhm.fsaverage.mgh
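
To see how such a template maps to concrete files, the keyword substitution can be sketched as follows. This is an illustration only and not the code Clinica runs internally: the function expand_custom_file is hypothetical, and the subject, session and FWHM values are taken from the example path given above.

```python
def expand_custom_file(template, subject, session, hemi, fwhm):
    """Fill the @-keywords of a CUSTOM_FILE template with concrete values."""
    values = {
        "@subject": subject,
        "@session": session,
        "@hemi": hemi,
        "@fwhm": str(fwhm),
    }
    path = template
    for keyword, value in values.items():
        path = path.replace(keyword, value)
    return path


template = (
    "@subject/@session/t1/freesurfer-cross-sectional/"
    "@subject_@session/surf/@hemi.thickness.fwhm@fwhm.fsaverage.mgh"
)
# One path per hemisphere, relative to the CAPS/subjects directory.
for hemi in ("lh", "rh"):
    print(expand_custom_file(template, "sub-ADNI011S4075", "ses-M00", hemi, 15))
```

For the left hemisphere, the printed path is the example file shown above without its CAPS/subjects/ prefix.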

Note that --custom_file and --feature_type cannot be combined.

Outputs

Group comparison analysis

Results are stored in the following folder of the CAPS hierarchy: groups/group-<group_label>/statistics/surfstat_group_comparison/.

The main outputs for the group comparison are:

  • group-<group_label>_<group_1>-lt-<group_2>_measure-ct_fwhm-<label>_correctedPValue.jpg: contains both the cluster level and the vertex level corrected p-value maps, based on random field theory.
  • group-<group_label>_<group_1>-lt-<group_2>_measure-ct_fwhm-<label>_FDR.jpg: contains corrected p-value maps, based on the false discovery rate (FDR).
  • group-<group_label>_participants.tsv is a copy of the tsv_file parameter.
  • group-<group_label>_glm.json is a JSON file containing all the model information of the analysis (i.e. what you wrote on the command line).

The <group_1>-lt-<group_2> means that the tested hypothesis is: "the measurement of <group_1> is lower than (lt) the measurement of <group_2>". The pipeline includes both contrasts so <group_2>-lt-<group_1> files are also saved.

The value for FWHM corresponds to the size of the surface-based smoothing in mm and can be 5, 10, 15, 20.
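
The naming scheme above can be spelled out for a given analysis. The sketch below is only an illustration: the function group_comparison_outputs is hypothetical, it is restricted to the main outputs listed above, and it assumes that the FDR map is saved for both contrasts like the corrected p-value map.

```python
from pathlib import PurePosixPath


def group_comparison_outputs(group_label, group_1, group_2, fwhm):
    """List the main group comparison outputs, relative to the CAPS directory."""
    folder = (
        PurePosixPath("groups")
        / f"group-{group_label}"
        / "statistics"
        / "surfstat_group_comparison"
    )
    names = []
    # Both contrasts are saved: group_1 < group_2 and group_2 < group_1.
    for low, high in ((group_1, group_2), (group_2, group_1)):
        stem = f"group-{group_label}_{low}-lt-{high}_measure-ct_fwhm-{fwhm}"
        names.append(f"{stem}_correctedPValue.jpg")
        names.append(f"{stem}_FDR.jpg")
    names.append(f"group-{group_label}_participants.tsv")
    names.append(f"group-{group_label}_glm.json")
    return [str(folder / name) for name in names]


for path in group_comparison_outputs("ADvsHC", "AD", "HC", 20):
    print(path)
```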

The full list of output files can be found in the ClinicA Processed Structure (CAPS) Specification.

Note

Currently, only analysis with cortical thickness (_measure-ct keyword) is provided but we aim to integrate other features in the future.

Tip

See the Example subsection for further clarification.

Correlation analysis

Results are stored in the following folder of the CAPS hierarchy: groups/group-<group_label>/statistics/surfstat_correlation/.

The main outputs for the correlation are:

  • group-<group_label>_correlation-<label>_contrast-<label>_measure-ct_fwhm-<label>_correctedPValue.jpg: contains both the cluster level and the vertex level corrected p-value maps, based on random field theory.
  • group-<group_label>_correlation-<label>_contrast-<label>_measure-ct_fwhm-<label>_FDR.jpg: contains corrected p-value maps, based on the false discovery rate (FDR).
  • group-<group_label>_correlation-<label>_contrast-<label>_measure-ct_fwhm-<label>_T-statistics.jpg: contains the maps of T statistics.
  • group-<group_label>_correlation-<label>_contrast-<label>_measure-ct_fwhm-<label>_Uncorrected p-value.jpg: contains the maps of uncorrected p-values.
  • group-<group_label>_participants.tsv is a copy of tsv_file.
  • group-<group_label>_glm.json is a JSON file containing all the model information of the analysis.

The correlation-<label> here describes the factor of the model which can be for example age. The contrast-<label> is the sign of your factor which can be negative or positive.

The full list of output files can be found in the ClinicA Processed Structure (CAPS) Specification.

Note

Currently, only analysis with cortical thickness (_measure-ct keyword) is provided but we aim to integrate other features in the future.

Example

Let's assume that you want to perform a group comparison between patients with Alzheimer’s disease (group_1 will be called AD) and healthy subjects (group_2 will be called HC). ADvsHC will define the group_label.

The TSV file containing the participants and covariates will look like this:

participant_id  session_id  sex     group   age
sub-CLNC0001    ses-M00     Female  HC      71.1
sub-CLNC0002    ses-M00     Male    HC      81.3
sub-CLNC0003    ses-M00     Male    HC      75.4
sub-CLNC0004    ses-M00     Female  HC      73.9
sub-CLNC0005    ses-M00     Female  AD      64.1
sub-CLNC0006    ses-M00     Male    AD      80.1
sub-CLNC0007    ses-M00     Male    AD      78.3
sub-CLNC0008    ses-M00     Female  AD      73.2
Note that to make the display clearer, the rows contain successive tabs, which should not happen in an actual BIDS TSV file.
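
If you start from an aligned display like the one above, each run of spaces or tabs has to be collapsed into a single tab. The following sketch shows one way to do this; the helper to_single_tab_tsv is hypothetical and assumes that no field contains whitespace, which holds for the columns of this example. Only two rows of the table are reproduced to keep the block short.

```python
import re


def to_single_tab_tsv(display_text):
    """Collapse each run of spaces or tabs into one tab separator."""
    lines = [line.strip() for line in display_text.splitlines() if line.strip()]
    rows = [re.split(r"[ \t]+", line) for line in lines]
    width = len(rows[0])
    # Every row must have exactly one field per header column.
    for row in rows:
        if len(row) != width:
            raise ValueError(f"expected {width} fields, got {len(row)}: {row}")
    return "\n".join("\t".join(row) for row in rows) + "\n"


display = """
participant_id  session_id  sex     group   age
sub-CLNC0005    ses-M00     Female  AD      64.1
sub-CLNC0006    ses-M00     Male    AD      80.1
"""
print(to_single_tab_tsv(display), end="")
```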

The format of the TSV file is %s %s %s %s %f and we call this file ADvsHC_participants.tsv.

Our linear model formula will be: CorticalThickness = 1 + age + sex + group. In this linear model, the age and sex are the covariates, and group is the contrast. Please note that all these variables should correspond to the names of the columns in the ADvsHC_participants.tsv file.

Finally, the command line is:

clinica run statistics-surfstat caps_directory ADvsHC_participants.tsv "1 + age + sex + group" "group" "%s %s %s %s %f" ADvsHC group_comparison
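
Arguments containing spaces, such as the design matrix and the string format, must be quoted with plain ASCII quotes so that the shell passes each of them as a single argument. As a hedged sketch, the command can be assembled in Python with the standard shlex module, which adds the quoting where needed. Here caps_directory is a placeholder for your own CAPS folder, ADvsHC is the group label defined at the start of this example, and the snippet only prints the command without running it.

```python
import shlex

args = [
    "clinica", "run", "statistics-surfstat",
    "caps_directory",            # placeholder: path to your CAPS folder
    "ADvsHC_participants.tsv",   # tsv_file
    "1 + age + sex + group",     # design_matrix
    "group",                     # contrast
    "%s %s %s %s %f",            # str_format, one specifier per TSV column
    "ADvsHC",                    # group_label
    "group_comparison",          # glm_type
]
# shlex.join quotes every argument that contains spaces.
command = shlex.join(args)
print(command)
```

shlex.join uses single quotes where the command above uses double quotes; both forms are equivalent for these arguments in a POSIX shell.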

The parameters of the command line are stored in the group-ADvsHC_glm.json file:

{
"DesignMatrix": "1 + age + sex + group",
"StringFormatTSV": "%s %s %s %s %f",
"Contrast": "group",
"ClusterThreshold": 0.001
}

The results of the group comparison between AD and HC are given by the group-ADvsHC_AD-lt-HC_measure-ct_fwhm-20_correctedPValue.jpg file and are illustrated as follows:

Visualization of corrected p-value map.

The blue area corresponds to the vertex-based corrected p-value and the yellow area represents the cluster-based corrected p-value.

Describing this pipeline in your paper

Example of paragraph:

These results have been obtained using the statistics-surfstat command of Clinica. More precisely, a point-wise, vertex-to-vertex model based on the Matlab SurfStat toolbox (http://www.math.mcgill.ca/keith/surfstat/) was used to conduct a group comparison of whole brain cortical thickness. The data were smoothed using a Gaussian kernel with a full width at half maximum (FWHM) set to <FWHM> mm. The general linear model was used to control for the effect of <covariate_1>, ... and <covariate_N>. Statistics were corrected for multiple comparisons using the random field theory for non-isotropic images [Worsley et al., 1999]. A statistical threshold of P < <ClusterThreshold> was first applied (height threshold). An extent threshold of P < 0.05 corrected for multiple comparisons was then applied at the cluster level.

Appendix

  • For more information about SurfStat, please see http://www.math.mcgill.ca/keith/surfstat/.
  • For more information about the GLM, please check here.
  • The cortical thickness map is obtained from the FreeSurfer segmentation. More precisely, it corresponds to the subject’s map normalized onto FsAverage and smoothed using a Gaussian kernel with an FWHM of <fwhm> mm (the surf/?h.thickness.fwhm<fwhm>.fsaverage.mgh files).