Introduction
Intelligence describes an individual’s ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, and to engage in various forms of reasoning (Neisser et al.
1996). It is the best predictor of educational and occupational success (Neisser et al.
1996), relates closely to positive life outcomes like health and longevity (Deary et al.
2004), and is often defined as the general cognitive ability of a person. Understanding the neurobiological basis of intelligence is an important aim of ongoing research in the cognitive neurosciences.
By far the best-established neuroanatomical predictor of general intelligence is total brain size, accounting for up to 5% of variance in individuals’ intelligence quotients (Nave et al.
2018; Pietschnig et al.
2015). It has also been hypothesized that different brain regions may contribute differently to intelligence. For example, an influential model of the brain bases of intelligence, the parieto-frontal integration theory (P-FIT; Jung and Haier
2007) proposed that frontal and parietal cortices represent primary neural systems underlying inter-individual variation in general cognitive ability. Voxel-based morphometric methods (VBM; see, e.g., Ashburner and Friston
2000) have been used to examine the relationship between regionally specific differences in gray matter volume and intelligence at high spatial resolution (i.e., up to 1 mm), and early VBM studies (e.g., Haier et al.
2004) indeed support the proposed role of parietal and frontal cortices for general intelligence. A recent coordinate-based quantitative meta-analysis of VBM studies from our research group, however, found only limited evidence for convergence of gray matter volume correlates of intelligence in parietal or frontal cortex across different studies (i.e., only very small clusters, no effects in lateral parietal cortex, and only when using rather lenient statistical thresholds; cf. Basten et al.
2015). The lack of consistent VBM findings may result from the widespread use of rather limited sample sizes (i.e., between 30 and 104 participants in studies included in the meta-analysis of Basten et al.
2015), and this situation is further complicated by the fact that not all VBM studies of regional gray matter correlates of intelligence differences controlled for the effect of individual differences in total brain size (see, e.g., Lee et al.
2005, as an example of a VBM study based on uncorrected gray matter volume data). Because total brain size is positively correlated with intelligence (Nave et al.
2018; Pietschnig et al.
2015), it is quite plausible to assume that region-specific absolute gray matter volumes (approximating regional neuron numbers; Leuba and Kraftsik
1994) are associated with variations in intelligence. However, whether relative gray matter volumes, i.e., local deviations in gray matter volume beyond the global influence of total brain size, are correlated with intelligence is still an open question.
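The distinction between absolute and relative regional volume can be made concrete with a small sketch. Regressing the regional measure on total brain size is one common correction strategy (not necessarily the exact procedure used in any particular study), and all numbers below are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
tbv = rng.normal(1200, 100, size=n)               # simulated total brain volume (cm^3)
regional = 0.05 * tbv + rng.normal(0, 2, size=n)  # a region whose volume scales with brain size

# Absolute volume: the raw regional measure, still carrying the global size effect.
# Relative volume: the residual after regressing the regional measure on total brain size.
slope, intercept = np.polyfit(tbv, regional, 1)
relative = regional - (intercept + slope * tbv)

# By construction, the residualized (relative) values are uncorrelated with total brain size.
r = np.corrcoef(relative, tbv)[0, 1]   # ~0 up to floating-point error
```

Any remaining association between `relative` and intelligence would then reflect local deviations beyond the global brain-size effect.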
Additionally, all studies reviewed in our meta-analysis (as well as further studies not included in the meta-analysis due to, e.g., missing coordinates for effect localization) used an explanatory strategy in their statistical analysis approach. Such a strategy is prone to overfitting because statistical models are optimized to explain maximal amounts of variance within the respective samples but do not necessarily generalize to new out-of-sample data (see, e.g., Yarkoni and Westfall
2017, for an in-depth discussion). The introduction of predictive machine learning approaches to the field of neuroimaging (see, e.g., Lemm et al.
2011; Poldrack et al.
2020) has made it possible to explicitly test whether and to what extent neural features can predict a behavioral outcome measure (such as IQ), i.e., explain variance in independent data as well. These predictive approaches - which include some form of cross-validation (i.e., an internal replication) - provide a less biased estimate of the generalization error, which reflects the extent to which associations are only valid in one specific sample but cannot be generalized to the population (Hastie et al.
2009; Yarkoni and Westfall
2017). Using such a predictive analysis approach, it has, for example, recently been demonstrated that individual differences in intelligence can be predicted from intrinsic (i.e., task independent) patterns of whole-brain functional connectivity based on resting-state fMRI, accounting for up to 25% of variation in behavioral measures of general cognitive ability (Dubois et al.
2018; Ferguson et al.
2017; Finn et al.
2015; Liu et al.
2018).
Here, we use predictive modeling to investigate whether individual intelligence scores can be predicted from regional differences in gray matter volume. To this end, we fit a cross-validated predictive model to voxel-based morphometric maps of gray matter volume using data from 308 adults whose Full-Scale Intelligence Quotient (FSIQ) was assessed with the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler
1999). On the one hand, this analysis was conducted after correcting for individual variations in total brain size (i.e., on relative regional gray matter volume data) to assess region-specific neuroanatomical correlates of intelligence beyond the known correlation between intelligence and total brain size. On the other hand, we also assessed whether intelligence can be predicted from regional gray matter volumes when not correcting for total brain size (i.e., from absolute gray matter volumes), to test the influence of total brain size on the prediction of intelligence from regional gray matter differences. As there exists no general consensus on how to best construct meaningful features from the very high-dimensional voxel-wise neuroimaging data, we implemented two different approaches of feature construction and compared the respective results: We started with a well-established and purely data-driven method, i.e., principal component analyses (PCA, see e.g., Abreu et al.
2019; Espinoza et al.
2019; Wasmuht et al.
2018). In addition, we implemented a more theoretically informed, domain knowledge-based approach, which combines voxel-specific gray matter values in regions of interest in accordance with a well-established functional brain atlas (Schaefer et al.
2018).
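The two feature-construction strategies can be sketched as follows. Toy dimensions, random data, and a random parcel assignment stand in for real voxel-wise gray matter maps and the Schaefer atlas; this is purely illustrative of the logic, not the study pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels, n_parcels = 20, 1000, 10
X = rng.standard_normal((n_subjects, n_voxels))     # stand-in for voxel-wise gray matter maps

# Data-driven features: project centered data onto the top principal components (via SVD).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:5].T                               # subjects x 5 components

# Domain-knowledge features: average voxels within each parcel of a (hypothetical) atlas.
parcel = rng.integers(0, n_parcels, size=n_voxels)  # voxel -> parcel assignment
X_atlas = np.stack([X[:, parcel == p].mean(axis=1) for p in range(n_parcels)], axis=1)

print(X_pca.shape, X_atlas.shape)                   # (20, 5) (20, 10)
```

Both routes reduce the voxel space to a feature matrix suitable for a regression model; they differ in whether the reduction is learned from the data or imposed by prior anatomical knowledge.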
Beyond whole-brain prediction, it is also of interest to assess the predictive power of functionally defined brain networks for intelligence. This not only directly follows from neurocognitive models of intelligence like the parieto-frontal integration theory (Jung and Haier
2007) but is also motivated by more recent proposals highlighting the potential role that specific brain networks may play for general intelligence (Barbey
2018). Functional neuroimaging work has firmly established a set of functionally defined cortical networks (reviewed, e.g., in Dosenbach et al.
2006; Sporns and Betzel
2016; Yeo et al.
2011), and individual differences in intelligence have been associated with the fronto-parietal network (e.g., Barbey
2018; Hearne et al.
2016; Santarnecchi et al.
2017), the dorsal attention network centered on the intraparietal sulcus and the frontal eye fields (e.g., Hilger et al.
2020; Santarnecchi et al.
2017), the cingulo-opercular salience network (Barbey
2018; Hilger et al.
2017a,
b; Santarnecchi et al.
2017), and the default mode network of the brain (Barbey
2018; Basten et al.
2013; Hearne et al.
2016; van den Heuvel et al.
2009). While recent correlative studies with large sample sizes indeed suggest associations with structural white matter connectivity (Genç et al.
2018) and with local gyrification (Gregory et al.
2016) in some of these systems, the role of network-specific individual differences in gray matter volume for intelligence has so far not been systematically explored. To fill this gap, we conducted all predictive analyses also independently for a set of well-defined functional brain networks.
Discussion
We used two different cross-validated predictive modeling approaches to test whether individual intelligence scores can be predicted from regional brain gray matter volume - beyond the known relationship between intelligence and total brain size (Nave et al.
2018; Pietschnig et al.
2015). Predictive performance of a whole-brain model based on relative gray matter volume was not significantly above chance when using a PCA-based feature construction approach, but reached statistical significance when features were derived from an established functional brain atlas parcellation. Nevertheless, independent of the analysis approach, predictive performance was low in terms of the correlation between predicted and observed IQ scores (
r = 0.11 in both cases), and the absolute difference between predicted and observed scores varied between 11 and 14 IQ points. The same analyses with absolute gray matter volumes, i.e., without correcting for total brain size, yielded significant prediction in both cases and provided higher correlations between observed and predicted IQ scores (
r = 0.24 and
r = 0.30). However, the MAEs remained nearly unchanged (around 11 IQ points). Brain network-specific analyses of relative gray matter volumes resulted in significant predictive performance only for the cerebellum in the PCA-based approach and only for the fronto-parietal network with the atlas-based method. Network-specific prediction from absolute gray matter was not above chance in the PCA-based approach, but provided significant predictions for all networks with the atlas-based method. However, independent of statistical significance, the MAE remained between 11 and 14 IQ points in all network-specific analyses. Critically, and in all cases, the predictive performance in terms of absolute error did not differ in any substantial way from a ‘dummy’ predictive model based on the sample mean - an observation that calls into question the practical value even of those results that reached statistical significance.
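The ‘dummy’ comparison can be made concrete: for an approximately normal IQ distribution with SD 15, always predicting the sample mean already yields a mean absolute error of about 15·√(2/π) ≈ 12 IQ points. The sketch below uses simulated scores, not the study data:

```python
import numpy as np

rng = np.random.default_rng(0)
iq = rng.normal(100, 15, size=10000)     # simulated IQ scores (mean 100, SD 15)

# Dummy model: predict the sample mean for everyone.
dummy_pred = np.full_like(iq, iq.mean())
mae_dummy = np.abs(iq - dummy_pred).mean()

# For a normal distribution, E|X - mu| = sigma * sqrt(2/pi) ~= 0.8 * sigma.
print(round(mae_dummy, 1))               # close to 12
```

An informative model must therefore push the MAE clearly below this baseline; errors of 11-14 IQ points do not.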
To summarize these results, we observed (a) variable results for whole-brain predictive models in terms of statistical significance, with relative gray matter allowing for significant prediction only with the atlas-based feature construction method, while absolute gray matter provided significant predictions with both approaches. We found (b) heterogeneous results with respect to network-specific prediction performance, providing no support for models of gray matter volume and intelligence that focus on only specific regions of the brain. Finally, our results (c) indicate a high absolute error of prediction, which suggests limited practical value of machine learning models predicting general intelligence from patterns of regional gray matter volume. In the following, we will discuss the role of region-specific adaptations of gray matter volume for general intelligence, the separable contributions of relative vs. absolute gray matter volume, conclusions that can be drawn from the network-specific analyses, as well as limitations of the present investigation. We close with suggestions and recommendations for future investigations applying predictive modeling approaches to the study of phenotypic variations.
Predicting intelligence from region-specific variations in relative gray matter volume
Recent evidence suggests that individual intelligence scores can be predicted from functional (resting-state) connectivity (Dubois et al.
2018; Ferguson et al.
2017; Finn et al.
2015; Liu et al.
2018). An earlier study also provided initial evidence for the feasibility of predicting intelligence from brain structure, in that case, based on a combination of various morphometric features (Yang et al.
2013). In the current study, we explicitly tested the predictive performance of one of the most commonly studied structural correlates of intelligence, regional gray matter volume, but found only limited evidence for above-chance prediction of individual intelligence scores when controlling for individual differences in total brain size. This finding is consistent with the results of a very recent machine learning competition which aimed at predicting intelligence in a large cohort of 8669 healthy children from brain structure operationalized by several MRI brain morphological metrics including absolute and relative gray matter volume (ABCD Neurocognitive Prediction Challenge). The final model of that competition did not succeed in significantly predicting intelligence and resulted in only a low correlation of
r = 0.03 between predicted and observed IQ scores (Mihalik et al.
2019). This study differs from the present work not only regarding the age range of the sample and the broader set of features used for prediction, but also with respect to the to-be-predicted target variable. The intelligence scores provided by the ABCD challenge were estimated from performance in cognitive tasks of the NIH Toolbox Neurocognitive battery (Akshoomoff et al.
2013) but, critically, the resulting scores were residualized with respect to several variables known to be strongly correlated with intelligence, such as highest parental education (e.g., von Stumm and Plomin
2015). Given these differences, it is not clear how directly the two studies can be compared. Nevertheless, they converge in the sense that both studies fail to precisely predict general intelligence from morphometric patterns of brain anatomy.
The results of the present study also allow for conclusions concerning the heterogeneity of previous structural VBM findings (as also indicated, for example, by the relatively weak meta-analytic effects observed in Basten et al.
2015). Specifically, our present data suggest that some of the previous VBM results (in studies with smaller sample sizes than in the current study) may have been driven primarily by sample-specific variance and may thus not generalize to independent and previously unseen data. Using a predictive rather than an explanatory statistical approach, and by exploring two different feature construction methods, we found no evidence in support of a strong relationship between relative regional gray matter volume and general intelligence. Further, our analyses revealed that even for those three models for which prediction performance was significantly above chance (i.e., the cerebellum model in the PCA-based approach; the whole-brain model and the fronto-parietal model in the atlas-based approach), the average absolute error we would make when predicting intelligence scores of individual persons would be too high for actual applications (i.e., between ten and 14 IQ points).
The practical relevance of an error of around ten to 14 IQ points can be illustrated by considering the impact that a difference of that magnitude may have on critical decisions with long-term consequences, e.g., with respect to whether or not someone is eligible for receiving specific support (like for children with very low or very high cognitive abilities). In this regard, it is also interesting to note that the average effect of 1 year of secondary schooling in adolescence on later IQ has been estimated at between three (Falch and Sandgren Massih
2011) and five (Brinch and Galloway
2012) IQ points. A difference of ten IQ points, thus, may amount to the effect of 2 to 3 years of schooling on IQ, and a prediction error in that range can, therefore, have severe consequences in actual selection or placement decisions.
The visualization of our PCA-based results shows that prediction performance varies across the range of possible IQ scores, with higher prediction accuracies close to the mean and larger errors in the extreme tails of the distribution. This is visible from the confidence interval of prediction accuracy, which is highlighted as a gray area around the regression lines in Fig.
3a, and results primarily from the fact that intelligence is approximately normally distributed in our sample implying that there are more data points available around the mean IQ of 100. The model can thus be ‘better’ trained and generate more accurate predictions within that range - the more instances (of intelligence–gray matter associations) are available within a certain range, the more opportunities the algorithm has to learn these associations and to capture also fine-grained deviations. In contrast, the visualization of atlas-based results (Fig.
4a) indicates a very restricted range of predicted IQ scores (87–99 IQ points) with heavy clustering in a narrow range close to the sample mean. This may result from the fact that the mean represents the maximum-likelihood estimate, which can drive the prediction algorithm and lead to predicted values close to the mean when there is no other relevant pattern found in the data. Because fold-specific variance is naturally reduced in the atlas-based approach, where all subjects share a common feature set (400 atlas parcels), this pattern becomes especially visible with this method and highlights the limited presence of relevant information in the data after applying the parcellation. The latter point receives further support from our observation of comparable predictive performance when strictly using the group-mean IQ as predicted score for all participants (see “Additional control analyses”: ‘dummy model’). Thus, the difference in the statistical significance of prediction results obtained for the atlas-based prediction models in contrast to the PCA-based models on relative gray matter volumes may primarily result from an over-representation of IQ values around the sample mean (due to normally distributed IQ scores) that, due to the algorithm’s tendency to use the sample mean as best predictor when no other relevant information is available, leads to reduced variance between folds and thus an increased likelihood of statistical significance.
However, it is important to note that this does not mean that the significance of results is artificial, but that the PCA- and atlas-based approaches are differentially dependent on fold-specific variability. It may thus be more a theoretical decision whether one prefers an approach that relies purely on the given input data (PCA) or an approach that is informed by domain-specific knowledge. For instance, in cases where no prior assumptions about the underlying data structure exist or where the (arbitrary) choice of a specific brain atlas should be prevented, a purely data-driven approach would represent the preferred method. However, a purely data-driven approach can also increase the generalization error and induce fold-specific variance (since the model is fitted to the training set and might overfit). This can especially be the case when samples are small (< 1000) in relation to the high-dimensional input data, as is mostly the case in human neuroimaging studies. In contrast, a domain knowledge-based approach introduces a priori assumptions (that may or may not be correct) and will therefore be less likely to overfit to the training data. This can reduce fold-specific variance and minimize the generalization error, but respective prediction models can only generalize to data of the same structure, i.e., MRI data that are preprocessed in the same way and parcellated with the same atlas. This trade-off between generalizability and accuracy has to be considered thoroughly when selecting the feature construction method.
Additionally, the pattern of our results suggests that test statistics like the MAE, which can be interpreted in terms of absolute IQ points, are of obvious informative value. To the best of our knowledge, such measures have not been considered as criteria for model evaluation in previous studies that reported successful prediction of intelligence from task-induced activation (Sripada et al.
2018) or intrinsic connectivity (Dubois et al.
2018; Ferguson et al.
2017; Finn et al.
2015; Liu et al.
2018), which impedes the direct comparability of our results to these former studies. However, a similar restriction of the variance of predicted intelligence around the mean, as observed in our study, is also present, for example, in the significant prediction results of Finn et al. (
2015; see their Fig. 5a, c) and Dubois et al. (
2018; see their Fig. 3a). Error measures like the MSE or the MAE yield important additional insights into the practical relevance of prediction-based neuroimaging studies, and we would, therefore, advocate their use in future studies.
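The dissociation between correlation and absolute error can be demonstrated directly: predictions heavily shrunk toward the mean can still correlate substantially with the true scores while their absolute error stays near the dummy-model level. The data below are simulated and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
true_iq = rng.normal(100, 15, size=5000)
# Predictions compressed into a narrow band around the mean, but preserving rank order.
pred_iq = 100 + 0.1 * (true_iq - 100) + rng.normal(0, 1, size=5000)

r = np.corrcoef(true_iq, pred_iq)[0, 1]
mae = np.abs(true_iq - pred_iq).mean()
print(round(r, 2), round(mae, 1))   # high correlation, yet MAE near the ~12-point dummy level
```

Reporting only `r` here would suggest a strong model; the MAE reveals that the predictions are barely more useful than the sample mean.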
Relative vs. absolute gray matter volume and their relevance for general intelligence
In contrast to the mixed results obtained with respect to the whole-brain patterns of relative gray matter volume, whole-brain patterns of absolute gray matter volume provided statistically significant predictions of intelligence irrespective of the specific feature construction method - albeit again with a rather high MAE (around 11 IQ points) and with a highly restricted range of predicted values in the atlas-based models. This may suggest that regional differences in gray matter volume do contribute some but not much information beyond total brain size. Importantly, however, the differences in predictive performance between models based on relative vs. absolute gray matter volume were not statistically significant - neither for the global models nor for any of the local models and neither in the PCA-based nor in the atlas-based approach, rendering such conclusions preliminary. Nevertheless, our result underscores the importance of thoroughly differentiating between relative and absolute gray matter and of comparing the respective effects, particularly given that the variable of interest (IQ) is significantly related to brain size (McDaniel et al.
2005; Nave et al.
2018; Pietschnig et al.
2015). It is not absolutely clear what neurobiological characteristics are primarily reflected in gray matter probability maps as derived from VBM: more cell bodies, neuropil, glia cells, synapses, and capillaries all seem to be related to higher gray matter values, but also more cortex folding and thicker gray matter can contribute to high gray matter indices (Mechelli et al.
2005). Most often, however, gray matter values are interpreted as reflecting the total amount of neuronal packing within a certain region, i.e., an approximation of neuron number (Gaser and Kurth
2018). Variations in total brain size are thus likely to reflect individual differences in total neuron numbers (e.g., Leuba and Kraftsik 1994; Pakkenberg and Gundersen
1997) and positive associations with intelligence are typically interpreted as indicating more computational processing power due to larger neural capacities (e.g., in Genç et al.
2018). The results of our analyses of absolute gray matter volumes are well in line with this proposal and extend it in suggesting that this positive association, i.e., between higher intelligence and more computational power due to more neurons, may exist in all functional brain networks. In contrast, relative gray matter volume reflects local deviations in neuron number that go beyond the neuron number one would expect for a given region on the basis of an individual’s brain size. The low predictive performance of relative gray matter models observed in our study suggests only a minor influence of these deviations (beyond brain size) on individual differences in intelligence. Overall, our results are more in support of theories proposing intelligence as a result of a global processing advantage than of theories of intelligence focusing on region-specific gray matter characteristics.
Differences in predictive performance between functional brain networks
Our results of the network-specific (local) analyses of relative gray matter volume demonstrate that even when restricting the number of features by separately modeling distinct functional brain networks, only two sub-systems could predict intelligence significantly above chance, i.e., the cerebellum in the PCA-based approach and the fronto-parietal network in the atlas-based method. The observation that frontal and parietal brain regions are more closely related to individual differences in intelligence than other regions is well in line with previous observations and neurocognitive theories of intelligence (e.g., P-FIT model, Basten et al.
2015; Jung and Haier
2007; Multiple-Demand System, Duncan
2010), while the cerebellum has typically not been considered as relevant for individual differences in intelligence. Contrasting these network-specific differences in the predictability of intelligence from relative gray matter volume, the local models based on absolute gray matter did not differ from each other with respect to their significance: While none of the network models approached significance in the PCA-based approach, all models provided above-chance predictions with the atlas-based method. Critically, however, in all local models, the MAE was comparably high (i.e., between ten and 12 IQ points). As already discussed for the global models, this observation limits the impact of network-specific differences in gray matter volume for the understanding and prediction of general intelligence.
The currently available evidence from prediction-based studies, thus, seems to suggest that brain function (i.e., resting-state functional connectivity or task-induced brain activation) may be more important than brain structure in determining individual differences in general cognitive ability - at least when operationalizing brain structure exclusively as regional gray matter volume differences. Highest prediction accuracies have so far been reported with respect to intrinsic functional connectivity, i.e., correlated neural activation patterns measured in the absence of any task demand (Dubois et al.
2018; Ferguson et al.
2017; Finn et al.
2015; but note also Greene et al.
2018 for task-based prediction models). As the organization of intrinsic brain networks is assumed to be closely related to the underlying anatomical connectivity backbone, i.e., the strongest structural connections between different brain regions (Greicius et al.
2009), we speculate that measures of structural connectivity (as assessed, e.g., with diffusion tensor imaging) may allow for a more accurate prediction of general intelligence than volumetric indices of regional gray matter volume (for correlative support of this assumption, see, e.g., Genç et al.
2018). On the other hand, intelligence has also been linked to other regionally specific morphometric properties of the brain such as cortical surface area (e.g., Schnack et al.
2014), gyrification (e.g., Gregory et al.
2016), or cortical thickness (e.g., Karama et al.
2011). Future predictive work, in our view, should thus aim at more strongly integrating the different functional and neuroanatomical characteristics of the brain, to better understand their respective roles for general cognitive abilities.
Limitations
The machine learning pipeline of the present study used a support vector regression with a linear kernel. This limited our analyses to the detection of linear relationships between intelligence and brain structure. Although this approach is one of the most widely used in the field of neuroimaging (for review, see Lemm et al.
2011; Pereira et al.
2009), the possible existence of non-linear associations cannot be excluded. However, our selection of this approach was driven (a) by computational feasibility (the reported analyses took an equivalent of ~ 36,000 h of computation time with 2 CPU cores and 5 GB RAM; non-linear analyses would take substantially longer) and (b) by our aim of achieving the highest possible comparability with previous correlative analyses on brain structure and intelligence (from explanatory studies, see above).
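The cross-validation logic of such a pipeline can be sketched with a simpler linear stand-in: ridge regression replaces the linear-kernel SVR, and random features replace real gray matter data, so this illustrates the procedure rather than reproduces the study's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))                  # stand-in features (e.g., PCA components)
w_true = rng.standard_normal(p)
y = 100 + X @ w_true + rng.normal(0, 5, size=n)  # simulated IQ-like target

def ridge_fit(X, y, alpha=1.0):
    """Center the data, then solve the L2-penalized least-squares problem."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return w, xm, ym

def ridge_predict(model, X):
    w, xm, ym = model
    return (X - xm) @ w + ym

# 10-fold cross-validation: each fold is predicted by a model trained on the other nine.
folds = np.array_split(np.arange(n), 10)
pred = np.empty(n)
for test in folds:
    train = np.setdiff1d(np.arange(n), test)
    pred[test] = ridge_predict(ridge_fit(X[train], y[train]), X[test])

mae = np.abs(y - pred).mean()
r = np.corrcoef(y, pred)[0, 1]
```

Because every prediction is made on held-out data, `mae` and `r` estimate out-of-sample rather than within-sample performance.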
Second, our results revealed considerable variance in predictive performance across the ten folds of the cross-validation procedure, despite our efforts to homogenize the distributions of the target variable (IQ) between folds. This was particularly severe in the PCA-based approach, but also obvious in models that relied on the atlas-informed feature construction method. A systematic investigation of the heterogeneity in prediction performance across folds could be achieved, e.g., by repeating all analyses 100 times and then examining differences between the resulting distributions of prediction accuracies. This is, however, at present not computationally feasible. To the best of our knowledge, the variability of results across folds has not been addressed in detail by previous machine learning-based neuroimaging investigations, and our study is one of the first to illustrate fold-specific predictive performances at all. In our opinion, this observation deserves closer consideration in future research, and we therefore recommend always reporting fold-specific measures of predictive performance in addition to overall predictive performance.
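One simple way to homogenize a continuous target across folds (a common heuristic, not necessarily the exact procedure used here) is to sort subjects by IQ and deal them into folds round-robin; fold-wise summaries can then be reported alongside the overall mean. The scores below are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
iq = rng.normal(100, 15, size=300)   # simulated IQ scores

# Sort by IQ, then assign subjects to folds round-robin so every fold spans the full range.
k = 10
fold_of = np.empty(iq.size, dtype=int)
fold_of[np.argsort(iq)] = np.arange(iq.size) % k

# Report the fold-wise target means: after stratification they should be nearly identical.
fold_means = np.array([iq[fold_of == f].mean() for f in range(k)])
print(fold_means.round(1))
```

The same per-fold reporting applies to performance measures (MAE, correlation), which makes heterogeneity across folds visible instead of hiding it in a single average.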
Finally, for predictive modeling approaches like the one used in the present study, the use of many data points is essential to train the prediction models sufficiently and to gain stable prediction weights. Of note, it has been observed that prediction accuracies increase as sample size decreases (Varoquaux
2017), suggesting the presence of unrealistically exaggerated (and thus invalid) prediction accuracies in studies using small samples. Although our sample size can be considered large relative to other prediction studies from recent years (for comparison of prediction-based neuroimaging studies, see, e.g., Arbabshirani et al.
2017; Poldrack et al.
2020), it nevertheless appears small given the dimensionality of the original feature space (i.e., the number of voxels in the brain). We thus propose that future work should strive to further increase sample sizes, for example by combining data from different sources (as is done in genetics; e.g., Savage et al.
2018).
Methodological implications and recommendations for future studies
In light of the results presented in this work, we would like to summarize methodological insights that may be valuable to consider in future predictive studies, within the field of intelligence research but also more generally in individual differences-focused predictive modeling investigations. First, whenever cross-validation is used to assess the performance and generalizability of the predictive model, some measure or visualization of the variance across folds should be reported. Second, predictive variance within folds should be visualized using scatter plots so that the range of the predicted scores becomes transparent. This is especially important for detecting cases in which predicted and true scores correlate highly despite a restricted range of predicted values, indicating poor practical utility of those predictions. Third, pertaining to the same point, measures of the absolute difference between predicted and true values such as RMSE or MAE should be used in addition to the correlation between predicted and observed scores or explained variance. These metrics quantify the error in units of the original scale and are therefore of high value for interpretation. Correlations, on the other hand, are insensitive to the scaling of the original measures, which can lead to high correlations between predicted and observed scores despite considerable differences in their absolute values (see also Poldrack et al.
2020, for an in-depth discussion). Fourth, a comparison of model performance indices with those obtained by a non-informative, ‘baseline’ solution (such as predicting the mean of the training set for all subjects of the test set) can help in interpreting resulting performance measures. Fifth, our results indicate that purely data-driven methods of feature construction (such as PCA) can lead to different results than methods using features informed by domain-specific knowledge (such as using a functionally defined brain atlas). Similar variations in results have been observed for the application of different algorithms and other data transformations (Wolpert and Macready
1997). We therefore recommend exploring the influence that variations in analysis pipelines, such as different feature construction methods, may have on the results, and reporting the respective observations in detail to achieve a more realistic understanding of the robustness and generalizability of respective findings. In subsequent stages of a research program, such parameters should be defined prior to the data analysis or optimized in a purely data-driven way (within a further inner cross-validation loop), to reduce researcher degrees of freedom and to move from exploratory to more confirmatory research.
The current study used a machine learning-based predictive modeling approach to test whether individual intelligence scores can be predicted from spatially highly resolved (i.e., voxel-wise) patterns of regional gray matter volume. When analyzing relative gray matter volumes, i.e., independent of total brain size, predictive performance for the whole-brain model was generally low and reached statistical significance only with a domain knowledge-based feature construction approach (using a common brain atlas) but not with a purely data-driven method (PCA). In contrast, absolute gray matter volume (uncorrected for brain size) allowed for significant predictions of individual intelligence scores with both feature construction approaches. Importantly, the absolute error was relatively high (greater than ten IQ points) and the range of predicted IQ scores was markedly restricted around the sample mean, limiting the practical value of these findings. Brain network-specific analyses of gray matter volume highlight the role of the fronto-parietal network and the cerebellum, but could not reduce the MAE in comparison to the global models. Overall, our results suggest (a) that absolute gray matter volume is a significant predictor of individual differences in intelligence and that this generalizes across functional brain networks, (b) that regional differences that go beyond the influence of brain size (relative gray matter volume) contribute some but not much additional information to this prediction, and (c) that the empirical evidence in favor of region- or network-specific gray matter models of intelligence is limited. This supports the proposal that intelligence may be related to global more than region-specific variations in gray matter volume.
The difference between our result and earlier reports of significant correlative associations between intelligence and gray matter volume underscores the importance of predictive as opposed to explanatory approaches in the cognitive neurosciences. To be able to unequivocally establish brain–behavior associations, individual difference-oriented neuroimaging studies should strive for true out-of-sample prediction in independent data.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.