Introduction
Neuroscience is still a young research field, having emerged as a formal discipline only around 70 years ago (Cowan et al. 2000). The field has since mushroomed, and much of our current knowledge about the human brain’s neurobiology was made possible by rapid advances in technologies for investigating the brain in vivo at high resolution and at different scales. An example is magnetic resonance imaging (MRI), which allows us to measure regional characteristics of the brain’s structure non-invasively and may also be used to assess anatomical and functional interactions between brain regions (Rosen and Savoy 2012; Sizemore et al. 2018). This expansion of the field led to an exponential increase in data size and complexity. To analyse and interpret this ‘big data’, researchers had to develop robust theoretical frameworks. Complex network science was brought to neuroscience and has been increasingly used to study the brain’s intricate communication and wiring (Bassett and Sporns 2017; Sporns 2018). The resulting field, network neuroscience, aims to see the brain through an integrative lens by mapping and modelling its elements and interactions (Bassett and Sporns 2017; Fornito et al. 2016).
One of the main theoretical frameworks from complex network science used to model, estimate, and simulate brain networks is graph theory (Gross and Yellen 2003; Bullmore and Sporns 2009). A graph comprises a set of elements, known as vertices, and the connections between them, known as edges. Vertices (also known as nodes) in a network can, for example, be brain areas, while edges (also known as links) represent the functional connectivity between pairs of vertices (Sporns 2018). Various imaging modalities are available to reconstruct the brain network (Hart et al. 2016; Bullmore and Sporns 2009). The focus of this hands-on paper is resting-state functional MRI (rsfMRI). As the name suggests, rsfMRI indirectly measures brain activity while a subject is at rest (i.e., does not perform any task). This type of data provides information about spontaneous brain functional connectivity (Raichle 2011). Functional connectivity is often operationalised as a statistical dependency (usually a Pearson correlation coefficient) between signals measured from anatomically separated brain areas (Rosen and Savoy 2012; Smith et al. 2013). An in-depth explanation of rsfMRI and functional connectivity is beyond the scope of our manuscript. However, given the focus on this type of data here, we recommend that readers who are not familiar with this imaging method consult Lee et al. (2013), van den Heuvel and Hulshoff Pol (2010), Smith et al. (2013), and Smitha et al. (2017) for a comprehensive overview.
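As a minimal sketch of how a functional connectivity matrix can be obtained as Pearson correlations between regional signals, the following Python snippet uses synthetic time series. The array shapes, the variable names, and the artificial coupling between the first two regions are illustrative assumptions and do not reproduce the preprocessing of real rsfMRI data.

```python
import numpy as np

# Hypothetical data: 200 time points for 5 brain regions (synthetic, not real rsfMRI).
rng = np.random.default_rng(seed=0)
n_timepoints, n_regions = 200, 5
time_series = rng.standard_normal((n_timepoints, n_regions))

# Make region 1 partly follow region 0 so that one pair is clearly correlated.
time_series[:, 1] = 0.8 * time_series[:, 0] + 0.2 * time_series[:, 1]

# np.corrcoef treats rows as variables, so transpose to (regions x time points).
connectivity = np.corrcoef(time_series.T)

# Self-connections carry no information here; set the diagonal to zero.
np.fill_diagonal(connectivity, 0.0)

print(connectivity.round(2))
```

The result is a symmetric N x N matrix (here N = 5) whose entry (i, j) is the correlation between the signals of regions i and j; this is the only input that the graph-theoretical and topological computations require.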
Several descriptive graph metrics (Do Carmo 2016) can be calculated to describe the brain network’s characteristics; examples include the degree (the total number of connections of a vertex) and the path length (the number of intermediate edges) between two vertices (Fornito et al. 2016; Hallquist and Hillary 2018). These metrics have consistently allowed researchers to identify non-random features of brain networks. A key example is the ground-breaking discovery that the brain (like most other real-world networks) follows a ‘small-world network’ architecture (Bassett and Bullmore 2017; Bassett and Sporns 2017; Watts and Strogatz 1998). This refers to the phenomenon that, to minimise wiring cost while simultaneously maintaining optimal efficiency and robustness against perturbation, the brain network strikes a balance between the ability to perform local processing (i.e., segregation) and the ability to combine information streams on a global level (i.e., integration).
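The metrics named above can be sketched with NetworkX as follows. The network sizes, the rewiring probability, and the helper name `summarise` are illustrative assumptions; the graphs are synthetic Watts-Strogatz networks, not brain data.

```python
import networkx as nx

n_vertices, n_neighbours = 100, 6

# Regular ring lattice: every vertex is linked to its 6 nearest neighbours.
lattice = nx.watts_strogatz_graph(n_vertices, n_neighbours, p=0.0, seed=1)

# Same lattice with each edge rewired at random with probability 0.1
# (Watts-Strogatz model); this variant retries until the graph is connected.
small_world = nx.connected_watts_strogatz_graph(
    n_vertices, n_neighbours, p=0.1, seed=1
)

# Degree: number of edges attached to each vertex.
degrees = dict(small_world.degree())

# Path length between two vertices: number of edges on a shortest path.
path_0_50 = nx.shortest_path_length(small_world, source=0, target=50)


def summarise(graph):
    """Return a segregation proxy (clustering) and an integration proxy (path length)."""
    return {
        "clustering": nx.average_clustering(graph),
        "path_length": nx.average_shortest_path_length(graph),
    }


lattice_summary = summarise(lattice)
small_world_summary = summarise(small_world)
print(lattice_summary)
print(small_world_summary)
```

In such a comparison, a small-world network is expected to retain much of the lattice's clustering (segregation) while having a clearly shorter characteristic path length (integration).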
Network neuroscience has thereby offered a comprehensive set of analytical tools to study not only the local properties of brain areas but also their significance for the functioning of the entire brain network. Using graph theory, many insights have been gathered into the neurobiology of the healthy and diseased brain (Farahani et al. 2019; Hallquist and Hillary 2018; Hart et al. 2016; Sporns 2018).
Another perspective on the characteristics of the brain network can be provided by (algebraic) topological data analysis (TDA), which analyses the interactions between a set of vertices beyond the ‘simple’ pairwise connections (i.e., higher-order interactions). With TDA, one can identify a network’s ‘shape’ and its invariant properties [i.e., coordinate and deformation invariances (Zomorodian 2005; Offroy and Duponchel 2016)]. As a result, and as we will illustrate throughout the manuscript, TDA is often more robust than graph-theoretical analysis (Blevins and Bassett 2020; Blevins et al. 2021) against noise, which can be a significant issue in imaging data (Sizemore et al. 2019; Liu 2016; Greve et al. 2013). Although TDA has only recently been adopted in network neuroscience (Curto and Itskov 2008; Singh et al. 2008), it has already shown exciting results on rsfMRI (Expert et al. 2019; Curto 2017). For example, group-level differences in network topology have been identified between healthy subjects who ingested psilocybin (a psychedelic substance) and a placebo group (Petri et al. 2014), and between children with attention-deficit/hyperactivity disorder and typically developing controls (Gracia-Tabuenca et al. 2020). A limitation of this framework is that the complexity and level of mathematical abstraction necessary to apply TDA and interpret the results might keep clinical neuroscientists without prior mathematical training from using it. Moreover, the high-order interaction structure that emerges from TDA is often challenging to visualise realistically and understandably. Despite these technical constraints, TDA allows us to deal properly with high-order interactions and large combinatorial coding capacity.
Therefore, we would like to facilitate the use of network neuroscience and its constituents, graph theory and TDA, by the general neuroscientific community by providing a step-by-step tutorial on how to compute different metrics commonly used to study brain networks and how to produce realistic high-order network plots. We offer a theoretical and experimental background of these metrics and include code blocks in each section to explain how to compute them. We also list several additional resources (Tables 1 and 2), chosen by personal preference and by no means complete, including a Jupyter Notebook that we created to accompany this hands-on tutorial, publicly available on GitHub and Zenodo (Centeno and Santos 2021) (see Table 1, under the Jupyter Notebooks section: Notebook for network and topological analysis in neuroscience).
Table 1
List of computational resources
Jupyter Notebooks | | |
AML-days-TDA-tutorial | A set of notebooks on the theory and applications of TDA pipelines | |
DyNeuSR | Notebook on how to use Mapper—an algorithm for high dimensional dataset exploration | |
Notebook for network and topological analysis in neuroscience | Notebook on how to compute both classical and newer metrics of network and topological neuroscience | |
NI-edu | A collection of neuroimaging-related course materials developed at the University of Amsterdam covering fMRI basic concepts and methodology | |
Tutorials for Topological Data Analysis with the Gudhi Library | A collection of notebooks for the practice TDA with the Python Gudhi library | |
MATLAB toolboxes and scripts | | |
CliqueTop | A collection of MATLAB scripts for TDA | |
The brain connectivity toolbox | MATLAB toolbox for brain network analysis | |
Python packages and scripts | | |
Data visualisation | | |
DyNeuSR | “DyNeuSR is a Python visualisation library for topological representations of neuroimaging data.” | |
Nxviz | “nxviz is a graph visualisation package for NetworkX.” | |
Plotly | “Plotly’s Python graphing library makes interactive, publication-quality graphs.” | |
Graph theory | | |
Bctpy | “A direct translation to Python of the MATLAB brain connectivity toolbox.” | |
NetworkX | “A package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.” | |
TDA | | |
Dionysus | “A library for computing persistent homology. It is written in C++, with Python bindings.” | |
Giotto | “A collection of algorithms that harbours theoretical and technological advances spanning several key disciplines, including TDA.” | |
Gudhi | “The library offers state-of-the-art data structures and algorithms to construct simplicial complexes and compute persistent homology.” | |
Scikit-TDA | “Topological Data Analysis Python libraries intended for non-topologists”. | |
Topology ToolKit | “The Topology ToolKit (TTK) is an open-source library and software collection for topological data analysis and visualisation. Written in C++ but comes with Python bindings”. | |
Table 2
List of reading resources
Key articles and books | |
Cliques and cavities in the human connectome | |
Network neuroscience | Bassett and Sporns ( 2017) |
Fundamentals of brain network analysis (the primary reference of our hands-on tutorial) | |
Graph theory approaches to functional network organisation in brain disorders: a critique for a brave new small-world | Hallquist and Hillary ( 2018) |
Topology for computing | |
The importance of the whole: Topological data analysis for the network neuroscientist | |
Editorial: Topological Neuroscience | |
What can topology tell us about the neural code? | |
Homological scaffolds of brain functional networks | |
Two’s company, three (or more) is a simplex | |
A roadmap for the computation of persistent homology | |
Networks beyond pairwise interactions: structure and dynamics | |
Clique topology reveals intrinsic geometric structure in neural correlations | |
Computational topology: an introduction | Edelsbrunner and Harer ( 2010) |
Our work differs from previous literature (Hallquist and Hillary 2018; Otter et al. 2017) in that we describe the concepts central to graph theory and TDA and provide an easy-to-grasp, step-by-step tutorial on how to compute these metrics using an easily accessible, open-source programming language. Furthermore, we offer new 3-D visualisations of simplicial complexes and TDA metrics in the brain that may facilitate the application and interpretation of these tools. Finally, we would like to stress that even though this tutorial focuses on rsfMRI, the main concepts and tools discussed in this paper can be extrapolated to other imaging modalities and to other biological or complex networks.
Since graph theory has been extensively translated for neuroscientists elsewhere, we refer the reader to the book by Fornito et al. (2016). This tutorial mainly focuses on the topics covered in chapters 3, 4, and 5, and on the particular sections of chapters 6, 7, 8, and 9 about assortativity, shortest paths and the characteristic path length, the clustering coefficient, and modularity. In the second part of the tutorial, we explore TDA metrics hands-on, providing a summary of both theoretical and neuroscientific aspects together with the calculations used in our work. We believe that our tutorial, which is far from exhaustive, can make this emerging branch of network and topological neuroscience accessible to the reader. The code we provide requires only the functional connectivity matrix as input. For our realistic 3-D visualisation of simplicial complexes, we additionally need only the coordinates of the nodes of a given brain atlas. Therefore, our scripts can be adapted to different databases, imaging modalities, and brain atlases. A short glossary with the key terms needed to understand this manuscript can be found in Table 3.
Table 3
Glossary with key terms
Clique complex | A simplicial complex constituted of all cliques of a network |
Clique participation rank | The number of k-cliques in which a vertex \(i\) participates for density d |
Connectivity matrix | A square N × N matrix used to represent the connectivity between N vertices |
Face | A subset of a k-simplex. For example, if the k-simplex is a 2-simplex (triangle), all edges and vertices composing this simplex are also its faces |
Filtration | A nested sequence of simplicial complexes |
Functional magnetic resonance imaging (fMRI) | The imaging technique used to measure brain activity by detecting brain blood flow changes, i.e., blood-oxygen-level-dependent (BOLD) signal |
k-clique | A subset of k vertices in an undirected graph in which all vertices are connected to each other |
k-simplex | Geometrically, it is the generalisation of the region delimited by a tetrahedron to an arbitrary dimension k, which can be done in many ways (Zomorodian 2005). In this work, a k-simplex is a complete graph of k + 1 vertices. For example, 0-simplex is a point (or vertex), 1-simplex is a line segment (or edge), 2-simplex is a triangle, and so on |
Simplicial complex | A simplicial complex K is a finite set of k-simplexes (e.g., vertices, edges, triangles, tetrahedrons, and their n-dimensional counterparts). The formal definition states that if K contains a k-simplex, then K also contains all faces of this k-simplex. Moreover, if two simplexes in K intersect, then this intersection is a face of each of them |
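To make the glossary terms concrete, the following sketch builds the clique complex of a small toy network with NetworkX and counts its simplices per dimension. The toy graph and the helper name `clique_participation` are illustrative assumptions; in practice the graph would first be obtained by thresholding a connectivity matrix at a given density d.

```python
from collections import Counter

import networkx as nx

# Toy network: vertices 0-3 are fully connected, and vertex 4 hangs off vertex 3.
graph = nx.complete_graph(4)
graph.add_edge(3, 4)

# enumerate_all_cliques yields every clique (not only the maximal ones),
# so the resulting collection is closed under taking faces: a clique complex.
cliques = [tuple(c) for c in nx.enumerate_all_cliques(graph)]

# A clique on k + 1 vertices is a k-simplex, so its dimension is len(clique) - 1.
simplices_per_dimension = Counter(len(c) - 1 for c in cliques)


def clique_participation(cliques, vertex, k):
    """Number of k-cliques (cliques on k vertices) that contain `vertex`."""
    return sum(1 for c in cliques if len(c) == k and vertex in c)


print(dict(simplices_per_dimension))
print(clique_participation(cliques, vertex=3, k=3))
```

For this toy graph the complex contains five 0-simplices, seven 1-simplices, four 2-simplices, and one 3-simplex, and vertex 3 participates in three 3-cliques (triangles).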
Discussion
This tutorial has explained some of the main metrics related to two network neuroscience branches, graph theory and TDA, providing short theoretical backgrounds and code examples accompanied by a publicly available Jupyter Notebook. We innovate by combining hands-on explanations of these subfields with ready-to-use code and visualisations of simplicial complexes in the brain, hopefully lowering the high threshold for neuroscientists to get acquainted with these new analysis methods, particularly for rsfMRI data. Here, we also innovate by providing realistic visualisations of higher-order simplices in brain networks.
Our main goal was to provide a step-by-step computational tutorial on using graph theory and TDA with brain imaging data, particularly rsfMRI, with in-depth explanations of each metric. The core idea of applying these analysis frameworks to brain data is that both can quantitatively combine two essential characteristics of the brain that are usually investigated in isolation: the brain works at a local level in specialised brain regions, but it also has global properties that are important for its functioning. As a potentially powerful fusion between localizationism and holism, graph theory and TDA concepts have already been applied in brain research. Starting with graph theory, all the metrics mentioned above have been used to investigate brain networks in both normal and pathological states (Eijlers et al. 2017; Garcia-Garcia et al. 2015; Wang et al. 2017; Wink 2019; Breedt et al. 2021; DeSalvo et al. 2020; Liu et al. 2012; dos Santos Siqueira et al. 2014; Yu et al. 2012; Davis et al. 2013; Suo et al. 2015). As these articles show, researchers often use several graph-theoretical metrics in the same study, which helps them look for alterations that might explain group differences in specific contexts (gender, age, pathology, development). A brief review and commentary (Eijlers et al. 2019) summarises some applications. Moving on to the newer framework of TDA in neuroscience, fewer studies have been published using rsfMRI data. Santos et al. (2019) applied the concepts of the Euler characteristic, topological phase transitions, and curvature to human brain data and showed that such transitions can be found there, helping to pave the way for TDA applications in brain data.
Moreover, alterations in whole-brain connectomes were identified in attention-deficit/hyperactivity disorder subjects using Betti numbers and persistent homology, complementing connectomics-related methods that aim to identify the markers of this disorder (Gracia-Tabuenca et al. 2020). A similar approach was used in an Alzheimer’s disease dataset by Kuang et al. (2019). More considerations on how TDA can be used in brain imaging big data and resting-state functional connectivity analyses can be found in Phinyomark et al. (2017), Petri et al. (2014), Anderson et al. (2018), Saggar et al. (2018), Salch et al. (2021), and Songdechakraiwut and Chung (2020).
Notably, limitations and other relevant points should be kept in mind when working with these metrics. Firstly, it is common in network neuroscience to use null models for comparison with real data. The idea is to show that the results differ from what one would obtain by chance (or randomly). The generation of, and comparison with, null models must be performed differently for graph theory and TDA, and it is crucial to define which property should be kept constant (e.g., the density of the network or its degree distribution). For instance, see Viger and Latapy (2005) for how to generate null models with a prescribed degree sequence. In this context, simplicial complexes built from Erdős-Rényi networks, illustrated in Fig. 9, are the simplest (and by no means realistic) null models we can generate.
Nevertheless, the computation and discussion of null models are beyond this tutorial’s scope and would merit an article in themselves. A more in-depth discussion of null models in graph theory can be found in Fornito et al. (2016). Please see Sect. 4 of Battiston et al. (2020) and Blevins and Bassett (2020) for null models in simplicial complexes.
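Purely as an illustration of what "keeping a property constant" means, the sketch below generates two simple null models with NetworkX: an Erdős-Rényi graph that preserves only the density, and a degree-preserving rewiring. The synthetic `observed` network and all parameter values are assumptions, and the double edge swap used here is a simpler stand-in, not the algorithm of Viger and Latapy (2005).

```python
import networkx as nx

# Stand-in for a thresholded, binarised brain network (synthetic, illustrative).
observed = nx.connected_watts_strogatz_graph(60, 6, p=0.1, seed=3)
n_vertices = observed.number_of_nodes()
n_edges = observed.number_of_edges()

# Null model 1: Erdős-Rényi G(n, m) graph, which keeps only the density.
density_null = nx.gnm_random_graph(n_vertices, n_edges, seed=3)

# Null model 2: degree-preserving rewiring by repeated double edge swaps.
# NOTE: a simpler swapped-in technique, not the Viger and Latapy (2005) method.
degree_null = observed.copy()
nx.double_edge_swap(
    degree_null, nswap=2 * n_edges, max_tries=100 * n_edges, seed=3
)

# Compare a metric of interest between the observed network and the null models.
clustering = {
    "observed": nx.average_clustering(observed),
    "density_null": nx.average_clustering(density_null),
    "degree_null": nx.average_clustering(degree_null),
}
print(clustering)
```

In practice one would generate many null networks per model and compare the observed metric against the resulting distribution rather than against a single realisation.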
Moreover, it is crucial to appreciate limitations in interpretation when using these metrics on connectivity-based data. Since rsfMRI functional connectivity is often calculated as a temporal correlation between time series using Pearson’s correlation coefficient, a bias in the number of triangles can emerge. For example, suppose areas A and B, and areas C and B, are communicating and thus correlated. In that case, a correlation will tend to be present between A and C as well, even if there is no actual communication between these vertices (Zalesky et al. 2012). This can affect graph-theoretical metrics such as the clustering coefficient, with networks based on this statistical method being automatically more clustered than random models, and TDA metrics, where the impact depends on how high-order interactions are defined. The proper way to determine and infer high-order interactions in the brain is an ongoing challenge in network neuroscience. Here, we simplified our approach by using the cliques of a network to define our simplicial complex. For those interested in a more in-depth discussion of the topic, we recommend Sects. 1 and 3 of chapters 7 and 10, respectively, in Fornito et al. (2016).
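The triangle bias can be illustrated numerically. In the sketch below, the synthetic signals A and C are each driven by B plus independent noise, with no direct A-C coupling by construction; the sample size and noise level are illustrative assumptions. Because a correlation matrix must be positive semidefinite, the A-C correlation cannot fall below a bound fixed by the other two correlations.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_timepoints = 5000

# Synthetic signals: A and C each follow B plus independent noise.
# There is no direct A-C coupling in this construction.
signal_b = rng.standard_normal(n_timepoints)
signal_a = signal_b + 0.5 * rng.standard_normal(n_timepoints)
signal_c = signal_b + 0.5 * rng.standard_normal(n_timepoints)

# Rows are variables: the result is a 3 x 3 Pearson correlation matrix.
corr = np.corrcoef([signal_a, signal_b, signal_c])
r_ab, r_ac, r_bc = corr[0, 1], corr[0, 2], corr[1, 2]

# Positive semidefiniteness forces r_ac to be at least this value
# once r_ab and r_bc are fixed.
lower_bound = r_ab * r_bc - np.sqrt((1 - r_ab**2) * (1 - r_bc**2))

print(f"r_AB={r_ab:.2f}, r_BC={r_bc:.2f}, r_AC={r_ac:.2f}, bound={lower_bound:.2f}")
```

When both r_AB and r_BC are strong, the bound is positive, so the edge A-C closes the triangle regardless of whether A and C interact directly.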
The use of weighted matrices can also come with caveats. As mentioned above, various metrics use the sum of weights to compute final nodal values. Consequently, many edges with low weights can yield the same sum as a few edges with higher weights. How to deal with this limitation and distinguish between these cases is still under discussion. A possible solution was proposed by Opsahl et al. (2010), in which a tunable parameter is added to the computation of centralities so that the researcher can take into account the number of edges, and not only the sum of the weights.
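A minimal sketch of this idea for degree centrality, following the form given by Opsahl et al. (2010), combines the number of edges k and the sum of weights s as k^(1 - alpha) * s^alpha. The function name, the toy graph, and the chosen alpha values are illustrative assumptions.

```python
import math

import networkx as nx


def tuned_degree_centrality(graph, vertex, alpha):
    """Degree centrality with tuning parameter alpha: k**(1 - alpha) * s**alpha,
    where k is the number of edges and s is the sum of edge weights of `vertex`."""
    k = graph.degree(vertex)
    s = graph.degree(vertex, weight="weight")
    if k == 0:
        return 0.0
    return k ** (1 - alpha) * s**alpha


# Toy network: "hub" has four weak edges, "solo" has a single strong edge.
graph = nx.Graph()
graph.add_weighted_edges_from([("hub", f"n{i}", 1.0) for i in range(4)])
graph.add_weighted_edges_from([("solo", "partner", 4.0)])

for alpha in (0.0, 0.5, 1.0):
    hub = tuned_degree_centrality(graph, "hub", alpha)
    solo = tuned_degree_centrality(graph, "solo", alpha)
    print(f"alpha={alpha}: hub={hub:.2f}, solo={solo:.2f}")
```

With alpha = 1 only the summed weights count and the two vertices are indistinguishable (both 4); with alpha = 0.5 the vertex with many weak edges scores 4 against 2; with alpha = 0 only the number of edges counts.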
Concerning TDA, it is essential to consider limitations in its use due to computational power. The computation of cliques falls under the clique problem, which is NP-hard (no polynomial-time algorithm is known for it); thus, listing cliques may require exponential time as the size of the cliques or networks grows (Gillis 2018; Pardalos and Xue 1994). For example, if the matrix to be analysed has 60 vertices with a maximum clique size of 23, up to \(\sum_{k=0}^{23} \binom{60}{k}\) vertex subsets may have to be considered as candidate cliques, resulting in an enormous amount of time to compute all cliques. What we can do for practical applications is to limit the clique size that can be reached by the algorithm, which determines the dimension of the simplicial complex in which the brain network is represented. This arbitrary constraint implies a theoretical simplification, limiting the space or the dimensionality in which we analyse brain data. Another issue is that, to finish TDA computations in a realistic timeframe, the researcher might need to establish a maximal threshold/density for convergence even after reducing the maximal clique size. Even though TDA approaches lead to substantial improvements in network science, the limitations mentioned above contribute to a loss of information on the data’s shape (Stolz 2014), except in applications using the Mapper algorithm (Saggar et al. 2018).
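The size of this search space, and the practical workaround of capping the clique size, can be sketched as follows. The helper name `cliques_up_to` and the toy complete graph are illustrative assumptions; the sketch relies on NetworkX's `enumerate_all_cliques` yielding cliques in order of increasing size.

```python
from math import comb

import networkx as nx

# Upper bound on the number of vertex subsets of size 0..23 among 60 vertices.
n_candidate_subsets = sum(comb(60, k) for k in range(24))
print(f"{n_candidate_subsets:.3e} candidate subsets")


def cliques_up_to(graph, max_size):
    """List all cliques with at most `max_size` vertices.

    Cliques are yielded in order of increasing size, so the loop can stop
    as soon as the first clique larger than `max_size` appears.
    """
    cliques = []
    for clique in nx.enumerate_all_cliques(graph):
        if len(clique) > max_size:
            break
        cliques.append(clique)
    return cliques


# Toy example: in a complete graph on 6 vertices, cap the cliques at triangles.
complete = nx.complete_graph(6)
capped = cliques_up_to(complete, max_size=3)
print(len(capped))
```

Capping the clique size at 3 restricts the simplicial complex to dimension 2 (vertices, edges, and triangles), which is the kind of theoretical simplification described above.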
Furthermore, given the early stage of TDA approaches in clinical network neuroscience, it is relevant to recognise that our understanding of the neurobiological meaning of the metrics mentioned here is still limited. Further studies contrasting different neuroscientific techniques with TDA are needed to improve the understanding, at the neurobiological level, of what topological metrics represent and how they correlate with brain functioning. However, it is already possible to use these metrics to differentiate groups (Santos et al. 2019; Gracia-Tabuenca et al. 2020), and it is plausible to assume that the interpretation of some classical metrics could be extrapolated to higher-order interactions. For example, centralities based on pairwise interactions are used to understand node importance and hubs. The same, in theory, could be applied to the relationships between three or more vertices by extending the definition of centrality from networks to simplicial complexes, as done in Hernández Serrano and Sánchez Gómez (2020) and Estrada and Ross (2018).
Last, we would like to briefly mention more general problems in network neuroscience and brain imaging. Before applying graph-theoretical or topological data analysis, one should be aware of frequent arbitrary decisions such as defining thresholds, using binary or weighted matrices, and controlling for density. In addition, one should consider the differences that arise from using particular atlases and parcellations and their influence on the findings (Wang et al. 2009; Douw et al. 2019; Fornito et al. 2016; Gracia-Tabuenca et al. 2020; Wu et al. 2019; Eickhoff et al. 2018; Bullmore and Sporns 2009). All these factors can impact how credible and reproducible the field of network neuroscience will be, inevitably influencing how appealing the use of these metrics might be for clinical practice (Douw et al. 2019).