Subjects and EEG recordings
All subjects aged 4–8 years who had been diagnosed with ASD by a specialist in child neurology, child psychiatry, or developmental pediatrics, and who had an EEG obtained between 2/1/2002 and 4/1/2011 in the neurophysiology unit at Massachusetts General Hospital, were retrospectively identified. To reduce variability in the ASD group, subjects diagnosed with epilepsy or found to have epileptiform activity on EEG were excluded from analysis. For control data, subjects aged 4–8 years with normal EEG recordings (as defined by clinical electroencephalographers independent of this study) were retrospectively identified from recordings performed at Massachusetts General Hospital between 2/1/2002 and 4/1/2011. Clinical chart review was performed and only those children with documented normal neurodevelopment and non-epileptic events without known EEG characteristics were included in the control group for analysis. For both the ASD group and control group, neurodevelopmental status was determined from chart review of the clinical assessments just prior to or following the EEG recording. Active medications at the time of EEG were as follows. In the ASD training data, one subject was taking 0.1 mg of Clonidine and one subject was taking 20 mg of Ritalin; in the ASD validation data, one subject was taking 0.05 mg of Clonidine and 0.5 mg of Risperdal. Among the control subjects, no medications were taken in the training group; in the validation group, one subject was taking 50 mg of Amitriptyline and one subject took 0.05 mg of Clonidine prior to the EEG. Twenty-seven children with ASD (25 M and 2 F) and fifty-five controls (29 M and 26 F) were included for analysis. In the ASD training group (defined below), one subject had ADHD and one reported headaches. In the ASD validation group (defined below), three subjects had ADHD (one of whom also had depression, and one of whom also had tics), one other subject had only tics, another subject had ADD, and another subject had anxiety.
Of the 55 neurotypical controls, 13 had migraines or other headache syndromes, 9 had a syncopal event, 8 had tics, 4 had anxiety, 1 had sleep apnea, 1 had breath holding spells, and 1 had essential tremor. Although formal scales of ASD severity were not used in this population, chart review of physical exam and clinical assessments was performed retrospectively by a board-certified child neurologist (CJC). Using the DSM-5 criteria, severity was estimated as follows. In the training group of thirteen ASD subjects: eight mild, four moderate, and one severe. In the validation group of fourteen ASD subjects: five mild, three moderate, and six severe.
In an effort to identify a clinically feasible and relevant EEG biomarker for ASD, we utilized routine EEG recordings following standard clinical recording techniques. All children were given the same instructions prior to the evaluation, including recommendations for mild sleep deprivation (awaking the child 2–4 h prior to regular morning arousal). In our dataset, sleep was recorded in 18/27 ASD subjects and 45/55 healthy controls. In all cases with a sleep recording, sleep onset was within 40 min of the start of the EEG recording session. In all cases, the wake EEG was obtained first and a posterior dominant rhythm was obtained during a period of quiet restfulness with eyes closed. For recordings of quiet wakefulness, patients were recorded in a quiet room without active stimulation.
Recordings included electrooculogram (two channels), scalp EEG (19 Ag/AgCl electrodes placed according to the 10–20 international system: Fp2, F4, C4, P4, O2, F8, T4, T6, Fz, Cz, Pz, Fp1, F3, C3, P3, O1, F7, T3, and T5) and electrocardiogram using a standard clinical recording system (Xltek, a subsidiary of Natus Medical). Signals were sampled at 200, 256, 500 or 512 Hz and stored on a local server. Analysis of the data from these subjects was performed retrospectively under protocols approved and monitored by local Institutional Review Boards according to the National Institutes of Health guidelines.
Prior to analysis, subject datasets were divided into two groups, one group for exploratory analysis and hypothesis creation and a second group for hypothesis validation. The subjects in each group were selected to preserve approximately similar age distributions in each group (Table 1). In this way, hypotheses generated in the first group were tested, and validated or disputed in the second group, thereby controlling for spurious findings due to type I error. EEG recordings were manually reviewed by an experienced electroencephalographer (CJC), and large movements and muscle artifact were removed. Wake and non-rapid eye movement (NREM) sleep intervals were identified by visual analysis as per standard criteria [53]. Only patients with at least 100 s of artifact-free EEG data were included in the exploratory group (13 ASD, 24 Control) and validation group (14 ASD, 31 Control).
Table 1
Patient demographics
Group | Age 4 y | Age 5 y | Age 6 y | Age 7 y | Age 8 y | Total |
ASD Training | 2 M | 2 M, 1 F | 4 M | 3 M | 1 M | 12 M, 1 F |
ASD Validation | 1 M | 2 M | 4 M | 5 M, 1 F | 1 M | 13 M, 1 F |
Control Training | 3 F | 4 M, 1 F | 5 M, 1 F | 3 M, 3 F | 1 M, 3 F | 13 M, 11 F |
Control Validation | 4 M, 2 F | 3 M, 3 F | 4 M, 1 F | 2 M, 5 F | 3 M, 4 F | 16 M, 15 F |
Table 2
Edges chosen for “mask”
ASD mask network edges | Fp1-F3/Fp1-F7; F3-C3/F7-T3; C3-P3/C4-P4; C3-P3/T3-T5; P3-O1/P4-O2; P3-O1/T5-O1; F4-C4/F8-T4; F4-C4/T4-T6; C4-P4/P4-O2; C4-P4/T4-T6; C4-P4/T6-O2; C4-P4/Cz-Pz; P4-O2/T5-O1; P4-O2/T6-O2; T5-O1/T6-O2; T4-T6/Cz-Pz |
Control mask network edges | Fp1-F3/Fp2-F4; Fp1-F3/Fp1-F7; Fp1-F3/F7-T3; Fp1-F3/Fp2-F8; Fp1-F3/F8-T4; F3-C3/F7-T3; F3-C3/T3-T5; F3-C3/Fp2-F8; C3-P3/T3-T5; P3-O1/T5-O1; Fp2-F4/Fp1-F7; Fp2-F4/F7-T3; Fp2-F4/Fp2-F8; Fp2-F4/F8-T4; F4-C4/C4-P4; C4-P4/T4-T6; C4-P4/Cz-Pz; P4-O2/T6-O2; Fp1-F7/F7-T3; Fp1-F7/Fp2-F8; Fp1-F7/F8-T4; F7-T3/Fp2-F8; Fp2-F8/F8-T4 |
Edges common to both masks | Fp1-F3/Fp1-F7; F3-C3/F7-T3; C3-P3/T3-T5; P3-O1/T5-O1; C4-P4/T4-T6; C4-P4/Cz-Pz; P4-O2/T6-O2 |
Data preprocessing for network and spectral analysis
For network analysis, the EEG data were filtered with a 3rd order Butterworth, zero-phase filter (notch filtered at 60 Hz to remove line noise, high pass at 0.5 Hz to avoid slow drift, and low pass at 50 Hz to avoid higher-frequency line noise harmonics). Because the EEG data were selected to avoid large movements and muscle artifact, noncontiguous points occurred; we removed 0.5 s from both sides of each noncontiguous point before further analysis. Visual analysis and a simulation study (not shown) confirmed that this removal was sufficient to mitigate artifacts produced at the noncontiguous points during the filtering process. For spectral analysis, the EEG data were not filtered, but 0.5 s was removed from both sides of each noncontiguous point to maintain consistency with the network analysis. In order to optimize near-field activity and reduce electrical contamination from the physical reference, both filtered and non-filtered data were then re-referenced according to the longitudinal bipolar (‘double banana’) montage, leaving 18 bipolar signals (‘derivations’) in place of the original 19 electrode signals. This reference montage was chosen in lieu of other popular montages such as the common average or Hjorth-Laplacian references because of its effectiveness and widespread clinical usage. While the common average reference and spline Laplacian reference perform reasonably well when used with a large enough number of electrodes (e.g., 128 or more), these references are expected to perform poorly when applied to the standard, low density 10–20 electrode system (see [54], p. 295). In addition, the common average reference has been found to increase spurious coupling in some cases [55]. In contrast, bipolar montages are considered one of the best available options to improve spatial resolution in EEG with a limited number of electrodes (see [54], p. 291). The Hjorth (or nearest-neighbor) Laplacian is closely related; however, we chose the double banana montage because of its extensive clinical use.
All EEG data were then divided into non-overlapping windows of 2 s duration (windows containing concatenated data from noncontiguous time points were discarded). We use 2 s intervals to approximately maintain stationarity in the time series (which requires short epochs) while keeping sufficient data for accurate coupling estimates (which requires long epochs). Finally, we normalized the data from each electrode within each window to have zero mean. All data preprocessing and subsequent analysis were performed using custom software developed in MATLAB.
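The re-referencing and windowing steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' MATLAB code: the chain ordering, the function names `bipolar_montage` and `epoch`, and the array layout are assumptions, and the sketch handles a single contiguous segment only (filtering, trimming around noncontiguous points, and discarding windows that span them are omitted).

```python
import numpy as np

# Longitudinal bipolar ("double banana") chains built from the 19 electrodes
# listed in the text. The ordering of the chains is an assumption.
CHAINS = [
    ["Fp1", "F3", "C3", "P3", "O1"],  # left parasagittal
    ["Fp2", "F4", "C4", "P4", "O2"],  # right parasagittal
    ["Fp1", "F7", "T3", "T5", "O1"],  # left temporal
    ["Fp2", "F8", "T4", "T6", "O2"],  # right temporal
    ["Fz", "Cz", "Pz"],               # midline
]
# Neighbouring electrodes within each chain form the 18 derivations.
PAIRS = [(a, b) for chain in CHAINS for a, b in zip(chain[:-1], chain[1:])]


def bipolar_montage(data, labels):
    """Re-reference (n_electrodes, n_samples) data to (18, n_samples) derivations."""
    index = {name: i for i, name in enumerate(labels)}
    return np.stack([data[index[a]] - data[index[b]] for a, b in PAIRS])


def epoch(signals, fs, seconds=2.0):
    """Split into non-overlapping windows and subtract each window's mean.

    Returns an array of shape (n_epochs, n_signals, samples_per_epoch).
    Samples left over after the last full window are dropped.
    """
    n = int(round(fs * seconds))
    n_epochs = signals.shape[1] // n
    trimmed = signals[:, : n_epochs * n]
    epochs = trimmed.reshape(signals.shape[0], n_epochs, n).transpose(1, 0, 2)
    return epochs - epochs.mean(axis=2, keepdims=True)
```

With a sampling rate of 200 Hz, each 2 s window in this sketch holds 400 samples per derivation.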
Spectral analysis procedure
For the spectral analysis of the unfiltered data, the power spectrum for each 2 s epoch was computed using the multitaper methods implemented in the Chronux toolbox [56] with 5 tapers and a time-bandwidth product of 3, so that the resulting frequency resolution was 1.5 Hz. Frequencies below 0.5 Hz were omitted to avoid low-frequency drift in the data. For each subject this resulted in a power spectrum for each of the 18 re-referenced signals, for each 2 s epoch.
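As a check on the stated parameters, the arithmetic can be written out under the usual multitaper convention. The symbols are assumptions introduced here and do not appear in the text: $TW$ is the time-bandwidth product, $T$ the epoch duration, $W$ the half-bandwidth of spectral smoothing, and $K$ the number of tapers.

```latex
\begin{align}
W &= \frac{TW}{T} = \frac{3}{2\,\mathrm{s}} = 1.5\,\mathrm{Hz}, \\
K &= 2\,TW - 1 = 2 \cdot 3 - 1 = 5 .
\end{align}
```

Under this reading, the 1.5 Hz resolution quoted above is the half-bandwidth, and the 5 tapers equal the commonly used upper bound of $2\,TW - 1$.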
To characterize the power spectra for each patient we computed a summary statistic, the “peak alpha-ratio”, as follows (Fig. 1). First, we computed the power spectrum of each signal for each epoch of the dataset, and then averaged the power spectra across all epochs. Second, we computed the ratio of this average power between four pairs of posterior to anterior signals (Far Left: T5-O1/Fp1-F7; Medial Left: P3-O1/Fp1-F3; Medial Right: P4-O2/Fp2-F4; Far Right: T6-O2/Fp2-F8). Third, we determined the maximum value of the ratio within the alpha frequency range (8–14 Hz) for each of the four channel pairs. These four maximum back/front ratios were then averaged to produce the summary statistic, mean “peak alpha-ratio”, for each patient. We chose to compute the spectral ratio for three reasons. First, the posterior to anterior alpha gradient is one of the most widely observed EEG features in healthy controls and thus is an intuitive feature to evaluate in a disease population [57]. In addition, this metric has been previously correlated with behavioral inhibition and sociability [58, 59]. Second, as described in Results, comparisons of power (not the ratio) between the ASD and control subjects at all electrode derivations reveal no significant differences. Third, we chose to compute the posterior/anterior ratio to normalize the spectral results of each individual subject. This choice of normalization protects against artifacts that impact the overall amplitude of voltage activity for each subject (e.g., a subject with thicker hair may be expected to have reduced electrode conductance and an overall reduction in EEG amplitude), and we expect this normalization to make the results more robust to changes in clinical settings and routine (e.g., to changes in electrode recording equipment).
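The three steps that define the peak alpha-ratio can be sketched as follows. This is a minimal illustration that assumes the epoch-averaged power spectra are already available; the function name `peak_alpha_ratio` and the dictionary layout are hypothetical, and the multitaper estimation itself (performed with Chronux in the study) is not reproduced.

```python
import numpy as np

# Posterior/anterior derivation pairs named in the text.
RATIO_PAIRS = [
    ("T5-O1", "Fp1-F7"),  # far left
    ("P3-O1", "Fp1-F3"),  # medial left
    ("P4-O2", "Fp2-F4"),  # medial right
    ("T6-O2", "Fp2-F8"),  # far right
]


def peak_alpha_ratio(freqs, mean_power, band=(8.0, 14.0)):
    """Mean over the four pairs of the maximum posterior/anterior power ratio.

    freqs      : (n_freqs,) frequency axis in Hz
    mean_power : dict mapping derivation name -> (n_freqs,) power spectrum,
                 already averaged across epochs
    band       : alpha range in Hz (8-14 Hz in the text)
    """
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peaks = [
        np.max(mean_power[posterior][in_band] / mean_power[anterior][in_band])
        for posterior, anterior in RATIO_PAIRS
    ]
    return float(np.mean(peaks))
```

Because the statistic is a ratio of two spectra from the same subject, a common scale factor applied to all derivations cancels, which is the normalization property described above.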
Functional network inference and measures
While there are many approaches to determining functional connectivity from time series data [60], including multiple coupling measures (e.g., linear or non-linear) and different strategies for determining network edges, we selected a simple measure of linear coupling: the cross correlation. The cross correlation is a bivariate measure of linear association between two brain regions, and serves as a basic measure of electrocortical functional connectivity [24, 61]. We note that most linear and nonlinear measures appear to perform equally well on simulated and observed macroscopic brain voltage data [62, 63].
Each subject possessed at least 50 epochs of 2 s duration (min 57, max 1256, mean 254), which is sufficient to support stable functional network representations [64–66]. To create functional networks, we follow the procedure outlined in [67] and applied in [64–66]. We briefly describe this procedure here (Fig. 1). For each patient, we create a functional network for each 2 s epoch of filtered data using the 18 derivations (signals) of data, based on the cross correlation of the data between each pair of derivations. We note that each signal in each 2 s interval is normalized by its variance (or total power) before performing the correlation analysis. Doing so reduces the differences in amplitude between signals and mitigates a potential confounding factor in the correlation analysis [68]. In addition, we show in Results that differences in correlation between the ASD and control subjects are not accompanied by changes in the (absolute) EEG power in the 2.5–17.5 Hz range (i.e., the broad, low frequency range which dominates the correlation measure). This observation suggests that changes in EEG power (i.e., in the signal to noise ratio) do not confound the functional connectivity results, in accordance with [68, 69]. We use the maximum absolute value of the cross correlation over time lags of ±500 ms to measure the coupling (which encompasses the duration of known neurophysiological processes and cross-cortical conduction times [70, 71]). To assess the variability of the cross correlation across lags, we compute the average variance of the cross correlation between all derivation pairs and all 2 s epochs for a subject; this provides a common measure of variability that we apply to assess the significance of each correlation statistic (see [67]).
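The coupling statistic described above, the maximum absolute cross correlation over lags of ±500 ms, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of [67]: each signal is scaled to zero mean and unit variance, a biased estimator (division by the full window length) is used, and the function name `max_abs_xcorr` is hypothetical. The significance test applied to this statistic is not reproduced here.

```python
import numpy as np


def max_abs_xcorr(x, y, fs, max_lag_s=0.5):
    """Return (peak |r|, lag in samples) for two equal-length 1-D signals.

    A positive lag means that y follows x by that many samples.
    """
    # Scale each signal to zero mean and unit variance within the window.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for k, lag in enumerate(lags):
        if lag >= 0:
            r[k] = np.dot(x[: n - lag], y[lag:]) / n
        else:
            r[k] = np.dot(x[-lag:], y[: n + lag]) / n
    best = int(np.argmax(np.abs(r)))
    return float(abs(r[best])), int(lags[best])
```

Returning the lag alongside the peak value makes it possible to tell whether the maximum occurred at zero lag.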
For each 2 s epoch, an undirected binary functional network is inferred from these correlations based on their significance. Each node represents a derivation (e.g., channel T5 – channel O1), an edge value of 1 represents a statistically significant correlation between the two derivations, and an edge value of 0 indicates a correlation that did not reach significance. To correct for the multiple significance tests within each 2 s epoch, we use a linear step-up procedure controlling the false discovery rate (FDR) with q = 0.05. For this choice of q, 5 % of the declared network connections are expected to be false detections [72]. This procedure results in a thresholding of the significance tests of the correlation (not of the correlation value itself) for each 2 s epoch [67]. The networks obtained in this manner have an associated measure of uncertainty, which is the expected number of edges incorrectly declared present.
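A linear step-up procedure of the kind described above can be sketched as follows, assuming the p-values for all derivation pairs in one epoch have already been computed. The function name `fdr_linear_step_up` is hypothetical, and this is the standard Benjamini-Hochberg form of the procedure rather than code from the study.

```python
import numpy as np


def fdr_linear_step_up(p_values, q=0.05):
    """Return a boolean mask of the tests declared significant.

    Sort the m p-values, find the largest rank k with p_(k) <= q * k / m,
    and declare the k smallest p-values significant.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])
        keep[order[: k + 1]] = True
    return keep
```

The step-up character matters: a p-value that fails its own threshold is still declared significant if a larger p-value passes at a higher rank.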
To mitigate the impact of volume conduction [54, 61] on the functional network analysis, we identified the correlations deemed significant at zero lag, and removed these edges from the analysis. In doing so, we expect to remove both spurious correlations due to volume conduction and true correlations that occur at zero lag; in this sense, this procedure is conservative. This approach has an added benefit of reducing the effect of montage selection, whereby subtraction of signals may result in spurious coupling between derivations that share electrodes.
To assess the network structure, we apply two measures of network connectivity [23]. The density for each network is calculated in the standard way as the number of edges detected (at non-zero lag) divided by the total number of possible edges (153 minus the number of spurious edges detected at zero lag). The mean density for each subject is calculated as the average density across all epochs for the subject. The mean density for each group (ASD and control) is calculated as the mean of the subject densities within each group. The degree is also calculated in the standard way as the number of edges that connect to each node, and average degree values for a subject and group are calculated in the same way as the average density values.
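The density and degree calculations for a single epoch can be sketched as follows. The function name `density_and_degree` and the two-mask representation are assumptions: `adjacency` marks every significant edge and `zero_lag` marks the subset whose peak correlation occurred at zero lag. For 18 derivations there are 18 × 17 / 2 = 153 possible edges.

```python
import numpy as np


def density_and_degree(adjacency, zero_lag):
    """Density and per-node degree of one epoch's network.

    adjacency : (n, n) symmetric boolean array, True where an edge is significant
    zero_lag  : (n, n) symmetric boolean array, True where the significant
                correlation occurred at zero lag (these edges are removed)
    """
    n = adjacency.shape[0]
    upper = np.triu_indices(n, k=1)          # count each node pair once
    kept = adjacency & ~zero_lag             # edges detected at non-zero lag
    n_possible = upper[0].size - int(zero_lag[upper].sum())
    density = kept[upper].sum() / n_possible
    degree = kept.sum(axis=1)                # edges touching each node
    return float(density), degree
```

Subject-level values then follow by averaging across epochs, and group-level values by averaging the subject means, as described above.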
In addition to correlation networks, we also computed networks with a second measure of linear association, the coherence, estimated using the multi-taper method [73]. As for the correlation networks, we inferred coherence networks for all derivation pairs over 2 s epochs. To calculate a p-value to identify significant edges in the coherence networks, we first transformed the coherence, $C$, to the quantity $(\nu_0 - 1)|C|^2/(1 - |C|^2)$, which has an approximate F-distribution with two and $\nu_0 - 2$ degrees of freedom under the null hypothesis of no coherence. Here, $\nu_0$ is twice the number of tapers, either 10 or 16. We then corrected for multiple significance tests using a linear step-up FDR controlling procedure with q = 0.05. Coherence networks were computed for four electrode montages (double banana, transverse, Hjorth Laplacian, and neck reference) and for both sleep and wake data, at 4 frequencies with 5 Hz bandwidth and 8 tapers (centers at 3.5 Hz, 8.5 Hz, 13.5 Hz, and 18.5 Hz) and 8 frequencies with 3 Hz bandwidth and 5 tapers (centers at 2.5 Hz, 5.5 Hz, 8.5 Hz, 11.5 Hz, 14.5 Hz, 17.5 Hz, 20.5 Hz, and 23.5 Hz). However, we found no significant differences in density between the ASD and control groups in the exploratory analysis, and the analysis of coherence networks was not continued in the validation dataset.
Bootstrap test for significantly different edges
With the aim of developing a biomarker for ASD, we sought to assess the difference in network structure between the ASD and control groups. While a network-wide measurement such as the density is informative, a measure that localizes differences between ASD and control networks to more specific connections (e.g., network edges) would provide additional information. Knowledge of specific edge differences would allow us to focus on just these edges, reducing noise introduced by non-informative edges, and potentially producing a more sensitive and specific biomarker.
To that end, a bootstrap analysis was performed to test whether a significant difference occurs between the ASD and control groups in the appearance of each edge. We began with the null hypothesis that no difference exists between the two populations. We then created surrogate data for each subject by randomly drawing functional networks (each derived from a 2 s epoch), with replacement, from the pooled networks of all ASD and control subjects. Repeating this process for every subject produced surrogate data for each subject of each group. If the null hypothesis is correct, we should find no statistically significant differences between the network features deduced from the original ASD and control groups compared to the surrogate data.
We repeated this process of generating surrogate data and computing network measures 100,000 times to create a distribution of average edge weights for each edge in the ASD group and in the control group. For each of the 100,000 surrogates of both groups, 153 average edge weights were calculated (one for each node pair). We note that the average edge weights were calculated in the same way as for the original data; that is, for each subject we computed an average network across the 2 s epochs, and then these subject networks were averaged to produce a population average network for the ASD group, and a population average network for the control group. The 100,000 surrogates correspond to 100,000 population average networks for the ASD group, and 100,000 population average networks for the control group. In these surrogate data, the 100,000 values for each edge weight establish the bootstrap distributions of the edge weights for the ASD group and control group.
We then compared each observed average ASD edge weight to the corresponding surrogate ASD distribution, and each observed average control edge weight to the corresponding surrogate control distribution. This bootstrapping allows us to examine each edge individually, and to determine the statistical significance of particular edges in the ASD and control groups. Finally, we determined the subset of edges identified as the most significantly different in the observed data compared to the surrogate data. In practice, these edges were associated with the smallest p-values detectable in the bootstrap procedure ($p < 10^{-5}$). The edges identified in this way were used to generate a “mask”, or selection of edges most significantly different from the bootstrap distribution, with the purpose of developing a biomarker of ASD (Fig. 1).
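The surrogate procedure can be sketched as follows. Several details are assumptions not stated in the text: each surrogate subject receives as many resampled networks as the original subject had epochs, each network is stored as a flat vector of edge values (153 entries for 18 derivations), and the p-value is two-sided around the surrogate mean. The function names are hypothetical.

```python
import numpy as np


def group_average(subject_networks):
    """Average each subject's epoch networks, then average across subjects.

    subject_networks : list of (n_epochs_i, n_edges) arrays, one per subject
    """
    return np.mean([nets.mean(axis=0) for nets in subject_networks], axis=0)


def surrogate_group_averages(subject_networks, pooled, n_surrogates, rng):
    """Group-average networks under the null of no group difference.

    pooled : (n_total_epochs, n_edges) networks from all ASD and control subjects
    Returns an (n_surrogates, n_edges) array.
    """
    out = np.empty((n_surrogates, pooled.shape[1]))
    for s in range(n_surrogates):
        # Assumption: each surrogate subject keeps its original epoch count.
        resampled = [
            pooled[rng.integers(0, len(pooled), size=len(nets))]
            for nets in subject_networks
        ]
        out[s] = group_average(resampled)
    return out


def bootstrap_p_values(observed, surrogates):
    """Per-edge fraction of surrogates at least as far from the surrogate mean
    as the observed value (two-sided; an assumption)."""
    center = surrogates.mean(axis=0)
    extreme = np.abs(surrogates - center) >= np.abs(observed - center)
    return extreme.mean(axis=0)
```

With 100,000 surrogates, an edge for which no surrogate is as extreme as the observed value receives an estimated p-value of zero in this sketch, which corresponds to the smallest detectable level quoted above.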