Eleven participants took part in the study. Six of them belonged to Cat. I (for one novice user, no prior data was available, but she turned out to be a Cat. I user), two further participants belonged to Cat. II and three to Cat. III. All users performed 8 feedback runs, each consisting of 100 trials (50 trials of each class). The timing of the trials was as follows: at time 0, the cue was provided in the form of a small arrow over a cross placed in the middle of the screen; one second later, the cross started to move to provide feedback. Its speed was determined by the classification output (similar to Blankertz et al. 2007, 2008a). The task of the participant was to use motor imagery to make the cursor move into a previously indicated target direction. The feedback lasted for 3 s and was followed by a short pause. Two different types of motor imagery, chosen out of three possibilities (motor imagery of left hand, right hand or foot), were selected in advance. For seven users, previous data with motor imagery performance was available, which revealed which two motor imagery tasks should be used. For the other four participants (three of Cat. III and one novice) no prior information could be used, and they were asked to select two of the three possible motor imagery tasks. Throughout the whole session, all classifiers were based on Linear Discriminant Analysis (LDA). When advisable due to the high dimensionality of the features, the estimation of the covariance matrix needed for LDA was corrected by shrinkage (Ledoit and Wolf 2004; Vidaurre et al. 2009). In order to define the adaptation schemes for LDA, we use a specific variant that is introduced here. For LDA, the covariance matrices of both classes are assumed to be equal (this assumption is what makes the decision boundary linear); the common covariance matrix will be denoted by \({\varvec{\Sigma}}\) here. Furthermore, we denote the means of the two classes by \({\varvec{\mu}}_1\) and \({\varvec{\mu}}_2\), an arbitrary feature vector by \({\user2{x}}\), and define:
$$ D({\user2{x}}) = \left[b; {\user2{w}}\right]^{\top} \cdot \left[1; {\user2{x}} \right] \qquad (1) $$
$$ {\user2{w}} = {\varvec{\Sigma}}^{-1} \cdot ({\varvec{\mu}}_2 - {\varvec{\mu}}_1) \qquad (2) $$
$$ b = - {\user2{w}}^{\top} \cdot {\varvec{\mu}} \qquad (3) $$
$$ {\varvec{\mu}} = \frac{{\varvec{\mu}}_1 + {\varvec{\mu}}_2}{2} \qquad (4) $$
where \(D({\user2{x}})\) is proportional to the signed distance of the feature vector \({\user2{x}}\) to the separating hyperplane, which is described by its normal vector \({\user2{w}}\) and bias b. Note that the covariance matrices and mean values used in this paper are sample covariance matrices and sample means, estimated from the data. To simplify the notation and the description of the methods, we will in the following write covariance matrix instead of sample covariance matrix, and mean instead of sample mean. Usually, the covariance matrix used in Eq. 2 is the class-average covariance matrix, but it can be shown that using the pooled covariance matrix (which can be estimated without label information, simply by aggregating the features of all classes) yields the same separating hyperplane. In this study we used the pooled covariance matrix in Eq. 2. Similarly, the class-average mean (calculated in Eq. 4) can be replaced by the pooled mean (the average over all feature vectors of all classes). This implies that the bias of the separating hyperplane can be estimated (and adapted) in an unsupervised manner, i.e., without label information. The only restriction of the method is that an estimate of the prior probabilities of the two classes is required. If LDA is used as a classifier, observation \({\user2{x}}\) is classified as class 1 if \(D({\user2{x}})\) is less than 0, and otherwise as class 2. In the cursor control application, however, we use the classifier output \(D({\user2{x}})\) as a real number to determine the speed of the cursor. Finally, we introduce the features and classifiers that have been used in the three levels of the experiment, including three on-line adaptation schemes: the first two are supervised, i.e., they require the class label (type of motor imagery task) of the past trial in order to update the classifier. The third method updates the classifier without knowing the task of the past trial (unsupervised adaptation).
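As an illustration, the classifier of Eqs. 1–4 and the unsupervised bias adaptation described above can be sketched as follows. This is a minimal sketch with our own function and variable names, not the authors' code; in particular, the exponential-moving-average form of the pooled-mean update is our assumption.

```python
import numpy as np

def train_lda(X1, X2):
    """Eqs. 1-4 with the pooled covariance; X1, X2: (n_trials, n_features)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled covariance: estimated from all trials, ignoring the labels (Eq. 2).
    sigma = np.cov(np.vstack([X1, X2]), rowvar=False)
    w = np.linalg.solve(sigma, mu2 - mu1)   # Eq. 2
    b = -w @ ((mu1 + mu2) / 2)              # Eqs. 3 and 4
    return w, b

def lda_output(w, b, x):
    """Eq. 1: a real number; the sign gives the class, the magnitude drives the cursor."""
    return b + w @ x

def update_bias(w, mu_pooled, x, uc=0.05):
    """Unsupervised bias adaptation: track the pooled mean without labels
    (assumed here as an exponential moving average) and re-derive b (Eq. 3)."""
    mu_pooled = (1 - uc) * mu_pooled + uc * x
    return -w @ mu_pooled, mu_pooled
```

Because the pooled covariance and the pooled mean need no labels, both `train_lda`'s normal vector and `update_bias` can be maintained while only the class means require supervision.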
Methods for Level 1 (runs 1–3)
The first run started with a pre-trained subject-independent classifier on simple features: band-power in the alpha (8–15 Hz) and beta (16–32 Hz) frequency ranges in three Laplacian channels at C3, Cz, C4. During these runs, the LDA classifier was adapted to the user after each trial. The inverse of the pooled covariance matrix (see Eq. 2) was updated for each observation \({\user2{x}}(t)\) using a recursive least squares algorithm (see Vidaurre et al. 2006 for more information):
$$ {\varvec{\Sigma}}(t)^{-1}=\frac{1}{1-UC}\left( {\varvec{\Sigma}}(t-1)^{-1} - \frac{{\user2{v}}(t)\cdot {\user2{v}}^{\top}(t)}{\frac{1-UC}{UC}+{\user2{x}}^{\top}(t)\cdot{\user2{v}}(t)} \right) \qquad (5) $$
where \({\user2{v}}(t) = {\varvec{\Sigma}}^{-1}(t-1)\cdot {\user2{x}}(t)\). Note that the term \({\user2{x}}^{\top}(t)\cdot {\user2{v}}(t)\) is a scalar, so no costly matrix inversion is needed. In Eq. 5, UC stands for update coefficient and is a small number between 0 and 1. For the present study, we chose UC = 0.015 based on a simulation using the data of the screening study. To estimate the class-specific adaptive means \({\varvec{\mu}}_1(t)\) and \({\varvec{\mu}}_2(t)\), one can use an exponential moving average:
$$ {\varvec{\mu}}_i(t) = (1-UC)\cdot {\varvec{\mu}}_i(t-1) + UC\cdot {\user2{x}}(t) \qquad (6) $$
where i is the class of \({\user2{x}}(t)\), and UC was chosen to be 0.05. Note that the class-mean estimation is done in a supervised manner.
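A minimal sketch of these two supervised updates (Eqs. 5 and 6), under our own function and variable names:

```python
import numpy as np

def update_inv_cov(inv_sigma, x, uc=0.015):
    """Eq. 5: recursive update of the inverse pooled covariance matrix.
    Equivalent to inverting (1-UC)*Sigma(t-1) + UC*x*x^T, but avoids the
    costly matrix inversion: x^T v below is just a scalar."""
    v = inv_sigma @ x
    return (inv_sigma - np.outer(v, v) / ((1 - uc) / uc + x @ v)) / (1 - uc)

def update_class_mean(mu_i, x, uc=0.05):
    """Eq. 6: exponential moving average of the mean of the cued class i."""
    return (1 - uc) * mu_i + uc * x
```

Only `update_class_mean` needs the trial's label (to pick which class mean to move); the covariance update is label-free, which is what later enables the unsupervised bias adaptation.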
Methods for Level 2 (runs 4–6)
For the subsequent three runs, a classifier was trained on a more complex, composed band-power feature. On the data of runs 1–3, a subject-specific narrow frequency band was chosen automatically (Blankertz et al. 2008b). For this frequency band, optimized spatial filters were determined by Common Spatial Pattern (CSP) analysis (Blankertz et al. 2008b). Furthermore, six Laplacian channels were selected according to their discriminability, which was quantified by a robust variant of the Fisher score (with the mean replaced by the median). The selection of positions was constrained such that two positions were chosen from each of the areas over the left hand, right hand and foot. While the CSP filters were static, the positions of the Laplacians were reselected based on the Fisher score of the channels. Channel selection and classifier were recalculated after each trial using the last 100 trials. The classifier used here was a regularized version of LDA with automatic shrinkage, to account for the higher dimensionality of the features, as in Vidaurre et al. (2009). The feature vector was the concatenation of the log band-power in the CSP channels and in the selected Laplacian channels. The repeatedly reselected Laplacian channels were included in order to provide flexibility with respect to the spatial location of the modulated brain activity. During these three runs, the adaptation to the user was again done in a supervised way.
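The text specifies only that the mean in the Fisher score is replaced by the median; one plausible form of such a median-based channel score (the function name and the exact spread term are our assumptions, not taken from the paper) is:

```python
import numpy as np

def robust_fisher_score(f1, f2):
    """Median-based discriminability score for one channel.
    f1, f2: 1-D arrays of the channel's feature values for each class.
    Assumed form: squared median difference over the summed spread
    around the medians (analogous to the classical Fisher score)."""
    m1, m2 = np.median(f1), np.median(f2)
    s1 = np.mean((f1 - m1) ** 2)  # spread around the median, class 1
    s2 = np.mean((f2 - m2) ** 2)  # spread around the median, class 2
    return (m1 - m2) ** 2 / (s1 + s2)
```

Channels would then be ranked by this score within each area (left hand, right hand, foot) and the top two per area retained, recomputed after every trial on the last 100 trials.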