Test–retest reliability
A measure with sufficient test–retest reliability allows users to obtain reproducible scores. Good test–retest reliability was found for the repeated assessments of the TONI-4. Moreover, the test–retest reliabilities were similar across the gender and age sub-groups. Accordingly, the TONI-4 has generally good test–retest reliability, which appears not to be affected by examinees’ gender and age, and it can be used in repeated assessments. In comparison with previous studies, the test–retest reliability in our study was slightly lower than that found for healthy controls (r = 0.82–0.93) [25] and was consistent with those of other cognitive assessments examining patients with schizophrenia [44, 45]. There are three possible reasons for the slightly lower ICC of the TONI-4. First, the test–retest reliability was estimated by Pearson correlation coefficients in the previous studies, which tend to overestimate reliability [34]. Second, alternate forms (i.e., Forms A and B) were used in this study, which may have resulted in more variation than using the same form, as in previous studies [25]. Third, the heterogeneity of our sample appeared limited. In particular, the variability of the TONI-4 scores in this study (SDs = 9.1 and 9.9) was smaller than that of a previous study (SDs = 13–15) [25], which may have led to underestimation of the ICC values in this study [46]. In summary, our findings indicate that the TONI-4 appears to be reliable for repeatedly assessing fluid intelligence in patients with schizophrenia.
We found that the SEM% was far below our preset criterion. Furthermore, the SEM% values were generally consistent across the gender and age sub-groups. These findings suggest that the TONI-4 has limited random measurement error. Our findings are consistent with those in previous studies examining healthy groups, where the SEM% values were 4.0–5.5% [25]. These findings suggest that the random measurement error is similar in patients with schizophrenia and in healthy adults. Therefore, the scores of the TONI-4 tend to be stable in patients with schizophrenia.
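As an illustration of how an SEM% of this kind is commonly obtained, the following minimal sketch uses the conventional formulas SEM = SD × √(1 − ICC) and SEM% = SEM / mean × 100. These formulas, the function names, and the input values are assumptions for illustration only; they are not taken from this study's data and may differ from the exact computation used here.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement under the conventional formula
    SEM = SD * sqrt(1 - ICC) (assumed here, not confirmed by the source)."""
    return sd * math.sqrt(1.0 - icc)


def sem_percent(sem_value: float, mean_score: float) -> float:
    """SEM expressed as a percentage of the mean score."""
    return sem_value / mean_score * 100.0


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
example_sem = sem(sd=10.0, icc=0.75)
example_sem_pct = sem_percent(example_sem, mean_score=100.0)
print(example_sem, example_sem_pct)
```

Expressing the SEM as a percentage makes the random measurement error comparable across samples or measures with different score ranges.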
In addition, the MDC can be viewed as the threshold for a statistically significant change for individual patients in clinical and research settings [47]. Conceptually, a change from the first assessment that exceeds the MDC can be interpreted as a real improvement with the corresponding certainty (e.g., 95%). In principle, a fixed MDC value could therefore be used to interpret the change scores for patients with different levels of fluid intelligence. However, we found that the association between the absolute value of the difference between the early and late assessments and the mean score of the early and late assessments (Pearson’s r = 0.31) was above 0.30, implying the existence of heteroscedasticity [41]. That is, the absolute difference and the mean of the early and late assessments tended to increase together. Accordingly, a fixed value of MDC is not appropriate for different levels of fluid intelligence.
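The heteroscedasticity check described above can be sketched as follows: correlate each patient's absolute early-to-late difference with the mean of the two scores, and compare Pearson's r with the 0.30 threshold. The function names and the scores below are hypothetical and serve only to show the computation; they are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def heteroscedasticity_r(early, late):
    """Correlation between |late - early| and the mean of the two scores."""
    abs_diff = [abs(l - e) for e, l in zip(early, late)]
    means = [(e + l) / 2.0 for e, l in zip(early, late)]
    return pearson_r(abs_diff, means)


# Hypothetical scores in which larger means go with larger differences.
early = [80, 90, 100, 110]
late = [81, 92, 103, 114]
r = heteroscedasticity_r(early, late)
print(r > 0.30)  # r above 0.30 is taken to indicate heteroscedasticity
```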
In assessments with heteroscedasticity, the MDC% is more suitable than the MDC for interpreting a true change for a patient [48]. That is, as seen in this study, the MDC value can be adjusted based on the MDC% and the patient’s early assessment score. Specifically, the MDC% (14.2%) of the TONI-4 can be multiplied by the patient’s early assessment score to obtain an adjusted MDC value. For example, a patient with a score of 92 points at the early assessment requires an improvement of more than 13.1 points (92 × 0.142) to indicate a true change. These adjustments can help clinicians and researchers interpret the score changes on the TONI-4 of an individual patient after intervention and then develop further treatment plans accordingly.
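The adjustment described above amounts to a single multiplication, sketched below with the MDC% of 14.2% reported in this study. The function names are hypothetical, and comparing the absolute change (so that declines are treated in the same way as improvements) is an assumption of this sketch rather than something stated in the text.

```python
def adjusted_mdc(early_score: float, mdc_percent: float = 14.2) -> float:
    """Adjusted MDC: the MDC% applied to the patient's early assessment score."""
    return early_score * mdc_percent / 100.0


def is_true_change(early_score: float, late_score: float,
                   mdc_percent: float = 14.2) -> bool:
    """True if the absolute score change exceeds the adjusted MDC.

    Using the absolute change is an assumption of this sketch.
    """
    change = abs(late_score - early_score)
    return change > adjusted_mdc(early_score, mdc_percent)


# Worked example from the text: early score of 92 points.
print(round(adjusted_mdc(92), 1))  # 13.1
print(is_true_change(92, 106))     # True: a change of 14 exceeds 13.1
print(is_true_change(92, 105))     # False: a change of 13 does not
```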
We found that the scores changed very little between the early and late assessments. In addition, the change scores were similar across the sub-groups of examinees’ gender and age. These findings indicate that the scores of the TONI-4 do not systematically increase after the early assessment (i.e., practice) has been completed. Our findings are consistent with those in a previous study, where the change scores within one-to-two-week intervals were small (effect size = 0.00–0.07) [25]. The trivial practice effect may have been due to the use of alternate forms (i.e., Forms A and B) [49, 50]. However, using alternate forms may lead to underestimation of the practice effect as compared to using a single form. In this study, all participants were administered the forms in a fixed order (i.e., Form A first and Form B second). The fixed-order design was used because previous findings had indicated that test–retest reliability is not affected by the order effect [26]. Thus, clinicians could use alternate forms of the TONI-4 in their routine repeated assessments to effectively minimize practice effects.
Convergent validity
We found that the scores of the TONI-4 were moderately correlated with those of the MoCA and significantly correlated with those of the T-SDMT, supporting our hypotheses. Thus, good convergent validity was demonstrated for the TONI-4. Our results support the validity of the TONI-4 for assessing fluid intelligence in patients with schizophrenia.
This study had two merits. First, the sample size (103 participants) was relatively large. A large sample size tends to provide robust estimates, which improves the generalizability of our findings [51]. Second, we used alternate forms of the TONI-4. Due to this study design, the practice effects of the TONI-4 were well controlled, so its utility in repeated assessments was confirmed.
Study limitations
Two limitations of this study should be noted. First, the study sample was a convenience sample recruited from a psychiatric center in southern Taiwan. In addition, our participants, on average, had slightly impaired fluid intelligence (the mean score of the TONI-4 was 92.4 points at the early assessment). The above sampling limitations might have affected the generalizability of our findings. Second, we used alternate forms to examine the test–retest reliability of the TONI-4. Thus, our results on test–retest reliability might not be generalizable to single-form assessment of the TONI-4. Using alternate forms may lead to underestimation of the test–retest reliability as compared to using a single form.