Background
Active involvement of patients in decision making for their own medical care, such as deciding whether or not to accept a particular treatment (informed consent), or choosing among medical-care options (informed choice), is becoming a worldwide practice [
1]. In Japan, informed consent was codified in the “Medical Service Act” in 1997, and is now common practice [
2]. Recent surveys show that, moving beyond informed consent, Japanese patients are involved in informed decision making in clinical practice [
3] and that they prefer this involvement [
4]. Consequently, it is very important to ensure that patients correctly understand medical information, so that their decisions, possibly made in life-threatening situations, reflect their true will.
One way to address this issue would be to assess patients’ health numeracy, the ability to understand probabilistic and mathematical concepts [
5]. Health numeracy has gained increasing attention as quantitative information, such as the probability of survival outcomes for different medical treatments, becomes increasingly present in medical risk information [
6,
7]. Previous studies assessing health numeracy have shown that individuals with low numeracy are more likely to misunderstand risk information, and their risk evaluations tend to be more influenced by context, such as how the related numbers are framed (reviewed in [
5,
8‐
11]). While many of those studies used healthy respondents, studies with actual patients have shown the influence of numeracy on their disease-related decision making (e.g. [
12‐
15]). Therefore, to ensure accurate medical-risk communication, it is important to know whether patients have a sufficient level of health numeracy.
Despite its importance, however, health numeracy and assessment scales have been largely understudied in many countries, including Japan. In fact, previous research is mostly from the United States and Europe, with research in other countries just beginning (e.g. [
16‐
18]). These pioneering studies on cross-cultural comparisons have shown considerable differences in numeracy levels [
16,
19], as well as the association of numeracy with decision making [
20,
21] across different countries. Therefore, it is important to determine how previous findings apply to different countries, and develop strategies suitable for each case.
In Japan, the limited attention paid to health numeracy might be a reflection of the common belief that the majority of Japanese people enjoy basic numeracy. For example, since its inception in 2000, the Programme for International Student Assessment has rated the mathematical ability of Japanese students (aged 15 to 16 years) as higher than the international average, whereas that in the United States has been below average. While this might imply that Japanese patients are able to correctly use numerical medical data, recent pioneering work in Japan suggests this might not be the case [
22]. In their study, Takahashi et al. developed a 7-item scale to test the ability of Japanese patients to interpret medical information (TAIMI), with 3 of these items especially relevant to numeracy. Unexpectedly, more than 50 percent of respondents made mistakes in 2 of these items. Those 2 items evaluated the effect of a medicine, one presenting information as a fraction and the other as a natural frequency. This result suggests that Japanese health numeracy may not be as high as expected, and that there is a need for further investigation specifically focusing on health numeracy.
The aim of this study is twofold: 1) to evaluate Japanese versions of numeracy scales, and 2) to assess the health numeracy of Japanese adults.
We chose two well-known health numeracy scales that focus on the basic understanding of math and probability, the 3-item Schwartz scale [
23], and its expanded version, the 11-item Lipkus scale [
24]. To date, these scales are among the most frequently used instruments in numeracy studies. Since the focus of these tests is different from that of TAIMI, we were interested in how Japanese people performed in these health numeracy scales. Japanese versions of these scales (Schwartz-J and Lipkus-J) were prepared using forward and backward translation procedures [
25].
The reliability of the scales was assessed for internal consistency. The original scales were shown to be unidimensional; the original Schwartz study did not conduct a factor analysis, but Lipkus and colleagues evaluated the factor structure of their scale, which included the Schwartz items [
24]. However, there are also studies showing a multi-factor structure for these scales [
18]. Since factor structure could be different depending on the nature of a target population [
26], an exploratory factor analysis was conducted to explore factor structure for the Japanese version. Convergent validity of numeracy scales was evaluated by their correlation with existing measures of health numeracy [
8]. As mentioned above, TAIMI has a two-factor structure where three items are specifically relevant to numeracy [
22]. Thus, we examined correlations between TAIMI scores and the Schwartz-J and Lipkus-J scores. We expected a positive correlation between performance on TAIMI and the scales translated in this study, specifically for the numeracy items of TAIMI (TAIMI-num).
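As an illustration of the internal-consistency assessment described above, the following minimal sketch computes Cronbach's α from a respondents-by-items matrix of item scores, using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy data are hypothetical and serve only as an example; the sketch does not reproduce the software or the data used in this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items on the scale
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 respondents x 3 items (1 = correct, 0 = incorrect)
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(toy)
```

When every item gives identical scores for each respondent, the formula returns 1; values closer to 0 indicate that the items share little common variance.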
To determine whether numeracy levels measured by these scales have any influence on medical risk communication in Japan, we examined the association between framing bias and performance on numeracy scales. Previous studies found that those with low scores on the Lipkus scale are more susceptible to framing effects [
12,
19,
27,
28], an effect whereby different phrasing influences participant decisions based on mathematically identical data. We tested whether this can be seen with Japanese samples using the Schwartz-J and Lipkus-J scales. To approximate the Japanese population, we used quota sampling (n = 300) according to age, gender, and education level. In so doing, this study explores a method for assessing health numeracy in the Japanese population, with the aim of improving medical-risk communication.
Discussion
In this study, we evaluated Japanese numeracy by translating and applying the Schwartz and Lipkus scales, widely used health numeracy scales that focus on the understanding of basic math and probability. Translated versions of both scales showed a certain degree of reliability and validity; however, the Japanese sample’s high performance caused the score distributions to be negatively skewed, imposing limitations on the psychometric evaluations of the scales. In this section, we first discuss Japanese numeracy in light of our results. Then we address the validity and limits of Lipkus-J and Schwartz-J, and future directions for the application of health numeracy measures.
The current study suggests that basic understanding of math and probability is quite high among Japanese: correct response rates for Lipkus-J items were much higher than those found in the original US samples [
24], and in more recent studies on probabilistic national German and US samples [
33]. This is consistent with the results of the Programme for International Student Assessment (PISA), where the national average math score for Japanese students has surpassed those of both the US and Germany since its inception [
34‐
37]. The relatively high attainment of math skills during school education, as assessed by PISA, might partly account for the generally high numeracy of the Japanese. The current result is also in line with the recent study assessing the numeracy skills of students at top universities in 15 countries [
16]. Although linking top-university level performance with that of the general population is not straightforward, Japan was second best in having the smallest proportion of respondents falling into the lowest quartile.
However, in spite of generally high numeracy, the performance of the Japanese sample on the Schwartz-J and Lipkus-J tests still accounted for susceptibility to the framing effect, which can influence patients’ decisions regarding their medical options, such as acceptance of surgery (e.g. [
12,
19,
38]; however, empirical results on framing effects in a clinical setting are mixed, reviewed in [
39]). A number of previous studies using the original Schwartz and Lipkus scales have shown a numeracy effect on understanding and decision making based on medical information (reviewed in [
5,
8‐
11]). Moreover, research has advanced on methods for communicating quantitative risk information with consideration of patients’ numeracy, such as supplementing numerical data with visual or verbal aids, using natural frequencies rather than probabilities, or presenting risks with both negative and positive frames (reviewed in [
10,
11,
40‐
43]). Considering our results and these earlier findings, such care would be called for when communicating medical information to those with low numeracy in Japan, and possibly in other countries where general math performance is deemed to be high.
Regarding instruments to identify those with low numeracy, both the Schwartz-J and Lipkus-J scales demonstrated a certain degree of reliability and validity, with Cronbach’s α being comparable with those of the original scales, convergent validity being supported by their positive correlation with another health literacy and numeracy measure (TAIMI, [
22]), criterion validity being suggested by their association with the susceptibility to framing bias, and content validity being ensured in the original scales. However, we also found a pronounced ceiling effect, which confounded the analyses we applied and limited the psychometric qualities of the scales.
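A ceiling effect of this kind can be summarized with two simple descriptive statistics: the proportion of respondents who reach the maximum score, and the skewness of the total-score distribution (negative when scores pile up at the top). The sketch below is illustrative only; the function name `ceiling_summary` and the example scores are hypothetical and are not taken from the study data.

```python
from statistics import mean, pstdev


def ceiling_summary(total_scores, max_score):
    """Return the share of respondents at the maximum score and the skewness."""
    n = len(total_scores)
    m = mean(total_scores)
    sd = pstdev(total_scores)  # population standard deviation
    # Moment-based skewness: mean of cubed standardized deviations
    if sd > 0:
        skewness = sum(((x - m) / sd) ** 3 for x in total_scores) / n
    else:
        skewness = 0.0
    proportion_at_max = sum(1 for x in total_scores if x == max_score) / n
    return {"proportion_at_max": proportion_at_max, "skewness": skewness}


# Hypothetical total scores on an 11-item scale, clustered near the maximum
scores = [11, 11, 11, 11, 10, 10, 9, 7]
summary = ceiling_summary(scores, max_score=11)
```

A large proportion at the maximum together with a clearly negative skewness would be consistent with the pattern described above, although no fixed threshold for either statistic is assumed here.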
Ceiling effects pose multiple psychometric limitations [
16]. First, they suggest that scales are less able to differentiate among those with high numeracy. Second, statistical methods applicable for data analysis become limited, as many popular methods assume a normal distribution and may give erroneous results when this assumption is violated [
44‐
46]. Non-parametric alternatives are not always sufficient. For example, in the current study, we had to use a median split, making it difficult to examine the relationship between numeracy scores and framing effects in depth.
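To make the loss of information from a median split concrete, the following sketch divides respondents into "low" and "high" groups by the median of their total scores. The rule used here (scores strictly above the median are "high", all others "low") is an assumption chosen for illustration, as are the function name `median_split` and the example scores; the exact cut-off rule applied in the study is not restated here.

```python
from statistics import median


def median_split(scores):
    """Label each score 'high' if strictly above the median, otherwise 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical total scores on an 11-item scale
scores = [7, 9, 10, 10, 11, 11]
labels = median_split(scores)
```

In this example, respondents scoring 7 and 10 fall into the same "low" group, so any graded relationship between numeracy and another variable within that group can no longer be examined.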
A third limitation is that the means to evaluate the validity of the scale become restricted. For example, respondents’ performances can be confounded with other factors such as motivation [
47]. However, ensuring discriminant validity is not straightforward with data having a ceiling effect, as, for example, a weak correlation between motivation and numeracy scores might be due to the ceiling effect, rather than the variables being truly unrelated. Similarly, examining the relationships between the measured ability and other closely related abilities such as working memory [
48] would be confounded by the ceiling effect. Thus, use of Lipkus-J and Schwartz-J with high-numeracy samples requires careful consideration of those limitations.
In fact, negative skew for the original Schwartz and Lipkus scales has been noted in a number of earlier studies [
28‐
33], and the limitations mentioned above have been pointed out [
16,
27]. In response to those concerns, new numeracy scales have recently been developed: the Berlin Numeracy Test (BNT, [
16]) and the Abbreviated Numeracy Scale (ANS, [
27]). While both scales were built on the works of Schwartz and Lipkus, they have a wider range of difficulty. As a result, they have better psychometric characteristics, especially when used with high-performance samples. Considering the generally high numeracy of Japanese, those new scales might be more suitable for assessing numeracy in Japan, and this should be explored.
Meanwhile, Lipkus-J and Schwartz-J could be useful for assessing those having low numeracy. In the above-mentioned studies that developed new numeracy scales, the effectiveness of the original Lipkus and Schwartz scales is indicated for assessing groups with low numeracy [
16,
27]. In fact, positive skew was observed in some of the samples studied using BNT [
16], where easier tests would work better. This is an important point to consider when clinical applications are in scope, because some patients are likely to be under physical and psychological stress, which might result in lower numeracy. For instance, a recent clinical study using the Lipkus scale found the numeracy of epilepsy patients to be significantly lower than that of healthy controls, even though educational attainment was lower in the control group [
12]. This issue also bears on the test’s validity, as the psychometric characteristics of scales could differ across population groups or settings [
49]. Considering possible differences between patients and healthy groups, the use of the numeracy scales translated here, as well as the above-mentioned new scales, should be explored using patient samples so that more effective numeracy measures for the patient population can be identified.
Finally, the possible influence of volunteer bias [
50] should be noted when interpreting the current results. Although demographics of the sample matched those of the Japanese adult population, the test respondents were those who voluntarily agreed to participate in a survey concerning numbers. Therefore, the results could be biased towards those who are more interested in solving numerical problems, and not actually representative of the population. In fact, the average total score of TAIMI in the current study was 4.7, which is higher than that of 3.9 in the original report (Internet survey, n = 6047, [
22]). This disparity might be due to differences in sample composition between Takahashi et al.’s work and ours (there were more females and elderly in their study, and no education levels were reported). However, it is also possible that the numeracy reported here is higher than average. This issue should be addressed in future random-selection population-based surveys.
Acknowledgements
We thank Drs. Y. Takahashi and T. Shinbo, the developers of TAIMI, for their help in using TAIMI. We also thank Profs. H. Kanuka and S. Kawazu for their support in conducting the research, Mr. Kitazawa, Messrs. Sugimoto, Noguchi and Yamauchi for their assistance in data preparation, and ELCS for proofreading the manuscript. This work was supported in part by a grant from the Global COE Program from the Japanese Ministry of Education, Science, Sports, Culture and Technology (MEXT), Programme for Promotion of Basic and Applied Researches for Innovations in Bio-oriented Industry (MO), Grant-in-Aid for Young Scientists (B) 20700779 (MO) and 23700921 (YK) from MEXT, Grant-in-Aid (B) 23300247 from MEXT, and grants from the Japan Science and Technology Agency, under the Strategic Promotion of Innovative Research and Development Program, and Comprehensive Research on Disability, Health and Welfare from Health and Labour Sciences Research Grants (ID). None of the funding bodies had any role in the study design, collection, analysis and interpretation of the data, writing of the paper, or in the decision to submit the manuscript for publication.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MO designed the study, collected, analyzed, and interpreted the data, and wrote the manuscript. YK designed the statistical analysis procedures, contributed to the study design, supported collection, analysis and interpretation of the data, and helped with the manuscript development. MS and EW contributed to the questionnaire development and data collection. LC contributed to questionnaire development, data analysis, and manuscript development. ID contributed to the study design, the collection, analysis, and interpretation of the data, and manuscript development. KK contributed to questionnaire development, data collection and study coordination. All authors have read and approved the final manuscript.