Background
Health literacy (HL) represents a complex intersection of skills needed to “obtain, process, understand, and communicate about health-related information needed to make informed health decisions.” [
1‐
3] The 2003 National Assessment of Adult Literacy (NAAL), a representative survey of 19,000 adults in the United States (US), found that approximately half of all adults demonstrate HL-related difficulties, and over one-third (36%) have basic or below basic HL [
4]. Compared to individuals with higher HL, those with limited HL are found to use fewer preventive services (e.g., cancer screening) and are more likely to engage in unhealthy behaviors (e.g., poor medication adherence), resulting in increased risk for hospitalization and diminished health outcomes [
5,
6]. Furthermore, estimates suggest that low HL costs the US economy between $106 billion and $238 billion annually and accounts for 7% to 17% of personal healthcare expenses [
7]. Because limited HL is common and has a corresponding social and economic impact on population health, addressing it is a top public health priority [
5,
8]. With a recent shift in healthcare practice to prioritize patient involvement in medical decision making, measuring HL in order to evaluate patient abilities, develop patient-centered interventions, and promote patient empowerment in the healthcare setting continues to gain support [
8,
9]. Given the importance of HL, our aim was to examine the performance of the most commonly used HL measure, the Short Test of Functional Health Literacy in Adults (S-TOFHLA) [
9‐
11].
HL measures are useful for evaluating and classifying patient abilities so that information can be presented in a way that matches patients’ skills and needs. Yet existing measures of HL may lack the specificity to accurately assess patients’ ability to comprehend numeric information, providing a limited view of patients’ abilities [
8,
9,
12‐
14]. The comprehensive measurement of HL is challenging within clinical settings because HL includes multiple elements, such as print literacy, speaking and listening (oral and aural literacy), cultural knowledge, social skills, and numeracy [
2,
9,
13,
15,
16]. Numeracy, defined as one’s aptitude with probabilities, fractions and ratios [
16,
17], is of primary interest among those focused on developing risk communication strategies to promote patient engagement in healthcare decisions [
14,
18]. Risk estimates and numerical information designed to depict probabilities, percentages, frequencies and trade-offs are widely used in patient decision support materials such as decision aids, but are often poorly understood even among those with higher HL [
8,
9,
13,
18‐
20]. Objective numeracy measures provide insight into individuals’ ability to understand numerical and quantitative information; yet individuals may be reluctant to answer objective test questions (e.g., math or probability test questions) and more amenable to subjective measures (e.g., self-reported comfort with numbers, preference for numerical information), which can be used without compromising clinical utility [
12,
17,
21]. While there is general consensus about the importance of evaluating HL and its associated dimensions, there is no agreed-upon “gold-standard” measure, and there is limited agreement about which dimensions of HL can be measured while maintaining clinical feasibility [
9,
22]. Moreover, over half of commonly used measures of health literacy have limited psychometric properties and often lack reporting on critical types of validity (e.g., content, construct, criterion, internal, predictive) [
9,
12]. As a result, acceptable strategies are needed that address the limitations of existing HL measures, particularly in the numeracy-related dimensions [
9].
The S-TOFHLA is the most frequently used measure of HL, used in over half of all published papers measuring HL [
9,
11]. However, it measures reading fluency, leaving out key domains in HL [
8‐
10,
23], and is often not feasible to use in clinical settings due to limited time and resources for administering and scoring the measure [
24]. Prior research has questioned established S-TOFHLA scoring and categories [
9,
25‐
31]. Thus, the purpose of this study was to examine the performance of the S-TOFHLA in identifying those with limited numerical HL when compared with a subjective and an objective numerical HL measure.
Discussion
This study raises concerns about the ability of the 36-item S-TOFHLA, a measure commonly used to identify individuals with low HL, to identify individuals with limited numeracy. Results indicated that a large proportion of participants whose S-TOFHLA scores characterized them as having “adequate” HL scored low on measures of the ability to understand and interpret quantitative information.
Our results suggest that individuals categorized as having low HL on quantitative HL measures are often misclassified as having “adequate” HL with the S-TOFHLA. This is critical, as individuals with HL difficulties are at risk of slipping through the cracks and may not receive the numerical support they need if they are screened with the S-TOFHLA. The S-TOFHLA assesses only limited aspects of HL, and yet it persists as the most commonly used HL measure in both research and clinical contexts [
9,
11]. The current results support previous findings that participants are over-classified with “adequate” HL on the S-TOFHLA when compared to other HL measures [
9,
11,
25‐
30,
50]. Moreover, our findings build on existing literature by adding evidence for the notable numeracy and graphical deficits of the widely used 36-item S-TOFHLA, challenging the utility of the S-TOFHLA and its use as a general HL measure.
The SNS and GL Total and Subscale score findings highlight the deficits of the S-TOFHLA for assessing basic and advanced numeracy skills, such as understanding risk, probabilities, percentages, and frequencies. While the 36-item S-TOFHLA was not designed specifically to assess numeracy, it is being used to assess general HL, of which numeracy is a critical component. Additionally, objective and subjective measures may capture different skills associated with HL, and using both types of questions may be needed to reduce participant burden without compromising clinical utility. Correlations between the S-TOFHLA and the SNS were small to moderate, while correlations between the S-TOFHLA and the GL scales were moderate to large. The stronger correlations with the GL scales may be due to both being objective measures. Despite these associations, the S-TOFHLA still misclassified many individuals relative to their numeracy scores. Our findings question the broad acceptance and use of the S-TOFHLA as a universal measure of HL. A more systematic approach that provides support for those who have deficits in HL may be a better intervention strategy than over-relying on limited, individual HL measures [51–54].
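To make the "small", "moderate", and "large" labels concrete, the following is a minimal sketch of how a correlation between two sets of scores could be computed and labeled. It assumes a Pearson coefficient and Cohen's conventional thresholds (0.1, 0.3, 0.5); this section does not state which coefficient or thresholds were used, so both are assumptions, and the function names `pearson_r` and `effect_size_label` are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


def effect_size_label(r: float) -> str:
    """Label |r| using Cohen's conventions (an assumption, not the study's stated rule)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "negligible"
```

A call such as `effect_size_label(pearson_r(stofhla_scores, sns_scores))`, with hypothetical score lists, would return the label for one pair of measures. Even a "large" correlation does not rule out categorical misclassification, because agreement in rank order says little about where individual scores fall relative to a cut-point.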
Our findings add to the understanding of challenges associated with HL measurement. In order to make informed choices, patients must understand the likelihood of achieving a benefit or a harm from a treatment. Approaches that identify those with limited HL and numeracy are needed to ensure that patients receive support (if needed) to engage in these types of decisions. HL is a complex construct, and existing literature identifies and describes multiple ways of measuring it. For example, Duell et al. identified three levels for HL measurement: basic, communicative/interactive, and critical HL [
9]. These levels are similar to the three GL Levels: reading the data, reading between the data, and reading beyond the data [
37,
42]. In the current study, over half of those who scored “adequate” on the S-TOFHLA scored low on the GL1 subscale (reading the data). Additionally, about two-thirds of those who scored “adequate” on the S-TOFHLA scored low on the GL3 subscale (reading beyond the data). This pattern is also reflected in the correlations between the S-TOFHLA score and the GL subscale scores, which decrease as the GL level increases, suggesting that the S-TOFHLA may not adequately capture these more advanced numeracy skills. For promoting patient involvement in medical decision making, numeracy is a primary skill needed to understand risk, probabilities, percentages, frequencies and trade-offs [
14,
18]. Results highlight how those scoring “adequate” on the S-TOFHLA may lack not only advanced skills but also the basic HL skills needed to function in healthcare settings, which may inhibit patient engagement in medical decision making.
There are various approaches to help address the challenges associated with measuring HL. First, the assumption that a single HL measure is adequate may not hold. The HL measures included in our investigation show the need to capture the complex skills that make up HL. While previous studies have provided evidence to push back against commonly used measures, such as the S-TOFHLA and REALM, a continued effort is needed to challenge the expected use of one of these tools as a way to definitively identify those with low HL [
11,
50]. Furthermore, simply challenging the existing score cut-points employed by the S-TOFHLA may not be enough to identify those with limited HL skills, as our findings show discordance with objective and subjective numerical HL measures across multiple score cut-points. Second, there is a need to develop a feasible strategy to capture patients’ ability to interpret and apply quantitative information in clinical and research settings. Updated measurement strategies should consider subjective and objective factors critical to assessing HL, such as graphical literacy, culture, physiological condition, and relevance to disease type [
23,
51]. Third, incorporating HL principles and strategies to support patient-centered care should be a priority [
51‐
53]. Strategies such as narratives, engaging storytelling and other visual supports may reduce patient burden and promote engagement for those with both high and low HL.
This study has potential limitations. First, it was conducted in a large urban area using a convenience sample. Thus, the sample is diverse and matches the makeup of large urban centers, but rural patients may not have been well represented, which may limit generalizability. Second, we used the 36-item S-TOFHLA measure, which does not assess numeracy. Although this measure is broadly accepted [
10], including the additional four numeracy items may have provided more detailed numeracy information. Third, we used cut-points to categorize HL levels, which is consistent with research and clinical use of the S-TOFHLA and enabled comparisons between measures. However, optimal score cut-points did not exist for the SNS and GL. To address this limitation, we used a median split approach and more generous score cut-points of the 25th and 36th percentiles. With this strategy, we were able to present different measure score cut-points and compare them to the S-TOFHLA categories.
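The cut-point comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: the S-TOFHLA bands (0–16 inadequate, 17–22 marginal, 23–36 adequate) are the conventional 36-item scoring and are not given in this section; "low" is taken to mean at or below the cut-point; percentiles use a nearest-rank rule; and all function names and example scores are hypothetical.

```python
import math
from statistics import median


def stofhla_category(score: int) -> str:
    """Conventional 36-item S-TOFHLA bands (assumed, not stated in this section)."""
    if not 0 <= score <= 36:
        raise ValueError("36-item S-TOFHLA scores range from 0 to 36")
    if score <= 16:
        return "inadequate"
    if score <= 22:
        return "marginal"
    return "adequate"


def percentile_cut(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of scores at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def share_adequate_but_low(stofhla: list[int], numeracy: list[float], cut: float) -> float:
    """Among participants rated 'adequate', the fraction scoring at or below the numeracy cut."""
    adequate = [n for s, n in zip(stofhla, numeracy) if stofhla_category(s) == "adequate"]
    if not adequate:
        return float("nan")
    return sum(n <= cut for n in adequate) / len(adequate)


# Hypothetical scores for eight participants (illustration only, not study data).
stofhla_scores = [36, 34, 30, 28, 25, 23, 20, 15]
numeracy_scores = [5.0, 2.0, 4.5, 1.5, 3.0, 2.5, 2.0, 1.0]

cuts = {
    "median split": median(numeracy_scores),
    "36th percentile": percentile_cut(numeracy_scores, 36),
    "25th percentile": percentile_cut(numeracy_scores, 25),
}
for name, cut in cuts.items():
    share = share_adequate_but_low(stofhla_scores, numeracy_scores, cut)
    print(f"{name}: cut={cut}, share of 'adequate' scoring low={share:.2f}")
```

Reporting the discordant share under several cut-points, rather than one, shows whether the apparent misclassification depends on where the numeracy threshold is placed.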