Background
Osteoporosis is characterized by low bone mass and deterioration of bone tissue [1, 2]. The burden of this disease on individuals and the healthcare system typically results from fragility fractures, which may lead to immobility and hospitalization [3]. In 2010, it was estimated that 30% to 50% of women and 15% to 30% of men will suffer an osteoporotic fracture in their lifetime [4]. Osteoporotic fractures are more common than heart attack, stroke and breast cancer combined, and hip fractures caused by this disease account for more hospital bed days than diabetes, stroke, or heart attack [5]. As of 2010, the yearly cost to the Canadian healthcare system of treating osteoporotic fractures was over 2.3 billion Canadian dollars [5]. Thus, fracture prevention strategies are key to reducing this burden. Exercise and physical activity are essential to preserve bone and physical function in patients with osteoporosis. A growing body of literature focuses on factors that affect exercise adherence, including the facilitators of and barriers to an exercise program.
Exercise as a means to prevent bone mineral density (BMD) loss has been explored extensively in the literature over the past two decades [2, 4, 6]. Exercise and physical activity are increasingly being recognized as a means to reduce the risk of osteoporotic fractures [2, 7] by increasing muscle mass and maintaining or increasing BMD [7–9]. Although the terms “exercise” and “physical activity” have distinct definitions, they are often used interchangeably in the literature. Physical activity is defined as “any bodily movement produced by skeletal muscles that results in energy expenditure” while exercise is “any form of physical activity that is planned, structured, repetitive and purposive” and used to maintain or improve physical endurance [10]. Since any form of activity is seen as beneficial to this population, this paper will not distinguish between them.
A systematic review in 2013 indicated high variability in adherence to physical activity guidelines, with 2.4% to 83% of older adults meeting the recommendations [11]. This variation may indicate that a substantial proportion of people experience major barriers to exercise. To further outline the facilitators and barriers to exercise pertinent to patients with osteoporosis, we developed the Personalized Exercise Questionnaire (PEQ) to assess outcomes that were considered important by a panel of patients with low bone mass, physicians, therapists, and researchers. A comparable instrument that measures exercise beliefs exists; however, this alone would not be sufficient to identify participants’ needs. The Exercise Benefits/Barriers Scale (EBBS), developed in 1987, has 43 questions and uses a 4-point Likert scale (strongly agree, agree, disagree, strongly disagree) [12]. It has a greater focus on attitudes and beliefs about exercise, since the majority of items examine levels of knowledge about specific health benefits of physical activity. A study published in the British Geriatric Society Journal that randomly selected 409 older adults determined that almost all participants (95%) believed physical activity was beneficial, but barriers such as lack of interest, lack of transportation, pain, and disliking going out alone were deterrents to exercise adherence [13]. These barriers, which are not covered in the EBBS, may be more important determinants of exercise adherence. The EBBS also has minimal focus on the specific types of exercise that would be preferred and thus may not directly inform proper exercise prescription. Therefore, the PEQ was designed to address a different conceptual domain than the EBBS. The purpose of the PEQ is to collect information about self-reported facilitators, barriers, and preferences to exercise with the goal of supporting a better understanding of exercise adherence and patient-centred exercise prescription for people with osteoporosis.
Methods
Development of the PEQ followed the two-step method described by Stein et al. [14] and Armstrong et al. [15], with one step involving instrument design and the other obtaining judgmental evidence. Instrument design was performed in a three-step procedure: A) content and domain specification; B) item generation; and C) instrument construction [16]. The second step, judgmental evidence (content validity), was conducted with a panel of experts [16].
Step one: Instrument design
Content and Domain Specification & Item Generation
Items were generated from the literature retrieved from PubMed and CINAHL searches to identify publications that evaluated exercise and/or physical activity in the osteoporosis population. Items were drawn from a systematic review that evaluated the facilitators and barriers to exercise in patients with osteopenia or osteoporosis and from a Belgian focus group study in older adults with osteoporosis that identified motivators and barriers to exercise [17, 18]. Due to limited research regarding specific motivators and barriers in the osteoporosis population, further items were obtained from other populations: A) a Canadian focus group study that considered the facilitators and barriers to exercise in women aged 55–70 years [19]; B) a study that evaluated exercise adherence in middle-aged adults [20]; C) literature that evaluated the facilitators and barriers to exercise in community-dwelling adults [21]; and D) a literature review that identified barriers and facilitators to exercise in people with hip or knee osteoarthritis (OA) [22]. Women and older adults were chosen as similar populations to the osteoporosis group since this disease is more prevalent among women and is often diagnosed in older adults [5]. After extracting items and identifying duplicates, 37 unique questions were identified from the literature to construct a preliminary version of the PEQ.
Domains were selected using the Alternative Theory of Planned Behaviour, a combination of the Theory of Planned Behaviour and the Social Cognitive Theory [23]. This Alternative Theory analyzes four concepts: “perceived behavioural control,” “attitude toward exercising,” “environment,” and “normative beliefs” [23]. The 37 items were then categorized under one of the four theory concepts. The theory was originally used to create four domains for this tool; however, throughout the iterative development process, the titles were changed to reflect more patient-friendly terminology. For example, “normative beliefs” was changed to “support network” and “environment” to “access”. Additional sections were added based on items found in the literature, and some concepts from the Alternative Theory were combined to create new domains. Five domains were identified in the preliminary version: 1) my support network; 2) my access to exercise; 3) exercise goals; 4) my exercise preferences; and 5) my exercise barriers. For simplicity, the following titles will be used when referring to a specific domain: 1) support network; 2) access; 3) goals; 4) preferences; and 5) barriers.
Instrument construction
Consistent with the recommendations from Stone, the preliminary version of this tool was circulated to an advisory committee for feedback [24]. A three-member committee was asked to evaluate the overall format, domains, and items of the questionnaire. The committee comprised a musculoskeletal researcher with a physiotherapy background, an investigator specialized in osteoporosis and exercise research, and a rheumatologist with a specialty in osteoporosis research. The questionnaire was revised through iterative feedback and submitted to a Delphi expert panel comprising an osteoporosis researcher with a clinical degree in physiotherapy, two doctorate researchers specialized in osteoporosis research, and a kinesiologist. The Delphi technique, developed by the RAND Corporation, was used to seek convergence on this topic because it allows experts to work independently [25]. Each domain and item was reviewed for structure and clarity, redundant inquiries were eliminated, and ambiguous wording was modified.
Step two: Judgmental evidence (content validity)
New surveys must be rigorously tested to ensure that a tool is valid [26, 27]. Validity is defined as the extent to which an instrument measures what it is intended to measure [28]. For this reason, the PEQ went through multiple iterations during development to ensure the survey was clearly worded, well defined, and covered topics important to patients with osteoporosis. Content validity describes how well items correspond to or reflect a specific domain and is measured using quantitative techniques [27, 28]. Cognitive interview methods can explore how patients with osteoporosis might interpret the meaning of survey items [26]. Cognitive interviews were used to determine the following: 1) if participants understood the item; 2) if they understood the item the way the researcher intended; and 3) how participants calibrated the item and its response options. Lastly, focus groups were used to determine how respondents answer survey questions, identify potential problems that lead to response error, and comment on the overall format of the tool.
Content validity
There are multiple methods for testing content validity. This study combined empirical techniques, used to calculate the content validity index (CVI) and the content validity ratio (CVR), with semi-structured cognitive evaluations [15, 16]. The empirical techniques used in this study were:
1) CVI: CVI is the most widely reported approach for content validity in instrument development and can be computed as the Item-CVI (I-CVI) and the Scale-level-CVI (S-CVI) [16]. I-CVI is computed as the number of experts giving a rating of “very relevant” for each item divided by the total number of experts. Values range from 0 to 1: if the I-CVI is above 0.79 the item is relevant, if it is between 0.70 and 0.79 the item needs revision, and if it is below 0.70 the item is eliminated [16]. Similarly, S-CVI is calculated using the number of items in a tool that have achieved a rating of “very relevant” [16]. There are two methods of calculating S-CVI: the Universal Agreement (UA) among experts (S-CVI/UA) and the Average CVI (S-CVI/Ave), the latter being a less conservative method [16]. S-CVI/UA is calculated as the number of items with an I-CVI equal to 1 divided by the total number of items, while S-CVI/Ave is calculated as the sum of the I-CVIs divided by the total number of items [16]. An S-CVI/UA ≥ 0.8 and an S-CVI/Ave ≥ 0.9 indicate excellent content validity [29].
2) CVR: The second type of empirical analysis was CVR, which measures the essentiality of an item [30]. CVR varies between −1 and 1, and a higher score indicates greater agreement among panel members [16]. The formula for the CVR is CVR = (Ne − N/2)/(N/2), where Ne is the number of panelists indicating an item as “essential” and N is the total number of panelists [16].
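The index and ratio calculations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names, the `relevant_min` parameter, and the example ratings are hypothetical. The default cut-off counts only ratings of 4 (“very relevant”), following the I-CVI description above; passing `relevant_min=3` would instead count ratings of 3 or 4.

```python
def item_cvi(ratings, relevant_min=4):
    """I-CVI: proportion of experts rating the item at or above relevant_min."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def classify_item(i_cvi):
    """Apply the cut-offs given in the text: > 0.79, 0.70 to 0.79, < 0.70."""
    if i_cvi > 0.79:
        return "relevant"
    if i_cvi >= 0.70:
        return "needs revision"
    return "eliminate"


def scale_cvi_ua(i_cvis):
    """S-CVI/UA: proportion of items with universal agreement (I-CVI of 1)."""
    return sum(v == 1 for v in i_cvis) / len(i_cvis)


def scale_cvi_ave(i_cvis):
    """S-CVI/Ave: mean of the item-level I-CVIs."""
    return sum(i_cvis) / len(i_cvis)


def cvr(n_essential, n_panel):
    """CVR = (Ne - N/2) / (N/2)."""
    half = n_panel / 2
    return (n_essential - half) / half


# Hypothetical relevance ratings (1-4) from six experts for three items.
ratings_by_item = {
    "item_a": [4, 4, 4, 4, 4, 4],
    "item_b": [4, 4, 4, 4, 4, 3],
    "item_c": [4, 4, 3, 3, 2, 4],
}
i_cvis = {name: item_cvi(r) for name, r in ratings_by_item.items()}
for name, value in i_cvis.items():
    print(name, round(value, 2), classify_item(value))
print("S-CVI/UA:", round(scale_cvi_ua(list(i_cvis.values())), 2))
print("S-CVI/Ave:", round(scale_cvi_ave(list(i_cvis.values())), 2))
print("CVR when 5 of 6 panelists rate an item essential:", round(cvr(5, 6), 2))
```

With these made-up ratings, item_a has an I-CVI of 1.0, item_b about 0.83, and item_c 0.5, so the two scale-level values differ (about 0.33 for S-CVI/UA versus about 0.78 for S-CVI/Ave).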
A cover letter and the PEQ were included with the content validity survey, explaining why experts were invited to participate and giving clear and concise instructions on how to rate each item. To evaluate whether items were relevant, clear and essential, experts were given a critical appraisal sheet with the following four inquiries: 1) the relevance of each question in the tool (how important the question is); 2) the clarity of each question (how clear the wording is); 3) the essentiality of each question (how necessary the question is); and 4) recommendations for improvement of each question. The critical appraisal tool that experts used to rate the questionnaire is in Additional file 1: Appendix A. For the relevancy scale, a 4-point Likert scale was used with the following responses: 1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, and 4 = very relevant. Ratings of 1 and 2 are considered content invalid while ratings of 3 and 4 are considered content valid [31]. A 3-point Likert scale was used for the clarity and essentiality scales since answers can only be trichotomous. The clarity scale was: 1 = not clear, 2 = item needs some revision, and 3 = very clear; the essentiality scale was: 1 = not essential, 2 = useful, but not essential, and 3 = essential [15, 16]. Additional comments and recommendations by the experts were written on the hard copy of the questionnaire that was provided with the cover letter.
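As a rough illustration of how the appraisal-sheet ratings for a single question might be tallied before any index is computed, the sketch below maps numeric ratings to the scale labels listed above. The function names and the six example ratings are hypothetical; the scale labels, and the rule that relevance ratings of 3 and 4 count as content valid, come from the text.

```python
from collections import Counter

# Scale labels as described in the text.
RELEVANCE = {1: "not relevant", 2: "somewhat relevant",
             3: "quite relevant", 4: "very relevant"}
CLARITY = {1: "not clear", 2: "item needs some revision", 3: "very clear"}
ESSENTIALITY = {1: "not essential", 2: "useful, but not essential",
                3: "essential"}


def tally(ratings, scale):
    """Count how many experts chose each label on the given scale."""
    for r in ratings:
        if r not in scale:
            raise ValueError(f"rating {r} is not on this scale")
    counts = Counter(scale[r] for r in ratings)
    # Report every label, including those nobody chose.
    return {label: counts.get(label, 0) for label in scale.values()}


def content_valid_count(relevance_ratings):
    """Number of experts whose relevance rating (3 or 4) is content valid."""
    return sum(r >= 3 for r in relevance_ratings)


# Hypothetical ratings of one question by six experts.
relevance = [4, 4, 3, 4, 2, 4]
essentiality = [3, 3, 2, 3, 1, 3]

print(tally(relevance, RELEVANCE))
print(tally(essentiality, ESSENTIALITY))
print("content-valid relevance ratings:", content_valid_count(relevance))
```

The count of “essential” ratings from such a tally is the Ne term of the CVR, and the relevance counts feed the I-CVI.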
The recommended number of experts to review an instrument varies from 2 to 20 individuals [15]. At least 5 people are suggested to review the instrument to have sufficient control over chance agreement [16]. Content validity was determined using a number of experts (n = 6) that included an athletic therapist and a Ph.D. candidate from the University of Western Ontario, a physiotherapist, a chiropractor, and a family doctor from Toronto, Ontario, and an orthopedic surgeon with a research background in osteoporosis from McMaster University. Experts were chosen based on the following guidelines: 1) worked in a medical or rehabilitation setting with patients with osteoporosis; or 2) published at least one article related to the care of patients with osteoporosis.
Cognitive interviews
Cognitive interviewing is a methodology that examines how respondents comprehend, interpret, and answer survey questions [26]. The purpose of cognitive interviewing is to obtain information about the process respondents use to answer survey questions, identify potential problems that may lead to survey response error, and gain a better sense of their perception regarding items [32]. The question-and-answer model has been cited as a useful representation of how respondents answer survey questions [26]. This model suggests four interdependent elements, “comprehension of the information”, “retrieval from memory”, “decision processes”, and “response selection”, that interact together and predict how respondents make judgments about the level of detail needed to answer survey questions [33].
Cognitive testing was undertaken specifically with clinicians and patients to evaluate the cognitive process they followed to answer survey questions and to identify items that were not well understood. Techniques used to evaluate clinician and patient understanding of questions were a combination of the think-aloud and verbal probing approaches [33]. Together, these approaches were used to determine how well survey items were understood and how respondents arrived at their response options. Specific think-aloud questions were: “please tell me what you are thinking as you answer this question” or “what steps are going through your head as you pick an option for this question” [32, 33]. Verbal probes were scripted or spontaneous, and scripted questions included “what do you think the question is asking you” and “please think aloud and tell me how you would answer this question” [26, 32, 33].
Cognitive interviews were conducted at McMaster University with 4 Ph.D. graduate students from the Department of Rehabilitation Science who had clinical backgrounds in occupational therapy, kinesiology, and physiotherapy. Interviews were also conducted with 2 patients from Hamilton, Ontario and 9 patients from London, Ontario; all patients had a diagnosis of osteopenia or osteoporosis. All interviews lasted between 1 and 1.5 h and were recorded, and notes were taken. Analytic memos were created based on digital recordings and notes. Memos were coded into the following categories: 1) no problem with the item; 2) minor misunderstanding with the item; and 3) item unclear. Items marked “minor misunderstanding” were reworded, while those marked “unclear” were eliminated, reworded or integrated with another question.
Focus groups
Focus groups are “informal discussions among selected individuals about a specific topic” [34] and can be used to follow up on issues revealed during cognitive interviews or used as a standalone protocol to generate ideas through group discussion [32]. Focus groups are typically more open-ended and less structured than cognitive interviews and can help elicit a greater range of responses. In this paper, focus groups were used to elicit respondents’ understanding, opinions, and views within the context of discussion and debate with others [34].
Two focus groups were held, one at McMaster University and the other at the University of Western Ontario, with 8 and 12 graduate students enrolled in the department of Rehabilitation Science and Physical Therapy, respectively. The majority of students were enrolled in a Ph.D. program. During the focus group, students were given a paper copy of the questionnaire and instructed to read each item and give their thoughts regarding the relevance, clarity and importance of each question. They were also asked to verbalize their thoughts about each item and whether it was in the correct domain. Although a digital recorder was not used, notes were taken during each focus group session.
Discussion
This study developed and provided content validation of the Personalized Exercise Questionnaire (PEQ), which assesses multiple domains relating to the facilitators, barriers, and patient preferences in relation to exercise. Although the questionnaire was developed with the osteoporosis population as a primary target, the majority of items are not specifically related to osteoporosis, suggesting that the questionnaire may be useful in a variety of other populations upon further validation.
The PEQ provides a unique self-report tool to assist with assessment of factors that may support or hinder the adoption and maintenance of regular exercise. Since it is well known that adherence to exercise, physical activity, or home-based therapeutic exercise is problematic, the intention is that the PEQ might support assessment of facilitators, barriers, and personal preferences in groups of people or be used to develop more personalized exercise recommendations for individuals to ultimately increase exercise participation. An article by Crombie et al. found that levels of knowledge about specific health benefits of exercise were high, yet the majority of older adults did not participate in any physical activity. The authors suggest national campaigns to encourage exercise and physical activity [13]; however, persuading individuals with barriers to exercise may be difficult. Thus, strategies to increase activity levels must include identifying the facilitators, barriers and patient preferences to an exercise program and compiling these factors using the PEQ to encourage participation.
The most common method for measuring content validity is calculating the Item-level CVI (I-CVI); however, an alternative, less widely acknowledged method is the Scale-level CVI (S-CVI), which can be calculated as S-CVI/UA or S-CVI/Ave. The two approaches can lead to different values, making it difficult to draw the proper conclusion about content validity [37]. I-CVI measures the content validity of individual items while the S-CVI calculates the content validity of the overall scale. Most papers report the I-CVI or the S-CVI but not both. This paper considered both the I-CVI and the S-CVI since the S-CVI is an average score that can be skewed by outliers. The number of experts (n = 6) was considered adequate for content validation as the recommended number of raters ranges from a minimum of 3 to a maximum of 10 [16, 30]. An I-CVI of 0.78 or higher is considered excellent. The I-CVIs of all items in the PEQ ranged from 0.50 to 1.00, with only four items having an I-CVI less than 0.78. This supports the conclusion that individual items were important and relevant to measuring the facilitators, barriers and patient preferences to an exercise program. The minimum acceptable S-CVI is considered to be a value between 0.80 and 0.90 [30, 37]. Two values were calculated: S-CVI/UA and S-CVI/Ave. The Universal Agreement approach suggested the overall content validity of the PEQ was moderate (S-CVI/UA = 0.63), while the Average method suggested high content validity (S-CVI/Ave = 0.91). Although the Universal Agreement method only considers items that have an I-CVI of 1.00 and may be considered more conservative than the Average approach, this method may underestimate content validity of the overall questionnaire since the likelihood of achieving 100% agreement on all items decreases as the number of experts increases. The alternative and less restrictive S-CVI/Ave approach may overestimate content validity since the numerator in the Average technique will always be greater than the numerator of the Universal Agreement approach if I-CVI values are not all equal to 1.00. For this reason, both the S-CVI/UA and the S-CVI/Ave were calculated, and the true overall content validity of the PEQ may lie somewhere in between.
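To make the relationship between the two scale-level values concrete, the following Python sketch computes both from the same set of I-CVIs. The eight I-CVI values are hypothetical and are not the PEQ results. The independence model in `p_universal_agreement`, in which each expert endorses an item with the same probability regardless of the others, is likewise an assumption made only for illustration.

```python
from fractions import Fraction


def s_cvi_ua(i_cvis):
    """S-CVI/UA: share of items on which every expert agreed (I-CVI of 1)."""
    return Fraction(sum(1 for v in i_cvis if v == 1), len(i_cvis))


def s_cvi_ave(i_cvis):
    """S-CVI/Ave: mean of the item-level I-CVIs."""
    return sum(i_cvis, Fraction(0)) / len(i_cvis)


def p_universal_agreement(p_endorse, n_experts):
    """Chance that all experts endorse an item, assuming independent experts
    who each endorse it with probability p_endorse (illustrative model)."""
    return p_endorse ** n_experts


# Hypothetical I-CVIs for eight items rated by six experts:
# five items with full agreement, three with one dissenting expert.
i_cvis = [Fraction(6, 6)] * 5 + [Fraction(5, 6)] * 3

print("S-CVI/UA :", float(s_cvi_ua(i_cvis)))
print("S-CVI/Ave:", float(s_cvi_ave(i_cvis)))
for n in (3, 6, 10):
    print(n, "experts:", round(p_universal_agreement(0.9, n), 2))
```

In this made-up case, three items on which a single expert of six dissented pull S-CVI/UA down to 0.625 while S-CVI/Ave stays at 0.9375. Under the independence assumption, the chance that every expert endorses an item falls from about 0.73 with three experts to about 0.35 with ten.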
A less common way to calculate content validity is to use the CVR approach. This method determines how many raters mark an item as essential. Thirty-one items had a positive CVR value, indicating that more than half of the raters considered the items to be essential, with an overall suboptimal content validity score (CVR = 0.53). It is possible that raters did not understand the items since only 14% of questions were considered very clear. Items were marked relevant, indicating they were directly related to the topic, but due to poor clarity raters may not have clearly understood what each item was measuring, resulting in a low CVR score. The next step in instrument development was to improve item clarity using qualitative evaluations. Quantitative methods strongly supported individual items in the PEQ, and cognitive interviews and focus groups were used to further refine the clarity of the language.
The challenge in conducting numerous rounds of cognitive interviews and focus groups was deciding what information was relevant and when information was no longer important to tool development. The goal of cognitive assessments and subsequent revisions was to reach a point where there was sufficient evidence of no problems with item comprehension; at this point, saturation had been achieved. Overall, the PEQ benefited from multiple consultations and iterative revision, which contributed to substantial changes in the types of items, concepts, wording, response options, and the overall structure of the questionnaire. Lack of clarity, misinterpretation, and ambiguity of items were the primary reasons for instrument modifications. It became clear that multiple iterations were essential since people who had previously seen early versions offered additional recommendations in repeat consultations. There is no clear indication of the optimal number of revisions required to be certain that a measure is well developed [38]. However, the concept of saturation applies to iterative feedback: it is reached when none of the recommendations being made are considered useful or agreed upon by multiple respondents. The PEQ underwent three additional rounds of revisions with a heterogeneous interview sample until feedback was no longer applicable to the majority. Content validity calculations (CVI and CVR) did not need to be repeated for the final version of the PEQ since content validity of individual items was excellent. Furthermore, rigorous qualitative research provided evidence of high content validity of the overall PEQ by reaching saturation through interviews with multiple experts.
Understanding factors affecting exercise adherence measured across multiple domains may help develop targeted interventions that improve the quality and delivery of physical activity programs. This tool has potential applications in both the research setting and in clinical practice. Investigators can use this tool to survey their population of interest and use this information to inform decision-making about the type, frequency, and location of the exercise for the majority. The goal of designing an exercise program in research is to encourage individuals to continue the program long after the intervention has finished. Identifying an exercise program that increases muscle and bone mass and is tailored to patient needs will be one way of increasing exercise adherence. This tool can also help clinicians identify and design better exercise prescriptions for individual clients. It is important for healthcare providers to identify their patients’ facilitators, barriers, and exercise goals before giving specific recommendations since understanding these factors may result in better and more effective exercise prescriptions.
Limitations
As with any preliminary questionnaire, there were some limitations to its design. The limitations of this study include: (1) potential lack of generalizability; (2) the risk of using a self-reported measure; and (3) the length of the questionnaire. Although the PEQ was designed for people with osteoporosis, it may be applicable in elderly populations, but its generalizability to other clinical populations is unknown and must be tested. Secondly, as with all self-reported measures, there is a risk of recall bias or of answers inflated to reflect lower impediments to exercise. Thirdly, the questionnaire takes about 20 to 30 min to complete.
Acknowledgments
I would like to thank my supervisor, Dr. Joy MacDermid, for all her dedication and guidance during this Master’s thesis. Dr. MacDermid allowed me to take charge of this project and grow as a researcher and as an individual, while providing advice along the way. I would also like to thank my supervisory committee members, Dr. Jonathan Adachi and Dr. Karen Beattie, for their knowledge and insight throughout this research project. In addition, I am grateful to Drs. Adachi and MacDermid for allowing me to attend their clinic to recruit patients for cognitive interviews. Without their support and help this project would not have been possible. I would also like to thank the study participants and Dr. Naomi Fink for their time and contribution in providing their input toward this questionnaire, and Margaret Lomotan at McMaster University for her continuous operational support.