Study design and data sources
We designed an observational, retrospective case-control study using electronic health records (EHRs) from the University of Pennsylvania Health System, a large regional health care system comprising six hospitals and hundreds of outpatient centers in southeastern Pennsylvania. The data were sourced from a data warehouse containing repositories of structured and unstructured EHR data, maintained by and accessed through the Data Analytics Center at the Perelman School of Medicine at the University of Pennsylvania.
During the study period there were approximately 1.2 million patients in the database, 2–3% of whom had at least one clinical chart note containing the word “yoga” [14]. Available structured data fields include demographics, clinical encounters, diagnostic and procedure codes, laboratory test results, and prescribed medications. Unstructured data include clinical chart notes and imaging data.
Yoga annotation
Yoga notes were manually annotated, and the associated patient encounters were included only if the patient had a regular yoga practice, defined here as at least one yoga session per week, at the time of the clinical encounter (e.g., “goes to yoga class on Saturdays”, “exercise: yoga 3x per week”, “does yoga daily at home”). Yoga patient encounters were excluded if the note indicated yoga was practiced less often than once per week (e.g., “yoga 2-3x a month”) or gave no indication of frequency (e.g., “exercise: walking, yoga”). Notes for the retained encounters were annotated with the number of times per week the patient reported practicing yoga.
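Although the annotation itself was manual, the inclusion rule applied to each annotated note can be expressed compactly. The following is a minimal sketch, assuming an annotator records a count and a reporting period (day, week, or month); the names `sessions_per_week`, `include_encounter`, and `PER_WEEK_SCALE`, and the 52/12 weeks-per-month conversion, are hypothetical and not part of the study's pipeline.

```python
from typing import Optional

# Hypothetical scale factors converting a count per period into a count per week.
# Assumption: one month is treated as 52/12 (about 4.33) weeks.
PER_WEEK_SCALE = {"day": 7.0, "week": 1.0, "month": 12 / 52}


def sessions_per_week(count: Optional[float], period: Optional[str]) -> Optional[float]:
    """Convert an annotated frequency (e.g. 3 per "week") to sessions per week.

    Returns None when the note gives no frequency (e.g. "exercise: walking, yoga").
    """
    if count is None or period is None:
        return None
    return count * PER_WEEK_SCALE[period]


def include_encounter(count: Optional[float], period: Optional[str]) -> bool:
    """Keep an encounter only if it reflects a regular practice of >= 1 session per week."""
    rate = sessions_per_week(count, period)
    return rate is not None and rate >= 1.0
```

Under these assumptions, “does yoga daily at home” (1 per day) maps to 7 sessions per week and is retained, while “yoga 2-3x a month”, even taken at its upper bound of 3 per month, maps to about 0.69 sessions per week and is excluded.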
Study population
Because we do not have direct access to the EHR data, we requested a preliminary data set from the Data Analytics Center. This data set was generated by a keyword search for “yoga” in patient chart notes to identify a potential yoga exposed group (see Yoga Annotation subsection) and a yoga non-exposed group; patients in the latter group had no mention of “yoga” in their charts and were matched at random to the yoga exposed group on age, sex, and race. A 3:1 ratio of non-exposed to exposed patients was used to capture unmeasured and unknown confounders [15]. This preliminary data set is the basis for this study.
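The matching itself was performed by the Data Analytics Center, and its exact procedure is not described here. As an illustration only, random 3:1 matching on age, sex, and race could look like the sketch below, which assumes exact matching on all three variables and sampling without replacement; the record layout and the functions `match_key` and `match_controls` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def match_key(patient: Dict) -> Tuple:
    """Exact-matching key: (age, sex, race)."""
    return (patient["age"], patient["sex"], patient["race"])


def match_controls(exposed: List[Dict], pool: List[Dict], ratio: int = 3,
                   seed: int = 0) -> Dict[str, List[str]]:
    """Randomly draw up to `ratio` non-exposed patients per exposed patient.

    Controls are matched exactly on age, sex, and race and sampled without
    replacement, so no control is assigned to more than one exposed patient.
    """
    rng = random.Random(seed)
    candidates = defaultdict(list)
    for patient in pool:
        candidates[match_key(patient)].append(patient)
    for group in candidates.values():
        rng.shuffle(group)  # random order within each matching cell
    matches = {}
    for patient in exposed:
        group = candidates[match_key(patient)]
        n_take = min(ratio, len(group))  # fewer than `ratio` if the cell runs out
        matches[patient["id"]] = [group.pop()["id"] for _ in range(n_take)]
    return matches
```

Sampling without replacement keeps the control group free of duplicated patients; an exposed patient in a sparse age, sex, and race cell simply receives fewer than three controls.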
Covariates were chosen to represent biological, behavioral, environmental, and social factors that may affect blood pressure or use of yoga, including age, sex, race, ethnicity, BMI, insurance status, home address zip code, alcohol use disorder, smoking status, comorbidities common to hypertension, and prescriptions for blood pressure lowering drugs.
Comorbidities are the chronic condition dyads that include hypertension, as reported by the Centers for Medicare & Medicaid Services: diabetes, hyperlipidemia, chronic kidney disease, heart failure, heart failure and chronic kidney disease, and coronary artery disease. A list of ICD codes can be found in Additional File 1. Blood pressure lowering drugs include antihypertensives, beta blockers, calcium channel blockers, and diuretics; renin-angiotensin inhibitors were categorized as antihypertensives. A list of prescribed drugs by class can be found in Additional File 2.
Missing data were imputed only from within a patient’s own records. Height was imputed as the median across all of the patient’s encounters, and weight as the weight recorded at the nearest encounter within 365 days. Height and weight were used to calculate BMI as weight in kilograms divided by height in meters squared. Race was imputed as multiracial if more than one race was reported across the entire record, and ethnicity was imputed as Hispanic if more than one ethnicity was reported. Insurance status and zip code were imputed as the values recorded at the nearest encounter. Comorbidities, smoking status, alcohol use disorder, and blood pressure lowering drugs were assigned to an encounter when documented within the prior 12 months.
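The within-patient imputation rules and the BMI formula can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's processing code: the helper names are hypothetical, heights are assumed to be in meters and weights in kilograms, and ties for the “nearest encounter” are broken toward the earlier date, a choice the text does not specify.

```python
import statistics
from datetime import date
from typing import Dict, List, Optional


def impute_height(heights_m: List[Optional[float]]) -> Optional[float]:
    """Median height across all of a patient's encounters (missing values ignored)."""
    observed = [h for h in heights_m if h is not None]
    return statistics.median(observed) if observed else None


def impute_weight(visit: date, weights_kg: Dict[date, float],
                  window_days: int = 365) -> Optional[float]:
    """Weight recorded at the nearest encounter within `window_days` of `visit`.

    Ties are broken toward the earlier encounter (an assumption).
    """
    in_window = [d for d in weights_kg if abs((d - visit).days) <= window_days]
    if not in_window:
        return None
    nearest = min(in_window, key=lambda d: (abs((d - visit).days), d))
    return weights_kg[nearest]


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def collapse_reported(values: List[str], multi_label: str) -> Optional[str]:
    """Single reported value, or `multi_label` if more than one distinct value was reported."""
    distinct = {v for v in values if v}
    if not distinct:
        return None
    return multi_label if len(distinct) > 1 else distinct.pop()
```

For example, `collapse_reported(races, "Multiracial")` and `collapse_reported(ethnicities, "Hispanic")` apply the race and ethnicity rules, and a 70 kg patient with a median height of 1.75 m has a BMI of about 22.9 kg/m².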
At the patient level, inclusion required at least 3 years of clinical history. There were no predefined exclusion criteria at the patient level; however, all encounters for one patient were excluded because the patient’s sex was recorded as male and as female at an equal number of visits.
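The two patient-level rules can be sketched as below. Both definitions are assumptions: the text does not say how the 3 years of history were measured (here, the span between the first and last encounter, taken as 1,095 days), nor how sex was assigned when records disagreed (here, the most frequently recorded value, with a tie treated as unresolvable, which is the case that led to the exclusion described above). The names `has_min_history` and `resolve_sex` are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import List, Optional

MIN_HISTORY_DAYS = 3 * 365  # assumption: "3 years" measured as 1,095 days


def has_min_history(encounter_dates: List[date]) -> bool:
    """True if the first and last encounters span at least three years (assumed definition)."""
    if not encounter_dates:
        return False
    return (max(encounter_dates) - min(encounter_dates)).days >= MIN_HISTORY_DAYS


def resolve_sex(recorded: List[str]) -> Optional[str]:
    """Most frequently recorded sex; None when the top values are tied."""
    counts = Counter(recorded).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return None  # unresolvable, so all of the patient's encounters are dropped
    return counts[0][0]
```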
At the encounter level, inclusion required that the encounter was with a primary care provider who had seen patients in both the yoga exposed and yoga non-exposed groups, that the patient’s recorded age at the encounter was between 18 and 79 years, and that blood pressure was recorded at the encounter.
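The encounter-level inclusion criteria amount to a filter over encounter records. A minimal sketch, assuming each encounter is a dictionary with the hypothetical keys shown in the docstring and assuming the “provider saw both groups” check is evaluated over primary care encounters before the age and blood pressure filters (the order is not stated in the text):

```python
from typing import Dict, List, Set


def apply_encounter_inclusion(encounters: List[Dict]) -> List[Dict]:
    """Keep encounters meeting the encounter-level inclusion criteria.

    Hypothetical keys: "provider", "primary_care" (bool), "exposed" (bool),
    "age" (years), "sbp" and "dbp" (mmHg, None if not recorded).
    """
    primary = [e for e in encounters if e["primary_care"]]
    # Record which exposure groups each provider has seen.
    groups_seen: Dict[str, Set[bool]] = {}
    for e in primary:
        groups_seen.setdefault(e["provider"], set()).add(e["exposed"])
    shared = {p for p, seen in groups_seen.items() if seen == {True, False}}
    return [
        e for e in primary
        if e["provider"] in shared
        and 18 <= e["age"] <= 79
        and e["sbp"] is not None and e["dbp"] is not None
    ]
```

Restricting to providers who saw both groups means exposed and non-exposed encounters share the same set of clinicians, which limits confounding by provider practice.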
Exclusion criteria at the encounter level were applied after imputation and included a diagnostic code for pregnancy or end-stage renal disease; weight below the 1st percentile or above the 99th percentile (< 101 lb or > 322 lb); missing BMI; or systolic blood pressure greater than 220 mmHg or less than 60 mmHg and diastolic blood pressure greater than 140 mmHg or less than 40 mmHg (ranges indicative of a hypertensive or hypotensive crisis).
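A sketch of the encounter-level exclusion step follows. It rests on several assumptions: the weight percentiles are computed across all encounters; the systolic and diastolic ranges are read as separate criteria, so an encounter is excluded if either reading falls outside its range (the text could also be read as requiring both); and `statistics.quantiles` requires Python 3.8 or later, newer than the Python 3.6.5 used in the study. The function names and dictionary keys are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def weight_bounds(weights_lb: List[float]) -> Tuple[float, float]:
    """1st and 99th percentiles of weight across all encounters (needs Python 3.8+)."""
    cuts = statistics.quantiles(weights_lb, n=100)  # 99 cut points
    return cuts[0], cuts[-1]


def is_excluded(enc: Dict, low_lb: float, high_lb: float) -> bool:
    """Apply the encounter-level exclusion criteria to one imputed encounter."""
    if enc["pregnancy"] or enc["esrd"]:
        return True
    if enc["bmi"] is None:
        return True
    if enc["weight_lb"] < low_lb or enc["weight_lb"] > high_lb:
        return True
    # Assumption: either implausible reading is sufficient for exclusion.
    sbp_out = enc["sbp"] > 220 or enc["sbp"] < 60
    dbp_out = enc["dbp"] > 140 or enc["dbp"] < 40
    return sbp_out or dbp_out
```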
All data used in this study were collected during routine care at outpatient visits between November 15, 2006, and November 16, 2016.
Data set balancing
We used the coarsened exact matching (CEM) package in R to balance the data set on the measured covariates listed above [16]. CEM stratifies the data set based on categorical or coarsened values of the covariates, then removes any stratum that does not contain at least one case and one control. This ensures there is at least one near match for each observation. Age was coarsened into three predefined age groups, 18–39 years, 40–59 years, and 60–79 years, consistent with the most recent National Center for Health Statistics hypertension report [17]. BMI (kg/m²) was coarsened into five categories: underweight (BMI < 18), normal weight (18 ≤ BMI < 24.9), overweight (24.9 ≤ BMI < 29.9), obese (29.9 ≤ BMI < 34.9), and severely obese (BMI ≥ 34.9); the remaining covariates were already categorical. CEM generates a weight for each observation as a normalized ratio of cases to controls within each stratum, and these weights were used in all statistical models. In the context of this study, the yoga exposed are “cases” and the yoga non-exposed are “controls”.
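The coarsening and weighting steps can be illustrated with a short sketch. It reimplements, in plain Python, the standard CEM weighting scheme (cases receive weight 1; a control in stratum s receives (m_T^s / m_C^s) × (m_C / m_T), where m_T and m_C are the total numbers of matched cases and controls); it is not the R package itself, and the function names and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def age_group(age: int) -> str:
    """Coarsen age into the three predefined groups (ages 18-79 only)."""
    if 18 <= age <= 39:
        return "18-39"
    if 40 <= age <= 59:
        return "40-59"
    if 60 <= age <= 79:
        return "60-79"
    raise ValueError("age outside the 18-79 range retained by the inclusion criteria")


def bmi_category(bmi: float) -> str:
    """Coarsen BMI (kg/m^2) into the five categories used for matching."""
    if bmi < 18:
        return "underweight"
    if bmi < 24.9:
        return "normal"
    if bmi < 29.9:
        return "overweight"
    if bmi < 34.9:
        return "obese"
    return "severely obese"


def cem_weights(rows: List[Dict], covariates: List[str]) -> List[float]:
    """CEM weights: 0 for pruned strata, 1 for cases, normalized ratio for controls."""
    keys = [tuple(r[c] for c in covariates) for r in rows]
    strata = defaultdict(lambda: [0, 0])  # stratum -> [n_cases, n_controls]
    for r, k in zip(rows, keys):
        strata[k][0 if r["treated"] else 1] += 1
    matched = {k for k, (t, c) in strata.items() if t > 0 and c > 0}
    m_t = sum(strata[k][0] for k in matched)
    m_c = sum(strata[k][1] for k in matched)
    weights = []
    for r, k in zip(rows, keys):
        if k not in matched:
            weights.append(0.0)  # stratum lacks a case or a control
        elif r["treated"]:
            weights.append(1.0)
        else:
            t, c = strata[k]
            weights.append((t / c) * (m_c / m_t))
    return weights
```

With this normalization the control weights sum to the number of matched controls, so weighted analyses keep the original control sample size while reproducing the case distribution across strata.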
Statistical analysis
To estimate the effect of yoga on blood pressure, we fit linear mixed effects models with systolic or diastolic blood pressure as the dependent variable. The statistical units in this study are encounters. Patients may be represented in the data set more than once, because every encounter with the health care system that met the inclusion criteria was retained. An encounter is considered a yoga exposure if the clinical chart notes indicate yoga was being practiced at least once per week at the time of the encounter (see Yoga Annotation). The models included the following covariates: age, sex, race, ethnicity, BMI, frequency of yoga practice, insurance status, home address zip code, diagnostic codes retained after matching (i.e., smoking status, coronary artery disease, diabetes, and hyperlipidemia), and prescriptions for antihypertensives, beta blockers, calcium channel blockers, and diuretics. A random intercept for patient identifier was included so that all encounters from the same patient share a patient-specific intercept. Each observation was weighted by its CEM weight. To test for age-dependent effects, we repeated the analysis within three age groups: 18–39 years, 40–59 years, and 60–79 years. Beta coefficients are presented as effect sizes, and statistical significance was set at p < 0.05.
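Written out, the linear mixed effects model described above can be sketched as follows, in our own notation rather than the authors': i indexes patients, j indexes encounters, BP_ij is the systolic or diastolic pressure, x_ij is the vector of covariates listed above (including weekly yoga frequency), u_i is the patient random intercept, and w_ij is the CEM weight. Treating the weights as scaling the residual variance matches how lmer interprets prior weights; that the weights entered the authors' models exactly this way is an assumption.

```latex
\mathrm{BP}_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}\!\left(0, \sigma^2 / w_{ij}\right)
```

Under this form, the coefficient on yoga frequency is the expected change in blood pressure (mmHg) per additional weekly session, holding the other covariates and the patient intercept fixed.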
To quantify the association between regular use of yoga and the four blood pressure categories defined in the 2017 ACC/AHA blood pressure guidelines, we fit mixed effects logistic regression models with blood pressure category as the dependent variable. The blood pressure categories are: normal (systolic less than 120 mmHg and diastolic less than 80 mmHg), elevated (systolic between 120 and 129 mmHg and diastolic less than 80 mmHg), stage I hypertension (systolic between 130 and 139 mmHg or diastolic between 80 and 89 mmHg), and stage II hypertension (systolic at least 140 mmHg or diastolic at least 90 mmHg) [4]. Covariates, the random effect term, and the age-based subset analyses are as described above. Beta coefficients were exponentiated to generate odds ratios, and statistical significance was set at p < 0.05.
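The category definitions and the coefficient-to-odds-ratio step can be sketched as below. The function names are hypothetical, and the sketch follows the 2017 ACC/AHA convention that a reading whose systolic and diastolic values fall in different categories is assigned to the higher category; whether the study resolved such readings identically is an assumption.

```python
import math


def bp_category(sbp: float, dbp: float) -> str:
    """2017 ACC/AHA category for one reading (mmHg); mixed readings take the higher category."""
    if sbp >= 140 or dbp >= 90:
        return "stage 2 hypertension"
    if sbp >= 130 or dbp >= 80:
        return "stage 1 hypertension"
    if sbp >= 120:  # here sbp is 120-129 and dbp < 80
        return "elevated"
    return "normal"


def odds_ratio(beta: float) -> float:
    """Exponentiate a logistic regression coefficient to obtain an odds ratio."""
    return math.exp(beta)
```

For example, a reading of 125/85 mmHg is stage 1 hypertension because of the diastolic value, and a coefficient of ln 2 (about 0.693) corresponds to an odds ratio of 2.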
All data processing was done with Python version 3.6.5.
All statistical analyses were done in R version 3.5.3. For linear mixed effects models we used the lmer function in the lmerTest package, and for mixed effects logistic regression models we used the glmer function in the lme4 package [18, 19].
This study was approved by the IRB at the University of Pennsylvania under protocol #826329. Informed consent requirements were waived because the study design is retrospective and observational; patients were not contacted.
Data analysis was conducted between September 2020 and April 2021.