Introduction
Lichen planus (LP) is an inflammatory skin disorder estimated to affect between 0.5 and 1% of the population worldwide [1, 2]. LP can present in various forms across the body [3]. Cutaneous LP (CLP) lesions are the most common type of LP and are characterized by polygonal purple papules on the skin, often associated with severe itch and typically affecting flexor surfaces including the wrists, ankles and lower back [4]. Lichen planopilaris (LPP) is a follicular variant of LP and is most common in females [2]. LPP can present as painful and itchy patches of hair loss, predominantly localized to the centre of the scalp, along the frontal hair line and/or in the eyebrows [5]. If untreated, LPP can lead to irreversible scarring and alopecia [4]. Mucosal LP (MLP) lesions typically present as asymptomatic bilateral white striations or painful plaques localized in mucosal areas including buccal mucosa, tongue and gingivae, genitalia and conjunctiva [2, 4, 6]. Individuals may be diagnosed with more than one LP subtype, based on the clinical presentation [4].
Given the range of LP signs and symptoms (including itch, pain and a burning sensation at the affected areas) [1, 8–11], LP can have a significant impact on patients’ health-related quality of life (HRQoL) [4]. While qualitative literature is limited, there is evidence that LP patients, particularly CLP and MLP patients, experience psychological impacts including anxiety and depression [12]. Patients with oral MLP also report significant impacts on daily activities, such as discomfort when consuming certain foods and drinks, which in some cases can result in depression and high levels of stress and anxiety [13, 14]. LPP patients have reported impacts on social interactions and daily activities as a result of scarring and hair loss, leading to low self-esteem and self-consciousness [15].
Patient-reported outcome measures (PROMs) are commonly used in routine medical practice and clinical studies to measure symptoms and HRQoL from the patient perspective. It is important that PROMs are appropriate and fit for purpose in terms of content validity and psychometric validity in the context of use [16]. A review of existing PROMs used in LP and other similar dermatological conditions identified several PROMs that could be appropriate for use in LP clinical development programs. Specifically, dermatological measures such as the Dermatology Life Quality Index (DLQI) [17] and Scalpdex [18], and non-disease-specific measures such as the Epworth Sleepiness Scale (ESS) [19], have been used to assess HRQoL in LP patients [15, 20–23]. While there is some evidence of content validity and psychometric properties for these measures in some dermatological conditions [23, 24], there is limited evidence to support their use in an LP population [25]. In contrast, while existing LP-specific PROMs such as the recently developed Oral Lichen Planus Symptom Severity Measure (OLPSSM) have strong content validity [8, 26], there is no additional published evidence of their psychometric validation in an LP (or any other) population.
To address the gaps in evidence and align with regulatory standards [16, 27], the current study aimed to assess the content validity and psychometric measurement properties of the DLQI, ESS, Scalpdex and OLPSSM in an LP population through qualitative patient interviews and psychometric analysis of data from an international Phase 2 LP clinical study. In line with the United States Food and Drug Administration (FDA) patient-focused drug development (PFDD) guidance documents, a mixed-methods approach was used to ensure that the patient voice was represented in the evaluation of the selected PROMs and in future clinical study design in LP [28–31].
Methods
Study Design
This study was conducted in two phases. In the quantitative phase, the psychometric properties of the DLQI, ESS, Scalpdex and OLPSSM were assessed in an LP population. In the qualitative phase, the content validity of the measures was evaluated via cognitive debriefing interviews.
Compliance with Ethics Guidelines
Ethical approval and oversight were obtained for the clinical study, including exit interviews (ClinicalTrials.gov ID: NCT04300296; EudraCT: 2019-003588-24), and for the independent qualitative interviews (Western Copernicus Group Independent Review Board [WCG IRB]; reference: 20216826). The studies were performed in accordance with the Helsinki Declaration of 1964 and its later amendments, and all participants provided informed consent indicating that their data would be used for medical research purposes and that the study results might be published.
Quantitative Phase
The quantitative phase used data collected from a global, randomized, double-blind, placebo-controlled, multi-centre, parallel-group Phase 2 clinical study involving 111 adults with biopsy-proven forms of moderate to severe LP (based on an Investigator Global Assessment [IGA] rating of ≥ 3) who were eligible for systemic therapy and, in the opinion of the investigator, not adequately controlled with high- to ultrahigh-potency topical corticosteroids. The study consisted of three cohorts (CLP, MLP and LPP) and two treatment periods (treatment period 1: baseline to Week 16; treatment period 2: Week 16 to Week 32) (Supplementary Material). Treatment period 1 data were used for the psychometric analyses. The selected PROMs were included as secondary or exploratory study endpoints.
Overview of PROMs
Table 1 provides a brief description of the PROMs included in the planned analyses and the cohorts to which they were administered within the clinical study. Licenses to use the PROMs in the clinical study were obtained.
Table 1
Overview of the PROMs included in the quantitative phase of the study
Anchor Measures
Anchor measures were developed and administered to the full clinical sample in the LP clinical study to support psychometric evaluation of the PROMs [16]. These included a five-point patient global impression of severity (PGI-S) item, a five-point patient global impression of change (PGI-C) item, a five-point Investigator’s Global Assessment (IGA) scale and Item 1 of the DLQI (‘Over the last week, how itchy, sore, painful or stinging has your skin been?’). The PGI-S and the IGA were administered at baseline and at Weeks 2, 4, 8, 12 and 16; the PGI-C was administered at Weeks 2, 4, 8, 12 and 16.
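As an illustration of how anchor measures can support an evaluation of ability to detect change, the sketch below groups PROM change scores by PGI-C response and reports the mean change in each group. It is a minimal sketch under stated assumptions, not the study's analysis code: the record layout, the function name `mean_change_by_anchor`, the integer coding of the five PGI-C categories and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_anchor(records):
    """Return {anchor category: mean change score} for complete cases.

    records: iterable of (baseline_score, followup_score, anchor_category).
    Tuples with a missing (None) value are skipped.
    """
    changes = defaultdict(list)
    for baseline, followup, category in records:
        if baseline is None or followup is None or category is None:
            continue
        changes[category].append(followup - baseline)
    return {cat: mean(vals) for cat, vals in sorted(changes.items())}


# Toy data only (hypothetical, not study data): PROM total score at baseline
# and at follow-up, plus a PGI-C response coded 1..5 (coding is an assumption).
toy_records = [
    (20, 8, 1),
    (18, 10, 1),
    (15, 12, 2),
    (16, 15, 3),
    (14, 14, 3),
    (12, None, 2),  # incomplete case, ignored
]

result = mean_change_by_anchor(toy_records)
for category, change in result.items():
    print(f"PGI-C category {category}: mean change {change:+.1f}")
```

If a measure is responsive, mean change would be expected to differ in a consistent direction across the anchor categories; how large a difference counts as meaningful is a separate question that this sketch does not address.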
Psychometric Analysis
Item- and scale-level psychometric analyses were conducted (Table 2). Unless noted otherwise, Week 4 data were used, as this time point was identified as providing a greater range of scores. As the PROMs were not appropriate for use in all LP types, analyses were conducted with different patient samples: the DLQI and ESS with all LP types (n = 111), the Scalpdex with LPP only (n = 37) and the OLPSSM with MLP patients with oral LP (n = 33). The aim of this study was not to evaluate the structure of the questionnaires; therefore, factor analyses were not conducted.
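Two of the item- and scale-level properties discussed in this paper, internal consistency and inter-item correlations, can be sketched as follows. The text above does not name the statistics used, so Cronbach's alpha and Pearson correlations are assumptions chosen for illustration, as are the function names and the toy item scores.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: list of respondents, each a list of item scores of equal length.
    """
    k = len(rows[0])
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if an item is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def inter_item_correlations(rows):
    """Return {(item_i, item_j): r} for every item pair, items numbered from 1."""
    columns = list(zip(*rows))
    return {
        (i + 1, j + 1): pearson(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


# Toy data only (hypothetical): five respondents, three items scored 0..3.
toy_rows = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 2, 1],
    [3, 2, 3],
    [3, 3, 3],
]

print(f"alpha = {cronbach_alpha(toy_rows):.2f}")
for pair, r in inter_item_correlations(toy_rows).items():
    print(f"items {pair}: r = {r:.2f}")
```

Very high correlations between a pair of items may point to redundancy, and very low ones to items that measure different concepts; any cut-offs applied to these values would need to be justified separately.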
Table 2
Summary of psychometric analyses
Qualitative Phase
The qualitative phase assessed the content validity of the PROMs via cognitive debriefing interviews. Given that the DLQI, ESS, Scalpdex and OLPSSM are existing validated measures, only relevance is reported here, as evidence of understanding is already available from the original development studies and subsequent studies evaluating their use. An overview of the study procedure is provided in Supplementary Material, with further detail described in the subsequent sections.
Sample and Recruitment
A subset of patients (n = 13) enrolled in the Phase 2 LP clinical study in the US were invited to participate in an exit interview once they had completed all treatment visits to Week 32 but before their Week 40 follow-up visit. Participation was voluntary and patients could opt out of the interview; patients who withdrew from the clinical study early were not eligible to participate in an exit interview. To increase the sample size, an additional, independent sample of patients (n = 45) was recruited by third-party recruitment agencies via referring clinicians in the US and Germany to participate in a qualitative interview. Inclusion and exclusion criteria for the independent interviews broadly reflected the LP clinical study eligibility criteria. Based on previous research, the sample was deemed sufficient for assessing the content validity of the PROMs [32].
Interview Procedure
Interviews lasted 60 min and were conducted via telephone by trained qualitative interviewers in the patient’s native language, using a semi-structured interview guide to facilitate the discussions. The cognitive debriefing (CD) section of the interview, which aimed to explore the relevance of the concepts assessed in the PROMs, lasted approximately 30 min and consisted of direct and focused questions.
Qualitative Analysis
All interviews were audio-recorded and transcribed verbatim with identifiable information redacted; the German interviews were then translated into English. Interview transcripts were analysed in Atlas.ti (Version 22) [33] following a framework approach [34]. Dichotomous codes were assigned to each item, instruction, response option(s) and recall period to indicate whether it was understood, relevant and/or appropriate, and why. Further codes captured any suggested changes.
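To show how dichotomous codes of this kind can be summarized, the sketch below tallies, for each item, how many participants who were asked about it considered it relevant. It is a hypothetical illustration only: it does not use or represent the Atlas.ti software, and the tuple layout, the function name `relevance_summary`, the participant identifiers and the toy codes are assumptions.

```python
from collections import defaultdict


def relevance_summary(codes):
    """Summarize dichotomous relevance codes per item.

    codes: iterable of (participant_id, item_label, is_relevant).
    Returns {item_label: (n_relevant, n_asked, percent_relevant)}.
    """
    counts = defaultdict(lambda: [0, 0])  # item -> [n_relevant, n_asked]
    for _participant, item, is_relevant in codes:
        counts[item][1] += 1
        if is_relevant:
            counts[item][0] += 1
    return {
        item: (relevant, asked, 100 * relevant / asked)
        for item, (relevant, asked) in counts.items()
    }


# Toy codes only (hypothetical, not study data).
toy_codes = [
    ("P01", "Item 1", True),
    ("P02", "Item 1", True),
    ("P03", "Item 1", False),
    ("P01", "Item 2", False),
    ("P02", "Item 2", False),
]

summary = relevance_summary(toy_codes)
for item, (relevant, asked, percent) in summary.items():
    print(f"{item}: {relevant}/{asked} relevant ({percent:.0f}%)")
```

Using the number of participants actually asked about an item as the denominator, rather than the full sample, keeps the percentages interpretable when not every item is debriefed with every participant.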
Discussion
There are limited disease-specific PROMs that assess HRQoL in LP patients and a scarcity of psychometric evidence for the use of generic HRQoL PROMs in this population. The analyses described in this study evaluated the content validity and psychometric properties of the DLQI, ESS, Scalpdex and OLPSSM to assess their appropriateness for use in clinical trials with LP patients. Importantly, the mixed-methods approach adopted allows the patient voice to be represented not only in this study but also in future clinical study designs, as recommended by the PFDD guidance documents [28–31], and follows the FDA recommendation to provide an evidence-based rationale when proposing a clinical outcome assessment (COA) as fit for purpose [30]. Specifically, the approach allowed for the assessment of whether the PROMs capture all important aspects of the concept of interest; whether the method of scoring is appropriate and sufficiently sensitive to reflect clinically meaningful change within the context of use; whether respondents understand the items as intended; whether differences in scores can be interpreted in terms of impact on patients’ experience; and whether scores correspond to specific health experiences of patients [30]. The study also included exit interviews, which the FDA has noted as a valuable tool to contribute cumulative evidence on aspects of the patient experience; inform the development or refinement of COAs; add greater depth to data in diseases, such as LP, that do not have much qualitative patient input; and obtain patient input on meaningful outcomes [29].
While the DLQI is one of the most widely used PROMs across multiple dermatological indications and has commonly been used with LP patients [17], content and psychometric evidence of its appropriateness for use in clinical studies with LP patients is limited [21]. On the one hand, the current study supports the use of the DLQI in LP patients, as the findings provide strong evidence of reliability and construct validity. The DLQI domain ‘Symptoms and feelings’ performed particularly well. On the other hand, the psychometric data do not confidently support the DLQI’s ability to detect change over time in the specific context of use for adults with LP, and high inter-item correlations between some items suggest potential redundancies. The qualitative interview data further suggest that patients did not consider most items relevant to their experience of LP. Given the modular nature of the DLQI, the study data support the use of the ‘Symptoms and feelings’ domain as an independent module with LP patients, where necessary and appropriate.
Even though the ESS has demonstrated evidence of reliability in other populations, convergent validity was poor in this study. Furthermore, known-group comparisons showed evidence of the ESS’s ability to discriminate between groups defined by the PGI-S but not the IGA, and its ability to detect change was limited or absent. These findings suggest that the ESS may not be appropriate for use in clinical trials with LP patients. This is supported by the qualitative findings, in which most participants reported that they never felt sleepy or wanted to fall asleep because of their LP, although some patients did spontaneously report sleep-related impacts, such as sleep disturbance (i.e., sleep quality and/or sleep quantity). Measures that assess sleep rather than daytime sleepiness should therefore be considered for clinical studies with LP patients. However, further research is needed to ascertain whether sleep is a meaningful and important concept in LP, as data are scarce [20].
The Scalpdex performed relatively well when psychometrically evaluated in the study’s LPP patient sample, demonstrating evidence of internal consistency, test-retest reliability and convergent validity (although correlations with the PGI-C and IGA were only weak). Evidence was mixed for its ability to differentiate between known groups and to detect change. Not all items may be appropriate for use with LPP patients. For example, inter-item correlations for Item 19 and Item 20 were much weaker than those for the rest of the items, Item 15 demonstrated weak correlations with the other ‘Function’ domain items, and Item 8 had very weak correlations overall, including with the other ‘Symptoms’ domain items, which is particularly concerning as the ‘Symptoms’ domain consists of only three items. These findings are not surprising, as the Scalpdex was originally developed with patients with seborrheic dermatitis and scalp psoriasis [18]. Clinical characteristics present in these patients, such as desquamation and bleeding [23], may not be relevant to LPP patients. This is supported by the qualitative CD interviews and the original Scalpdex development study, in which the impact of desquamation, as assessed via Item 15, was reported as not relevant by a high percentage of patients [18]. Based on the study findings, it is suggested that the Scalpdex be used with caution with LPP patients and that further evidence is needed to support its use in clinical trials. A further potential limitation of the Scalpdex is its length: its 23 items might be viewed as burdensome by many patients, particularly if some items are deemed not relevant. Similar to the DLQI, the Scalpdex ‘Symptoms’ domain performed better than the measure as a whole, but caution should be taken if the acceptable performance of the measure’s total score is driven purely by the ‘Symptoms’ domain items.
Lastly, the OLPSSM, as psychometrically evaluated in MLP patients with oral involvement, showed evidence of good reliability, construct validity and ability to detect change over time (PGI-S and PGI-C). It is not surprising that the OLPSSM performed well, as it was designed specifically for patients with oral lichen planus (OLP) and has previously been used in similar populations [8, 38]. However, despite the psychometric validity of this measure, not all items may be relevant to all patients with oral involvement. For example, Item 4 and Item 5 have been noted in the literature, and supported by the qualitative interviews in the current study, as the triggers least likely to cause soreness and are associated more with patients with severe OLP [8]. Furthermore, inter-item correlations between Item 1 and Item 6 were weak, suggesting that these two items might measure dissimilar concepts, while correlations between Item 2 and Item 5 were very high, suggesting potential redundancy. Finally, the OLPSSM is limited in its use to patients with oral involvement [8, 38], leaving a gap for other LP patients. Overall, the data suggest that the OLPSSM is a valid HRQoL PROM for use with patients with OLP.
Study Limitations
Given the relatively small sample size of some LP cohorts in the current study, particularly for the OLPSSM and Scalpdex analyses, future research in larger samples is recommended to strengthen the findings. Further research is also recommended to review other existing HRQoL measures that may be used in LP patients.
Conclusion
The results of our study contribute to the literature by providing novel insights into the appropriateness of existing PROMs commonly used with LP patients. Our study further highlights the need for additional psychometric evaluation and qualitative evidence to assess whether PROMs under consideration are “fit for purpose” for use in future LP clinical studies and to support the development of additional LP-specific HRQoL PROMs.