Background
Regional population health management (PHM) initiatives are challenged to evaluate regional patient experiences to improve their health and social services. These initiatives have been increasingly widening their focus from individuals to populations [1, 2] to deal with the changing care demand. Their intent is often to achieve the Triple Aim, i.e. to simultaneously improve population health and the experienced quality of care while reducing costs [3]. This, combined with a more general focus in care on how patients experience care [4, 5], makes it essential for PHM initiatives to evaluate regional patient experiences.
Many reforms struggle with evaluating population level experienced quality of care to assess their regional policies, resulting in a variety of methods used [6, 7]. Currently, solicited surveys are dominant, with well-known examples such as the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) and the NHS Inpatient Survey [8, 9]. Notwithstanding their value, solicited surveys often have substantial downsides, such as a significant time lag between measurement and publication, low response rates and high deployment costs [10, 11]. Another potential source might be found in websites that gather unsolicited online ratings. Analogous to other sectors, where consumers are getting used to voicing their experiences online using, for example, Yelp.com [12] and other social networks such as Twitter and Facebook [10], patients are increasingly sharing their experiences on the internet [13, 14]. Patients can often rate their care experiences on general websites like Yelp or Facebook, as well as specialized websites such as RateMDs and HealthGrades.
Unsolicited online patient ratings have shown promise for creating insight into experienced quality of care at the provider level. At this level, they seem to be especially useful as an additional perspective or source of information complementing solicited surveys [11, 15, 16] or as a more real-time alternative [17]. In the Netherlands, the most widely used patient rating website is ZorgkaartNederland.nl (Dutch Care Map, ZKN) [18], which is run by the Dutch Patient Federation (DPF). Since 2007, over 500,000 experiences with different individual care providers, hospitals and other care institutions have been shared by patients on the ZKN website. These experiences might prove valuable for policy makers, as they could provide close to real-time information regarding progress on one of the pillars of the Triple Aim and might be of value for comparisons between regions. However, the extent to which these provider level online ratings can be used to create overall insight for (regional) population level policies is unclear. If combined, they could be used to measure overall regional quality of care and aid regional policy evaluations in a relatively simple and cost-effective manner.
Therefore, this study aimed to explore whether unsolicited online provider ratings can be used to create insight into differences in patient experiences between as well as within regions. To assess this, the structure and regional coherence of online unsolicited ratings were studied. Additionally, the results of unsolicited ratings were compared with those of the dominant method, solicited surveys, in nine regions to explore their differences.
Methods
Study population
In 2013, the Dutch Minister of Health appointed nine regions as pioneer sites because of their goal to implement regional policies according to the Triple Aim [19]. These pioneer sites are demarcated geographical areas in which different organizations work together to achieve this goal [20]. Each site has its own approach with different organizations involved, such as hospitals, municipalities or insurance companies. The sites are spread out across the Netherlands and are monitored by the National Institute for Public Health and the Environment in the so-called National Monitor Population management (NMP). Overall, about 2 million people live in these regions, but the size as well as the characteristics of the population in each region vary (Table 3 in Appendix 1).
Data sources
Two data sources were used in this study. The primary focus was the unsolicited online patient ratings provided by the DPF, while the solicited survey data provided by the NMP were used predominantly for comparison.
The unsolicited online patient ratings were derived from the www.ZorgkaartNederland.nl (ZKN) website, which was made available by the DPF. On this website, patients can both give and see reviews. To add a review, patients first select a care provider, which can be a care professional like a specific GP or specialist, or an organization such as a hospital (department) or nursing home. Six ratings have to be given, each ranging from 1 to 10, covering six quality of care dimensions. The dimensions differ depending on the category of provider that is selected (e.g. for hospital care they are appointments, accommodation, employees, listening, information and treatment). Additionally, there is a textbox where patients can explain their ratings and add other relevant comments as well as the condition they were treated for. No further personal information about the respondent is requested, but a timestamp and email address are registered. The ZKN staff checks each submission for repeated entries, integrality and anomalies, and gives each one an identifier.
The solicited survey data used were provided by the NMP (Ethical Review Board number: EC-2014.39) [21]. In nine pioneer sites, a random sample of 600 insured adults per pioneer site (total = 5400) was invited to fill out the survey between December 2014 and January 2015 [21]. This yielded 2491 filled-out surveys (response rate 46.1%), around 300 per pioneer site. The average age was 55.7 years and more than a quarter of respondents were highly educated (Table 3 in Appendix 1). The solicited survey population has previously been described in more detail [22]. For this study only the following question was used: “On a scale from 0 to 10, where 0 is the worst possible care and 10 is the best possible care, which grade would you give the total care you received in the past 12 months?” Thus, in this dataset only ratings given for overall care experiences were available.
Ratings
In addition to ratings given through the website, the DPF actively gathers ratings by visiting care providers. These visits predominantly focus on residents of nursing homes and each is logged using a unique identifier. To distinguish between ratings given unsolicited and those gathered by the DPF, the number of occurrences of the unique identifier was checked. Identifiers that showed up more than 10 times were labeled as solicited, while others were labeled as unsolicited. A mean rating was calculated for each entry by averaging the six ratings provided. This combination was shown to provide a good summary of an entry [23]. Ratings and providers were clustered at the regional level using the nine pioneer sites’ zip codes [24]. Additionally, a second set of regions was created for the sensitivity analyses described below. These regions were identified using zip codes based on nine regional initiatives that were not included in the NMP [25]. Furthermore, providers in the Zorgkaart data were grouped into the following categories: hospital care, nursing homes, general practitioners (GPs), insurer, birth care, pharmacy, physiotherapy, youth care, dental care and ‘others’. In the survey data, the only alteration made was the combination of the 0 and 1 ratings, so that both datasets use a 10-point scale running from 1 to 10. These ratings were already grouped by pioneer site.
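The labeling and averaging steps described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the field names (`identifier`, `ratings`), the function names and the example entries are all hypothetical, and only the rules themselves (more than 10 occurrences means solicited, the entry score is the mean of the six ratings, survey ratings of 0 and 1 are combined) come from the text.

```python
from collections import Counter
from statistics import mean

# Rule from the text: identifiers occurring more than 10 times are solicited.
SOLICITED_THRESHOLD = 10


def label_and_average(entries):
    """Label each entry as solicited/unsolicited and add its mean rating.

    `entries` is a list of dicts with hypothetical keys:
    "identifier" (str) and "ratings" (list of six scores from 1 to 10).
    """
    counts = Counter(entry["identifier"] for entry in entries)
    labeled = []
    for entry in entries:
        is_solicited = counts[entry["identifier"]] > SOLICITED_THRESHOLD
        labeled.append({
            "identifier": entry["identifier"],
            "source": "solicited" if is_solicited else "unsolicited",
            "mean_rating": mean(entry["ratings"]),
        })
    return labeled


def recode_survey_rating(rating):
    """Combine 0 and 1 so the 0-10 survey scale becomes a 1-10 scale."""
    return max(rating, 1)


# Invented example data: eleven entries sharing one identifier, plus one
# entry with its own identifier.
entries = [{"identifier": "visit-A", "ratings": [8, 7, 9, 8, 7, 9]}
           for _ in range(11)]
entries.append({"identifier": "web-1", "ratings": [6, 7, 6, 8, 7, 8]})

labeled = label_and_average(entries)
```

Note that the threshold is strict: an identifier that occurs exactly 10 times is still labeled unsolicited under the rule as stated.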
Analyses
First, descriptive statistics were extracted for both the unsolicited online ratings and the solicited survey data to explore the structures of both datasets. Rating frequencies and means were determined per pioneer site overall, and for the online data these were also stratified by the largest provider categories: hospitals, GPs, dental care and nursing homes. To compare the two datasets, means (using independent t-tests) and distributions were studied. Additionally, an Analysis of Variance (ANOVA) was performed on each dataset to test for differences between pioneer sites, and Spearman’s rho was determined to assess the correlation between the pioneer site mean scores based on the unsolicited online ratings and those based on the solicited survey data.
Second, multilevel analyses were performed using the unsolicited online ratings based on three levels, 1) rating, 2) provider and 3) pioneer site, in order to gain insight into the regional clustering of ratings. If ratings were clustered, this would indicate that the mean differences found between regions could be attributed to characteristics of the regions. The year a rating was given was added as a fixed variable to adjust for changes over time, such as the introduction of population health policies. Using the three levels and the ratings as the dependent variable, the model was run to determine intraclass correlation coefficients (ICC, range = 0–1). The ICC is a measure of similarity between values from the same group and provides insight into the clustering of, in this case, unsolicited ratings in regions and in providers. Whether clustering at a given level is meaningful was assessed based on the ratio of the between-person (i.e. provider) variance to the within-person variance [26]. To have interpretable results, levels within a multilevel analysis have to be interpretable and similar; therefore, models were tested per provider category.
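As an illustration of how the ICCs relate to the three levels, a common random-intercept formulation is sketched below. The notation and the variance-share definition of the ICC are assumptions made for this sketch; the text does not specify the exact model equation, nor how the year variable was coded.

```latex
% Rating i of provider j in pioneer site k (notation assumed for illustration)
y_{ijk} = \beta_0 + \beta_1\,\mathrm{year}_{ijk} + v_k + u_{jk} + e_{ijk},
\qquad
v_k \sim N(0, \sigma^2_v),\quad
u_{jk} \sim N(0, \sigma^2_u),\quad
e_{ijk} \sim N(0, \sigma^2_e)

% Share of the total variance located at each higher level
\mathrm{ICC}_{\text{site}} = \frac{\sigma^2_v}{\sigma^2_v + \sigma^2_u + \sigma^2_e},
\qquad
\mathrm{ICC}_{\text{provider}} = \frac{\sigma^2_u}{\sigma^2_v + \sigma^2_u + \sigma^2_e}
```

Under this definition, a site-level ICC close to 0 means that ratings of different providers within the same pioneer site are hardly more alike than ratings from different sites.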
Finally, a sensitivity analysis was added to ensure that the results of the multilevel analyses were not due to region selection and would be comparable for regions other than the nine selected. For this purpose, the multilevel analyses were repeated with the alternative regions.
SPSS 22 (SPSS Inc., Chicago, Illinois) and RStudio Version 0.99.441 for Windows (RStudio, Boston, Massachusetts) were used to perform the analyses described above. A p-value below 0.05 was considered significant in all analyses.
Discussion
This study explored whether unsolicited online provider based ratings can provide insight into patient experiences at a (regional) population level. The overall mean scores as well as the mean scores stratified by provider category (e.g. hospital (department) and GP practices) differed significantly between pioneer sites. However, most differences were small, often amounting to only a few tenths on a 10-point scale. This in itself is not an issue, as care experiences might be comparable between regions, and similarly small differences were found in the solicited survey data. Overall, unsolicited ratings were higher than solicited ratings. Multilevel analyses conducted using the unsolicited online ratings among GPs, dental care and nursing homes at the pioneer site level indicated there was little clustering of ratings between providers in the same region. This makes it difficult to attribute any variation found to regional (i.e. population) level differences. The provider level did show a meaningful clustering of ratings, suggesting differences could be explained by provider variation.
For now, unsolicited online ratings cannot be used to gain insight into differences between regions. There appeared to be little clustering of experienced quality of care between providers in the same region. This lack of regional grouping of experienced quality can also be seen using other measures [27]. Currently, this limits the use of unsolicited online ratings as an evaluation tool for the regional level. Even though the goal was not to evaluate any specific policy, it should be noted that PHM initiatives in the Netherlands only started implementing regional collaborations five years ago and many are still in the start-up stage. Regional policies have shown the potential to impact quality of care [28], but in the Netherlands, initiatives might require more time to have an impact.
When looking at ratings separated by the individual dimensions, notable patterns emerged. For example, Vitaal Vechtdal was rated the highest overall, but had a substantial dip in the accommodation dimension and was rated lowest in the dental care category. Similarly interesting patterns could be seen in other pioneer sites and, keeping the low ICCs in mind, these could be insightful for both policy makers and providers. Furthermore, the dispersion in ratings between providers was substantial. This illustrates the variation in experienced quality of care between providers within a region: some providers perform better than others. This is in line with previous studies, which showed that care providers in the same region differ in the care experience they deliver [27]. Being able to identify variation between providers is useful for regional policymakers, as it allows poorer scoring providers to be identified and improved. Ideally, by stimulating integration and cooperation within healthcare, overall experienced quality of care should improve and ratings should become more geographically coherent.
This study is the first to explore the use of unsolicited online care provider ratings at the regional level. Results are not yet consistent [16], but several studies show a potential correlation between ratings and hospital readmissions as well as other objective quality of care measures [15, 29]. Furthermore, online ratings can be used by care inspection agencies as an additional input source [18] and have been shown to impact the real-world behavior of consumers [30]. However, online ratings, and the Zorgkaart data used here in particular, have limitations that have to be considered. Patients can give more than one rating, which means the ratings are not completely independent. Correcting for this using a cross-classified multilevel model did not show any different results; such a model is not preferred, however, because the data are very skewed, as most patients give only one rating. Additionally, Zorgkaart’s unsolicited ratings are given to providers, and adjustments have to be made to be able to evaluate at the population level. A more direct measure of general population level experienced quality of care would be preferable. Furthermore, the present number of ratings was insufficient for in-depth analyses for many provider categories in this study (e.g. insurance companies and disabled care). Additionally, several providers had only a few ratings available for the multilevel analysis. The Zorgkaart data showed that the frequency at which ratings are submitted has been increasing rapidly over the years, and it is therefore expected that the issue of low numbers will be solved over time.
Zorgkaart, like most rating sites, also has limited participant information for privacy reasons. This means it is impossible to correct for selection bias, even though a younger, more tech-savvy population tends to provide ratings [31]. Finally, there was limited opportunity to connect the unsolicited dataset to the solicited dataset. Unlike Zorgkaart, the solicited dataset did not target specific quality dimensions, which prevented comparisons or conclusions at this level. For regions, dimension-specific information and comparisons could be useful.
For regional population evaluations, online ratings could be improved. First, it is worth performing a follow-up study in a few years to determine if the same conclusions can be drawn. By then, more ratings will be available and the initiatives will have had more time to form their interventions and have an impact. Second, an algorithm could be created that highlights poorly performing or declining providers within a certain area, so that policy makers can identify and aid underperforming providers faster. Third, text comments accompanying ratings can provide an additional source of information [32] and could provide more detail for policy makers as well as providers [33]. Finally, to truly evaluate regional policies that go beyond healthcare, broader measures are required that cover preventive, well-being and social services. The current ratings are generally only focused on the quality of healthcare services. Expansion of current instruments or the creation of new ones would be needed to drastically improve their use for current regional health policies that go beyond clinical care.