Background
Most international conferences, including those on infection prevention and control (IPC) and infectious diseases, remain inaccessible to many potential attendees for multiple reasons (time, budget, country entry requirements, etc.). Involvement of all stakeholders, including patients and the public, is considered critical to bend the curve on the rising global and economic burden posed by antimicrobial resistance (AMR), as one example related to infections [1]. Improving the communication of scientific content delivered during a conference, both to the scientific community and to the general public, might be essential to reach this objective.
Twitter provides a unique opportunity to bridge this divide, allowing researchers, patient communities and the public to engage with scientific information remotely on a more accessible, inclusive, and diverse platform, keep up with cutting-edge research, share knowledge, and learn [2]. Interactions with published messages include retweets, which share the original message; quote tweets, which add a personal comment; and replies to the original tweet. Follower relationships are unilateral, meaning that a user does not necessarily follow their own followers. More recently, Twitter has reshaped the impact of scientific conferences by engaging virtual followers, as documented across medical specialities [3‐6], including infectious diseases and IPC [7‐9].
Studies have identified the importance of including patients as partners in scientific conferences, helping to direct research and current discussion towards a patient-centric approach and driving the future of healthcare [10, 11].
The 5th International Conference on Prevention and Infection Control (ICPIC) [12] is an established 4-day congress on the prevention of healthcare-associated infections and control of antimicrobial resistance that is held every two years. ICPIC2019 was the first IPC conference to integrate patient participation and was conferred patient-included™ charter status [13] (Additional file 1: Table 1). A conference that meets all five of the charter's clauses, namely (1) co-design (patients participate in the selection of topics and speakers), (2) engagement (including patients as presenters and in the audience), (3) accommodation (support for travel and accommodation, and provision of scholarships), (4) disability requirements (accommodating the physical needs of patients) and (5) virtual participation (free online video streaming), may be accredited as a Patients Included™ event [13]. Patient integration in IPC conferences is an important step to bring patients closer to the conversations driving patient safety and, ultimately, to improve the lives of patients and their families [12]. Inclusion and active engagement of patients as stakeholders can help drive knowledge dissemination and identify the issues that matter most to patients, caregivers and their families (Table 1).
Table 1
Labels defining the 14 clusters based on Twitter profiles
Category | Cluster label | Description |
Others | Advocates and politics | Biographies mainly expressing a stance for certain causes (racial, gender, politics, environmental…) |
Others | Characters | Non-English biographies or biographies including special characters (that have not been filtered out) |
Others | Hobbies, families & life balance, spirituality | Biographies around personal interests with a strong focus on families, religion and hobbies |
Others | Time and place | Biographies mainly containing geo-temporal information, indicative of events, gatherings, conferences, congresses, tourism… |
Advertising | Advertising | Biographies suggestive of publicity and advertisements |
Fintech–digital marketing | Fintech–digital marketing | Biographies suggestive of using novel technologies in innovation and private industries. More specific for fintech (bitcoin, cloud, blockchain) and digital marketing (social media, marketing) |
Industries | Industries and manufacturers | Biographies representing industries delivering a product |
Industries | Industry-related services | Biographies expressing a service often targeting industries. In this dataset, it has a strong focus on healthcare-related industries, but also includes human resources, consulting, education providers… |
Media | Media and music | Biographies related to audio-visual content, including authors, publishers, bloggers, editors… |
Patient support, foundation, advocacy and alternative therapies | Patient support, foundation, advocacy and alternative therapies | Biographies oriented to patients' health, mainly using popular wording for diseases and health (disease, life, pain, chronic disease). Gathers disease survivors, foundations or associations oriented to patient care, but also alternative therapies and caregivers |
Clinical leaders and healthcare workers | Physicians | Biographies expressing specialized medical wording; also gathers medical organizations |
Clinical leaders and healthcare workers | Clinical leaders and healthcare workers: healthcare quality improvement | Biographies expressing a will to improve healthcare quality, with certain specializations or belonging to certain societies/organizations |
Academic research | Academic research | Biographies using wording specific to academic research, such as degrees. Has a strong focus on biology and microbiology (genomic, biology, bioinformatics) |
Public and global health | Public and global health | Biographies expressing research or interests in public health concerns (sustainability, child care, equity, justice, climate) |
Twitter may extend the experience of scientific congresses to a wider audience and generate international engagement and global reach [14, 15]. However, this is not guaranteed: it depends on factors such as the number of followers [15] and on the content of published messages, which need to be informative and of interest to non-attending individuals in order to sustain engagement [16]. Furthermore, an echoing effect has been observed, with scientists mainly reaching other scientists, which limits the spread of the message to other stakeholders [15]. Assessing this echoing effect might help estimate the spread of content from scientific conferences among the general public. An unsupervised clustering approach based on the biographies of Twitter participants and their followers might describe in more detail the categories of stakeholders involved in the spread of online content [17‐21]. As patient status can hardly be ascertained from biographies, such an analysis would focus on the diversity of categories of Twitter users observed, hypothesizing that they represent past, present and future patients.
This study was performed: (i) to assess how ICPIC2019 allowed conference participants to reach out to other peers (in-reach) and to non-scientific audiences (general public) (outreach) through Twitter discussion; (ii) to compare the professional background of followers of participants (“reach”), and followers that interacted with original tweets; (iii) to explore connectedness between followers of each participant and estimate the potential spread of scientific information.
Material and methods
Study design and objectives
We conducted a retrospective observational study of social media data (tweets, retweets, mentions, digital impressions) covering a total of nine days of Twitter activity (from September 7 to 16, 2019) around the ICPIC patient-included™ scientific congress (September 10–13, 2019) [12]. During this period, all tweets with the official congress hashtag #ICPIC2019 were extracted, including original tweets, retweets, quotes, and replies. Information on the users (defined here as tweet authors), as well as on the followers of these authors (reach), was also extracted.
An analysis of the digital impressions among the professional background categories of authors and their followers was conducted, including the diversity of followers among specific categories of authors, the diversity of followers who interacted with original tweets concerning the scientific conference, and the connectedness between followers of each participant. Authors were defined as users who published an original message, a retweet, a quote, or a reply including the hashtag #ICPIC2019 during the study period. Reach was defined as all the followers of these authors. Active reach was defined as the followers who interacted with an original tweet through a quote, retweet or reply. Ethics approval was requested from the IRB committee in Geneva, Switzerland, which waived the requirement.
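The three definitions above (authors, reach, active reach) can be illustrated with a minimal Python sketch. The study itself was carried out in R; the `Tweet` record, the `followers` mapping and the function name below are hypothetical, and the rule used for active reach (a follower who also posted a retweet, quote or reply with the hashtag) is one possible reading of the definition rather than the authors' exact procedure.

```python
from collections import namedtuple

# Hypothetical record layout; the study's actual extraction schema is not given.
Tweet = namedtuple("Tweet", ["author", "kind"])  # kind: original | retweet | quote | reply

INTERACTIONS = {"retweet", "quote", "reply"}


def summarise_reach(tweets, followers):
    """Return (authors, reach, active_reach) as sets of user names.

    tweets:    iterable of Tweet records carrying the conference hashtag
    followers: dict mapping an author to the set of users following them
    """
    # Authors: every user who posted any kind of message with the hashtag.
    authors = {t.author for t in tweets}

    # Reach: the union of all followers of these authors.
    reach = set()
    for author in authors:
        reach |= followers.get(author, set())

    # Assumption: an active follower is a member of the reach who also
    # posted a retweet, quote or reply carrying the hashtag.
    interacting = {t.author for t in tweets if t.kind in INTERACTIONS}
    active_reach = reach & interacting
    return authors, reach, active_reach
```

Keeping the three groups as sets makes the later comparisons (distinct followers, overlap between authors and followers) simple set operations.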
Data extraction and pre-processing of Twitter profiles
Latent Dirichlet allocation (LDA)
Topic modelling with the unsupervised clustering method known as Latent Dirichlet Allocation (LDA) has been used in multiple fields to cluster information from social media [17‐20] and to group Twitter users based on their biographies [21]. In brief, LDA is a Bayesian method that estimates the probability of each word belonging to a topic (beta probabilities) and the probability of each topic belonging to a biography (gamma probabilities). More information on this method is given in the appendix (Additional file 1).
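To make the beta and gamma probabilities concrete, the following is a minimal, self-contained Python sketch of LDA fitted by collapsed Gibbs sampling. It is an illustration only: the study used R packages, whose fitting algorithm and settings are not described here, and the function name `fit_lda`, the hyperparameters `alpha` and `eta`, and the iteration count are assumptions chosen for the example.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, eta=0.01, seed=0):
    """Fit LDA to tokenised documents with a collapsed Gibbs sampler.

    Returns (beta, gamma, vocab) where
      beta[k][v]  estimates P(word v | topic k)
      gamma[d][k] estimates P(topic k | document d)
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # tokens in topic k
    assignment = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(n_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignment.append(topics)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignment[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + eta)
                    / (topic_total[t] + n_words * eta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignment[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    beta = [
        [(topic_word[k][v] + eta) / (topic_total[k] + n_words * eta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    gamma = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return beta, gamma, vocab
```

Each row of `gamma` sums to one over topics and each row of `beta` sums to one over the vocabulary, which is what allows a biography to be described as a mixture of topics before a single label is assigned.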
Cluster labelling
After estimating the gamma and beta probabilities, a label was defined for each cluster by reviewing the biographies with the highest probability of belonging to each topic and the words most strongly associated with each topic. Labels were defined by two blinded researchers (RM and ET) based on the 30 biographies with the highest gamma probability and the 20 words with the highest beta probability for each cluster. As a further aid, word clouds of the 50 most frequent words from the biographies in each cluster were computed. Discrepancies were resolved by consensus. These labels were then validated on a naive dataset (not used to define the labels), comprising five biographies randomly drawn from each of four strata of gamma probability (30–50%, 51–60%, 61–80%, 81–100%) for every cluster. This even representation of biographies across the range of gamma probabilities helped to define a gamma threshold above which a topic could be assigned to a biography. Biographies previously used to define a label were not reused for validation. In case of doubt during validation, the professional background of the authors was searched manually on the Internet.
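The stratified draw used for validation can be sketched as follows. This is a hedged Python illustration, not the authors' R code: the input layout `(biography_id, cluster, gamma)`, the function names, and the handling of stratum boundaries (each upper bound treated as inclusive) are assumptions.

```python
import random
from collections import defaultdict

# Upper bounds of the four gamma strata (30-50, 51-60, 61-80, 81-100%).
# Assumption: each upper bound is inclusive; values below 30% are ignored.
STRATA_UPPER = [0.50, 0.60, 0.80, 1.00]
GAMMA_MIN = 0.30


def gamma_stratum(gamma):
    """Return the stratum index (0 to 3) for a gamma value, or None if below 30%."""
    if gamma < GAMMA_MIN:
        return None
    for index, upper in enumerate(STRATA_UPPER):
        if gamma <= upper:
            return index
    return None


def validation_sample(assignments, exclude_ids, per_stratum=5, seed=0):
    """Draw up to `per_stratum` biographies per (cluster, stratum) pair.

    assignments: iterable of (biography_id, cluster, gamma) triples
    exclude_ids: biographies already used to define the labels
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for bio_id, cluster, gamma in assignments:
        stratum = gamma_stratum(gamma)
        if stratum is not None and bio_id not in exclude_ids:
            pools[(cluster, stratum)].append(bio_id)
    # Sort keys and pools so the draw is reproducible for a given seed.
    return {
        key: rng.sample(sorted(pool), min(per_stratum, len(pool)))
        for key, pool in sorted(pools.items())
    }
```

Sampling the same number of biographies in every stratum is what makes it possible to see at which gamma level the labels stop matching the biographies.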
Comparison of the diversity in followers
For each biography, only the topic with the highest gamma probability was retained, as it was most likely to categorize the author or follower accurately. Followers of the different categories of authors were then compared. Twitter users whose professional background could be estimated from their category were selected (by raising the gamma probability required to belong to these clusters) in order to compare the diversity of their respective followers. Network analysis was used to visualize the relationships between the different categories of authors and their followers.
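As an illustration of this step, the sketch below assigns each biography its most probable category and then tabulates the category mix of followers for each category of author. It is written in Python under stated assumptions: the function names, the optional `threshold` argument and the `(follower, author)` pair layout are hypothetical, and the proportions shown are one simple way to compare diversity, not necessarily the measure used in the study.

```python
from collections import Counter, defaultdict


def top_category(gamma_row, labels, threshold=0.0):
    """Return the label with the highest gamma, or None if it is below threshold."""
    best = max(range(len(gamma_row)), key=gamma_row.__getitem__)
    return labels[best] if gamma_row[best] >= threshold else None


def follower_mix(author_category, follower_category, follows):
    """Proportion of follower categories for each author category.

    follows: iterable of (follower, author) pairs; pairs involving an
    uncategorised user are skipped.
    """
    counts = defaultdict(Counter)
    for follower, author in follows:
        a_cat = author_category.get(author)
        f_cat = follower_category.get(follower)
        if a_cat is None or f_cat is None:
            continue
        counts[a_cat][f_cat] += 1
    return {
        a_cat: {f_cat: n / sum(c.values()) for f_cat, n in c.items()}
        for a_cat, c in counts.items()
    }
```

Raising `threshold` mirrors the idea of keeping only users whose category is estimated with higher probability, at the cost of leaving more biographies unassigned.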
Active followers
To estimate the reach of original tweets (active reach), users who retweeted, quoted, or replied to an original tweet were extracted to determine the number of “active followers”. Active followers, initially considered as authors because of the content they generated, were treated as followers in this analysis. The proportion of active followers was then stratified among the different categories. Network analysis stratified by the type of interaction was also used to visualize the different actors and their respective categories.
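The proportion of active followers and its stratification by category can be sketched as below. This Python example is illustrative only; the input names (`reach_category`, `active_users`) and the choice to report both the overall share and the distribution across categories are assumptions about how the stratification might be computed.

```python
from collections import Counter


def active_follower_summary(reach_category, active_users):
    """Summarise active followers within the reach.

    reach_category: dict mapping every follower in the reach to a category
    active_users:   users who retweeted, quoted or replied to an original tweet

    Returns (overall_share, share_by_category), where overall_share is the
    fraction of the reach that was active and share_by_category is the
    distribution of active followers across categories.
    """
    # Only users who belong to the reach count as active followers.
    active = [user for user in active_users if user in reach_category]
    overall_share = len(active) / len(reach_category) if reach_category else 0.0

    counts = Counter(reach_category[user] for user in active)
    total = sum(counts.values())
    share_by_category = {cat: n / total for cat, n in counts.items()}
    return overall_share, share_by_category
```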
Data extraction through the Twitter Application Programming Interface, data mining, Latent Dirichlet Allocation, and network analysis were performed using R to provide estimates of connectedness between authors and followers according to their respective predicted categories. RStudio (v.3.6.0) and RAnalyticFlow (v.3.0) were used with the following packages: rtweet, ggraph, igraph, tidytext, topicmodels, tm, SnowballC, and stopwords (R Foundation for Statistical Computing, Vienna, Austria; 2017; https://www.R-project.org/).
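The data behind such a network analysis can be sketched as an edge list aggregated by category and interaction type. The Python example below is a simplified stand-in for the R graph packages named above, using only the standard library; the `(source_user, target_user, kind)` triple layout and both function names are hypothetical.

```python
from collections import Counter


def category_network(interactions, category):
    """Count interactions between categories, stratified by interaction type.

    interactions: iterable of (source_user, target_user, kind) triples, where
                  kind is "retweet", "quote" or "reply"
    category:     dict mapping a user to a predicted category
    Returns a Counter keyed by (source_category, target_category, kind).
    """
    edges = Counter()
    for source, target, kind in interactions:
        key = (category.get(source, "Unknown"), category.get(target, "Unknown"), kind)
        edges[key] += 1
    return edges


def node_degrees(interactions):
    """Number of interactions each user takes part in (as source or target)."""
    degree = Counter()
    for source, target, _ in interactions:
        degree[source] += 1
        degree[target] += 1
    return degree
```

Edges whose source and target categories differ correspond to interactions across professional backgrounds, while high-degree users are candidates for the influencer effect.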
Discussion
Our study applied unsupervised learning to the tweets mentioning #ICPIC2019 to profile both authors and their respective followers according to their biographies, in the context of a patient-included™ conference. Although only English-language accounts were included (based on their most recent tweet), the volume of authors and followers categorized was substantial, with 235,620 followers linked to 474 authors. Unsurprisingly, the largest share of Twitter users interacting during #ICPIC2019 were healthcare workers (34%), followed by industry (11%) and academic researchers (8%). These results highlight that Twitter activity during the ICPIC2019 scientific congress reached a broader audience than expected. This observation supports the use of Twitter as a communication tool to increase the overall reach of scientific information [2, 8]. Alongside existing commercial tools for characterizing Twitter users and followers (e.g. Symplur healthcare hashtags, Twitonomy), this approach allowed us to measure the number of distinct followers per user while keeping all followers per user in order to evaluate specific relationships.
The methods used rely not only on specific words to categorize authors and followers, but also on the frequencies and distributions of words in the biographies. These parameters are influenced by multiple factors indicative of gender, culture, personality and specific interests [22, 23]. Specific interests sometimes converged to provide a clue about professional background. Some clusters, including healthcare workers and academic researchers, were more specific than others because they use a distinctive lexicon. Patient-oriented biographies might include less specific vocabulary and overlap with multiple other categories.
The categories of authors largely influenced the categories of their followers, a finding already reported in a previous study [15]. Furthermore, we observed more diversity in the reach of non-healthcare workers than in that of healthcare workers. This observation was also supported by further network analysis between all followers of specific categories. Influencers with a large number of followers might also influence the diversity of reach, affect Twitter connectedness, and steer conversations [15]. Unfortunately, this factor was not accounted for in the analysis.
Of note, active followers represented only 0.05 to 0.3% of the total reach. Thus, the number of followers might not always reflect the actual spread of a message. Interestingly, in the network of Twitter interactions, different categories of biographies often interacted with one another. We did not observe particular clusters or an over-representation of specific categories, such as healthcare workers, in online interactions. In the network analysis, industries and patients also participated in these online interactions and contributed to the diffusion of conference messages.
Given the homogeneity of the Twitter networks of healthcare workers and academics, in contrast to the heterogeneity of professions involved in Twitter interactions, the designation of a patient-included™ status and the process of systematically addressing methods to strengthen the inclusion of patients through social media may foster the spread of core messages to non-attending individuals and reach a more diverse population. While this study cannot draw this conclusion, Utengen and colleagues performed a social network analysis of Twitter activity from 1672 healthcare conferences and showed that when engaged patients are included in congresses, they increase the flow of conference information across social networks [11]. There is little doubt that patient inclusion can have benefits, but identifying the specific advantages requires further attention.
The SARS-CoV-2 pandemic has shifted in-person scientific conferences to virtual and digital events. This shift has provided unprecedented opportunities to use social media platforms, including Twitter, to reach a wide audience across the world, allowing closer integration among users and real-time interaction around key findings [25]. Now more than ever, it is important to maximize the reach of evidence-based information on infection prevention and control from scientific conferences via social media platforms in order to debunk misinformation.
Limitations
First, as we could not confirm conference participants against an official list, we could only assume that tweet authors mainly attended the conference. Second, the profession represented in each biography was originally a mixture of probabilities across different categories. For the sake of simplicity, biographies were categorized using only the most probable category, so overlapping categories (e.g. healthcare worker and academic research) were lost in this analysis. Furthermore, because of the small number of characters allowed in biographies (n = 160), the unsupervised technique performs less well and is less generalizable. However, above a certain threshold of gamma probability, especially for specific categories, and consistent with the validation of the labels on naive datasets, the technique remained reliable for the majority of biographies. Additionally, it accounted for the distribution of all words in a biography to assign a category, not just specific words, which allowed better discrimination than relying on the presence of one or several keywords. Third, only biographies of users whose most recent tweet was in English were included, so other biographies that likely also expressed related professional categories were excluded. Fourth, no other unsupervised or supervised model was applied to the dataset, so the repeatability of the findings was not assessed. Fifth, we only captured tweets that included the official conference hashtag (#ICPIC2019); this might have introduced a selection bias, as conference-related tweets may have been sent without the official hashtag [24]. Nonetheless, applying this analysis to a large dataset allowed us to identify the diversity of biographies of users and followers participating in the online discussion around ICPIC2019. These results add to the body of knowledge on Twitter use by users from diverse professional backgrounds and on its impact during academic scientific conferences focused on IPC, and provide novel insights on the aforementioned points.
Conclusion
This study offers a unique perspective on the widespread reach of IPC messaging from a single conference through the Twitter social media platform. It highlights the potential to increase the dissemination of research across an array of networks, thereby increasing the total Twitter output generated from in-person and virtual scientific conferences. Systematic analysis based on Twitter biographical information can be a useful adjunct to other methods used in data science, providing a feasible direction for the future exploration of reach. Furthermore, the present study suggests that patient-included™ conferences may have a positive impact on overall reach, not only to other patients and the general public but also through the engagement of numerous stakeholders key for IPC, ranging from media to industry. Congress organizers should implement a social media strategy and promote the use of a Twitter conference hashtag before, during and after the event. This strategy offers a useful way to help disseminate timely information and increase the virtual participation of patients, the public and non-attending individuals, as highlighted in the patient-included™ conference charter clauses.