Results
Documented uses of the Wordscores method
Author | Aim | Strengths of Wordscores use | Type of document analysed | Methodological considerations | Implications for mental health policy
---|---|---|---|---|---
Baek, Cappella and Bindman* [29] | To explore the usefulness and reliability of automated content analysis of answers to open-ended survey questions from a survey on bioethical issues in genetics research. | Effective, efficient, useful, simple, reliable, high face validity, systematic, versatile, flexible, consistent, superior to manual coding. | Open-ended cross-sectional survey responses (n = 1961) to questions on bioethical issues. | Appropriate, effective for coding purposes and efficient for extracting concepts from large texts. Also effective for analysis of short, informal survey responses. Word usage in reference and target texts needs to be similar for optimal analysis. | Choose reference text samples carefully to ensure comparability. |
Baumann, Debus and Müller [41] | To evaluate policy positions on abortion legislation using automated content analysis. | Effective, useful, versatile, flexible, appropriate, strong validity, a promising method that helps inform a rich (nuanced) analysis. | Policy opinion surveys among constituents and speeches on abortion policy in the Irish parliament, 2001–2013, speeches by legislators and advocacy groups. | Effective and reliable method for policy analysis of health-related issues. Suitable for analysis of speeches, debates, policy drafts, advocacy group statements, and constituent surveys. | Consider all stages of the legislative process when selecting a scope of texts for analysis for a richer account. |
Bernauer and Bräuninger* [38] | To estimate policy positions of legislators on the left-right scale using Wordscores and to explore links between intra-party faction membership and policy positions. | Effective, reliable, useful, convincing, strong validity, and language-blind (can be used to analyse texts in any language). | Plenary speeches of legislators (n = 453) of the German Bundestag, 2002–2010. Policy positions related to economy and social issues including abortion and euthanasia. | Ensure that ‘virgin’ and reference texts are of similar length and type for optimal comparison. | Comparative research across document types and political actors is recommended. Requires careful selection of reference texts to ensure context requirements are met and analysis is reliable. |
Budge and Pennings* [37] | Assesses the Wordscores method for reliability and validity in policy position analysis. | Efficient, simple, systematic, innovative, consistent, promising method. | British party manifestos, 1979–1997. | Promising method for repeated use over time. Results dependent on initial document sets selected for analysis. | Aggregate texts within each time period to create reference sets for pairwise comparisons. |
Coffé and Da Roit [39] | To explore changes in party positioning on social and economic issues after a major political event. | Useful, appropriate, reliable. | Party programs for 2006–2008 Italian coalitions: Casa delle Libertà and Unione on economic and social policies. | Standard errors accompany each score estimation. No validity issues when rescaling raw score estimates. | Suitable method for analysis of reference texts of varying length and context. |
Costa, Gilmore, Peeters, McKee and Stuckler [28] | To determine the influence of the tobacco industry on the EU Tobacco Products Directive. | Reliable, simple, objective, innovative, superior to manual coding. | EU Tobacco Products Directive policy drafts, tobacco and health lobby group position papers (n = 20). | Requires some prior technical knowledge. Enables visualisations. Clustering effect in virgin text scores: reference text scores tend to be more extreme than virgin text scores. | Efficient for rapid analysis of changes in health policies through different draft stages. |
Debus [35] | To explore various methods for analysing policy preferences of political actors. | Robust, reliable, and superior to human coding. | German party programs. | Estimation was left solely to computer algorithms to remove human error. Assumes the systematic use of certain words by policymakers. | Enables the researcher to identify policy or program positions and directions based on automated content analysis. |
Hug and Schulz [40] | To assess changes in policy positions over time using various content analysis methods including Wordscores. | Effective, efficient, reliable, strong validity, suitable for retrospective analysis. | Swiss party manifestos, roll call data from the Swiss parliament and voting recommendations, 1991–2003. Range of policy positions including health and social policies. | Wordscores produces most reliable and consistent policy estimates when compared to other content analysis approaches in time series analysis. | Reference texts must have enduring relevance over time period being examined to produce correct measures for policy position changes over time. |
Klemmensen, Hobolt and Hansen* [30] | Assesses the Wordscores method for usefulness, reliability and validity in policy position analysis. | Efficient, cost-effective, valid, easy to use, systematic, innovative, flexible, and versatile. | Danish election manifestos from 1945 to 2005 and speeches in parliament. | Supplies time series of policy positions with high face validity. Can be used with Stata/Java. May work best with longer texts. | Can be used for retrospective analysis of policy positions. |
Laver, Benoit and Garry* [25] | To explore the usefulness of the Wordscores method for analysing policy positions of legislators. | Effective, efficient, simple, easy to use, quick, reliable, systematic, strong validity, inexpensive, flexible, innovative, versatile, language blind. | Party manifestos and speeches of legislators in the British, Irish and German parliaments on economic and social policies (incl. abortion), expert surveys on policy positions, 1990–1997. | Efficient and rapid method of text analysis with large number of potential applications. Ensure policy assumptions in reference texts are valid prior to analysis. Researcher not required to understand text. | Applicable across languages and contexts but comparisons must be made between texts of similar context and format to ensure validity. |
Lowe* [31] | To explore the strengths and weaknesses of Wordscores. | Effective, simple, easy to use, versatile, flexible, and empirically successful. | Other studies conducted using Wordscores method. | Lack of functional and distributional assumptions. Choose reference texts of a similar nature to the texts under investigation. | When using Wordscores, analyse research to select texts suitable for comparison. |
Volkens* [36] | To evaluate strengths and weaknesses of three approaches to measuring party policy positions. | Effective, simple, easy to use, quick, reliable, versatile, flexible, promising, high validity, suitable for retrospective analysis, innovative, superior to manual coding, and has useful internal checks and controls. | Other studies conducted on party policy assessment methodologies including Wordscores. | Analysis at specific time points enables the creation of a time line between cause and effect. Focus on text rather than meaning not considered as problematic. | Comparable to expert analysis, but more reliable than manual coding. Wordscores requires skilled researchers to make efficient coding decisions. |
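The studies tabulated above all build on the same underlying procedure: words are scored from reference texts with known policy positions, and unscored ('virgin') texts then receive the frequency-weighted average of their word scores. The following is a minimal sketch of that scoring step in Python; the texts and positions are invented for illustration and are not drawn from any of the reviewed studies.

```python
from collections import Counter

def wordscores(ref_texts, ref_positions):
    """Score each word found in reference texts with known positions.

    A word's score is the positions of the reference texts weighted by
    P(r | w), the probability that a given reading of word w came from
    reference text r (Laver, Benoit and Garry's scoring step).
    """
    counts = [Counter(t.split()) for t in ref_texts]
    totals = [sum(c.values()) for c in counts]
    vocab = set().union(*counts)
    scores = {}
    for w in vocab:
        rel = [c[w] / n for c, n in zip(counts, totals)]  # relative frequencies
        denom = sum(rel)
        scores[w] = sum(p / denom * a for p, a in zip(rel, ref_positions))
    return scores

def score_virgin(text, scores):
    """Mean word score over the scored words in a 'virgin' text.

    Words absent from every reference text are skipped; a virgin text
    sharing no vocabulary with the reference texts cannot be scored.
    """
    words = [w for w in text.split() if w in scores]
    return sum(scores[w] for w in words) / len(words)
```

For example, with reference texts at positions −1 and +1, a word exclusive to the first reference scores −1, a word appearing equally in both scores 0, and a virgin text is placed by the mix of words it uses. This sketch omits the rescaling of raw virgin scores discussed under the method's limitations below.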
Documented strengths of the Wordscores method
The attribute columns fall into four groups: ease of use, versatility, resource efficiency, and reliability and validity.

Study | Effective | Simple to use | Easy to use | Quick | Language blind | Versatile | Range of applications | Flexible | Useful | Efficient | Cost-effective | Reliable | Good with large texts | Systematic | Inbuilt cross-validation for reliability | High face validity | Equal or better than manual coding
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Baek, Cappella and Bindman* [29] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | |||
Baumann, Debus and Müller [41] | ☑ | ☑ | ☑ | ☑ | ☑ | ||||||||||||
Bernauer and Bräuninger* [38] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | |||||||||||
Budge and Pennings* [37] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ||||||||||
Coffé and Da Roit [39] | ☑ | ☑ | |||||||||||||||
Costa, Gilmore, Peeters, McKee and Stuckler [28] | ☑ | ☑ | ☑ | ☑ | |||||||||||||
Debus [35] | ☑ | ☑ | ☑ | ☑ | |||||||||||||
Hug and Schulz [40] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | |||||||||
Klemmensen, Hobolt and Hansen* [30] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | |||
Laver, Benoit and Garry* [25] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ |
Lowe* [31] | ☑ | ☑ | ☑ | ☑ | ☑ | ||||||||||||
Volkens* [36] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ |
Critical appraisal of studies
Author | Study design | Reliability and validity measures | Bias |
---|---|---|---|
Baek, Cappella and Bindman* [29] | Formal evaluation of Wordscores. Good description of methods. Testing of two automated content analytic methods to assess validity in comparison to manual coding. Specified method of study as ‘testing’. | Completed reliability and validity tests. Used Krippendorff’s alpha (.61 using 7% of reference texts and > .70 using 50% of reference texts), concurrent validity and comparative predictive validity tests. | Only two methods tested for reliability, including the method of affective intonation created by the authors of this article, which may mean favourable results were expected for this method. Low risk of bias. |
Baumann, Debus and Müller [41] | Used Wordscores without a formal evaluation. Some study design issues. Lack of background literature for Wordscores. Descriptive analysis without references to statistical significance. Results tabulated. | None reported. | No limitations to study design or using Wordscores mentioned. Unclear risk of bias. |
Bernauer and Bräuninger* [38] | Formal evaluation of Wordscores. Good description of the study design. Descriptive statistics, results tabulated. | Used expert scoring for reliability comparison with Wordscores. Compared results of validity tests. Established strong face validity for Wordscores. | Good discussion of strengths and limitations of study design. Low risk of bias. |
Budge and Pennings* [37] | Formal evaluation of Wordscores. Focus is a comparative evaluation of methods. Descriptive statistics, results tabulated. | Used the Comparative Manifesto Project and expert scoring for reliability comparison with Wordscores results. Emphasis on validity and reliability testing. Identified some reliability issues in analysing policy positions using Wordscores. | Article is part of a debate series. Starts from the premise that computerised content analysis does not work and builds a case against Wordscores. High risk of bias. |
Coffé and Da Roit [39] | Used Wordscores without a formal evaluation. Good description of the study design. Descriptive statistics, results tabulated. | Used the Comparative Manifesto Project for reliability comparison with Wordscores results. | No limitations to study design mentioned. Unclear risk of bias. |
Costa, Gilmore, Peeters, McKee and Stuckler [28] | Used Wordscores without a formal evaluation as its aim. Study design described in detail and advised as quantitative automated content analysis. Descriptive statistics, results tabulated. | Used various reference texts to test the reliability of Wordscores. No expert comparisons or validity tests performed. | Accounted for potential issues with reliability and validity for Wordscores. Wordscores thoroughly assessed for both strengths and limitations. Low risk of bias. |
Debus [35] | Editorial. Good description of different methods of content analysis, including Wordscores. Good overview of strengths and benefits of Wordscores compared to other content analysis methods. No data sets analysed. No result tables. | None reported. | Editorial for a special issue dedicated to content analysis methods including automated content analysis and therefore potential bias in favour of automated content analysis methods. High risk of bias. |
Hug and Schulz [40] | Used Wordscores without a formal evaluation as its aim. Good description of methods. Several analyses conducted with different reference texts to test the method. Descriptive statistics, results tabulated. Statistical significance of measures reported. | Completed reliability and validity tests. Used expert data and the Comparative Manifesto Project for reliability comparison with Wordscores; accounted for limitations in all methods. | Potential impacts on reliability and validity assessed. Wordscores thoroughly assessed for both strengths and limitations. Low risk of bias. |
Klemmensen, Hobolt and Hansen* [30] | Formal evaluation of Wordscores. Good description of methods. Several analyses conducted with different reference texts to test the method. Descriptive statistics, results tabulated. Statistical significance of measures reported. Good quality analysis. | Completed reliability and validity tests. Used expert data, the Comparative Manifesto Project for reliability comparison with Wordscores and another automated method (Spearman’s rho used for correspondence analysis). | Potential impacts on reliability and validity assessed. Wordscores thoroughly assessed for both strengths and limitations. Low risk of bias. |
Laver, Benoit and Garry* [25] | Formal evaluation of Wordscores. Study design stated as cross-validation of different methods to validate policy estimates. Used both English- and non-English-language texts for cross-validation. Good description of methods. | Completed reliability and validity tests using expert data, the Comparative Manifesto Project and Wordscores for comparison. | Potential impacts on reliability and validity assessed. Wordscores thoroughly assessed for both strengths and limitations. The authors are the creators of the Wordscores method. Low risk of bias. |
Lowe* [31] | Formal evaluation of Wordscores. Good overview of the mechanics of how Wordscores works. Focuses on processes which Wordscores algorithm uses for score estimation and its reliability. | Core focus is on reliability testing of Wordscores. | Starts with a hypothesis that there are issues with Wordscores and continues to build a case against the method. High risk of bias. |
Volkens* [36] | Formal evaluation of Wordscores. Tabulated overviews of the strengths and the weaknesses of three methods evaluated. | Used expert data, the Comparative Manifesto Project for reliability comparison with Wordscores and another automated method (CACA). Good overview of method reliability and validity in comparison. | Potential impacts on reliability and validity assessed. Wordscores thoroughly assessed for both strengths and limitations. Low risk of bias. |
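Baek, Cappella and Bindman [29] report intercoder reliability with Krippendorff's alpha. For the simplest case — two coders, nominal categories, no missing values — alpha can be computed directly from the coincidence of paired codes; the sketch below uses that restricted case, and the coder labels in the usage example are invented for illustration.

```python
from collections import Counter

def krippendorff_alpha_nominal(coder1, coder2):
    """Krippendorff's alpha for two coders, nominal data, no missing values.

    alpha = 1 - D_o / D_e, where D_o is observed disagreement and D_e is
    the disagreement expected by chance from the pooled value frequencies.
    Undefined (division by zero) if every coded value is identical.
    """
    pairs = list(zip(coder1, coder2))
    n = 2 * len(pairs)  # total number of coded values
    values = Counter()
    disagree = 0
    for a, b in pairs:
        values[a] += 1
        values[b] += 1
        if a != b:
            disagree += 2  # ordered pairs (a, b) and (b, a) in the coincidence matrix
    d_o = disagree / n
    d_e = sum(values[c] * values[k]
              for c in values for k in values if c != k) / (n * (n - 1))
    return 1.0 - d_o / d_e
```

Perfect agreement yields alpha = 1; values above roughly .70 are conventionally read as acceptable, which is the threshold the reviewed study reaches once 50% of the reference texts are used.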
Documented limitations of the Wordscores method
The limitation columns fall into three groups: text length; the relationship between words, meaning and context; and the expertise needed to inform the choice of reference texts.

Study | Less accurate with short texts | Word focussed | Applicability to complex contexts | Reference texts must fulfil certain conditions | Scores require rescaling | Potential for transformation errors | Requires researcher interference and skills | Score quality of reference texts
---|---|---|---|---|---|---|---|---
Baek, Cappella and Bindman* [29] | ☑ | ☑ | ☑ | |||||
Baumann, Debus and Müller [41] | ☑ | |||||||
Bernauer and Bräuninger* [38] | ☑ | ☑ | ||||||
Budge and Pennings* [37] | ☑ | ☑ | ☑ | ☑ | ||||
Coffé and Da Roit [39] | ☑ | |||||||
Costa, Gilmore, Peeters, McKee and Stuckler [28] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | |
Debus [35] | ☑ | ☑ | ☑ | |||||
Hug and Schulz [40] | ☑ | ☑ | ||||||
Klemmensen, Hobolt and Hansen* [30] | ☑ | ☑ | ☑ | ☑ | ☑ | ☑ | ||
Laver, Benoit and Garry* [25] | ☑ | ☑ | ☑ | ☑ | ||||
Lowe* [31] | ☑ | ☑ | ☑ | ☑ | ||||
Volkens* [36] | ☑ | ☑ | ☑ | ☑ | ☑ |
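Two of the tabulated limitations — 'Scores require rescaling' and the clustering of virgin-text scores noted by Costa et al. [28] — are linked: because most words are shared across texts, raw virgin scores bunch toward the centre of the reference scale. The original Laver–Benoit–Garry rescaling transformation addresses this by stretching virgin scores back to the dispersion of the reference scores. A sketch, with made-up scores:

```python
import statistics

def lbg_rescale(virgin_scores, ref_scores):
    """Laver-Benoit-Garry rescaling of raw virgin-text scores.

    Centres the virgin scores on their own mean, then stretches them so
    their spread matches the spread of the reference scores:
        s* = (s - mean_v) * (sd_ref / sd_v) + mean_v
    Mishandling this step is a documented source of transformation errors.
    """
    v_mean = statistics.mean(virgin_scores)
    ratio = statistics.pstdev(ref_scores) / statistics.pstdev(virgin_scores)
    return [(s - v_mean) * ratio + v_mean for s in virgin_scores]

# Example: raw virgin scores clustered near 0 on a -1/+1 reference scale
# spread back out to roughly +/-1.22 after rescaling.
rescaled = lbg_rescale([-0.1, 0.0, 0.1], [-1.0, 1.0])
```

The choice of population standard deviation (`pstdev`) here is one plausible reading; implementations differ on sample versus population dispersion, so treat this as a sketch rather than a reference implementation.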