This study shows that a straight conversion of the heart failure search filter’s terms into textwords, for retrieving PubMed’s unique, non-indexed content, would fail to retrieve a proportion (12%) of relevant non-indexed literature prior to MeSH indexing. Five additional terms were identified that strengthened the performance of a supplementary textword-only search strategy for capturing this missed content. Limited to the non-indexed subset of PubMed, this strategy works in conjunction with the validated PubMed heart failure search filter without compromising the validated translation’s known level of performance.
The full PubMed heart failure filter
The full PubMed heart failure filter is shown here (the supplementary textword-only component for retrieving non-indexed citations is the second parenthesised group, which ends in NOT Medline[sb]):
((heart failure[tw] OR ventricular dysfunction, left[mh:noexp] OR cardiomyopathy[tw] OR left ventricular ejection fraction[tw]) AND Medline[sb]) OR ((heart failure[tiab] OR left ventricular dysfunction[tiab] OR cardiomyopathy[tiab] OR left ventricular ejection fraction[tiab] OR cardiac resynchronization[tiab] OR cardiac failure[tiab] OR left ventricular systolic dysfunction[tiab] OR LV dysfunction[tiab] OR left ventricular diastolic dysfunction[tiab]) NOT Medline[sb]).
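The two-part structure of the filter can be made explicit by assembling it from its term lists. The following is a minimal sketch only: the term lists are copied from the filter above, while the helper names (`or_block`, `build_filter`, `esearch_url`) and the `retmax` default are illustrative assumptions. The URL targets the NCBI E-utilities ESearch endpoint; no request is sent.

```python
from urllib.parse import urlencode

# Terms copied from the filter above.
INDEXED_TERMS = [
    "heart failure[tw]",
    "ventricular dysfunction, left[mh:noexp]",
    "cardiomyopathy[tw]",
    "left ventricular ejection fraction[tw]",
]
NON_INDEXED_TERMS = [
    "heart failure[tiab]",
    "left ventricular dysfunction[tiab]",
    "cardiomyopathy[tiab]",
    "left ventricular ejection fraction[tiab]",
    "cardiac resynchronization[tiab]",
    "cardiac failure[tiab]",
    "left ventricular systolic dysfunction[tiab]",
    "LV dysfunction[tiab]",
    "left ventricular diastolic dysfunction[tiab]",
]

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def or_block(terms):
    """Join search terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_filter():
    """Combine the Medline-subset component with the textword-only component."""
    indexed = f"({or_block(INDEXED_TERMS)} AND Medline[sb])"
    non_indexed = f"({or_block(NON_INDEXED_TERMS)} NOT Medline[sb])"
    return f"{indexed} OR {non_indexed}"


def esearch_url(query, retmax=20):
    """Build (but do not send) an ESearch URL for the given PubMed query."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})


if __name__ == "__main__":
    print(build_filter())
    print(esearch_url(build_filter()))
```

Keeping the two components in separate lists mirrors the design of the filter: the first list is restricted to the Medline subset, and the second is applied only to citations outside it, so neither can alter what the other retrieves.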
Only the highest frequency terms were shortlisted for testing in the development stage, whereupon redundant terms (those that could not retrieve Lost Set citations in addition to preceding terms) were eliminated. This process ensures that searchers are presented with a contained, rather than comprehensive, set of unique PubMed citations by favouring search precision over search sensitivity (the proportion of all relevant citations retrieved) within the non-indexed subset of PubMed.
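The elimination of redundant terms described above can be sketched as a single pass over the candidate terms in test order. This is an illustration under stated assumptions, not the study's actual procedure: the function name `drop_redundant`, the placeholder term labels, and the citation identifiers are all invented.

```python
def drop_redundant(term_hits):
    """Keep a term only if it retrieves Lost Set citations not already
    retrieved by the terms that precede it.

    term_hits: ordered list of (term, set of citation IDs retrieved).
    Returns (kept terms, all citation IDs covered by the kept terms).
    """
    kept = []
    covered = set()
    for term, hits in term_hits:
        new_hits = set(hits) - covered
        if new_hits:  # the term adds at least one unseen citation
            kept.append(term)
            covered |= new_hits
    return kept, covered


if __name__ == "__main__":
    # Hypothetical data: citation IDs are made up for illustration.
    candidates = [
        ("term A", {1, 2, 3}),
        ("term B", {2, 3}),   # redundant: adds nothing beyond term A
        ("term C", {3, 4}),   # kept: adds citation 4
    ]
    print(drop_redundant(candidates))
```

Because the outcome depends on the order in which terms are tested, a term dropped as redundant in one ordering could be retained in another.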
This filter has been made available as a hypertext link on the CareSearch website (http://www.caresearch.com.au) to facilitate automated access to the relevant heart failure literature. To enhance its clinical utility, it has also been combined with 39 expert searches on a range of heart failure subjects such as anaemia, renal insufficiency, cognitive impairment, device deactivation, and self-care [14].
Methodology assessment
When translating between databases, inherent differences in database structure, syntax and search algorithms need to be understood for optimal retrieval. The existence within PubMed of unique content in addition to Medline content provides a case in point. This study described a methodology for assessing the effect of forcing search filter indexing terms to search within a subset that does not include indexing terms.
Frequency analysis proved a useful and objective strategy for identifying natural language terms that regularly co-occur with MeSH-based filter terms. These alternative terms were then assessed for their ability to retrieve Lost Set citations. Only a small number of these terms occurred with sufficiently high frequency across the entire Lost Set to warrant consideration for inclusion in the supplementary textword-only search strategy. Even then, these terms combined could only capture 18.8% of the lost citations. This finding highlights the diffuse nature of natural language and the value of controlled vocabulary indexing to database searchers. The fact that many citations in the Lost Set were indexed with heart failure but did not include this term in either title or abstract indicates that indexers, with their specialist clinical knowledge and access to full text articles, clearly see beyond terms in titles and abstracts when assigning MeSH terms. Indexing can therefore be seen as a value-added process for improving the retrievability of relevant items.
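A frequency analysis of this kind can be sketched as follows. The sketch assumes that frequency means the proportion of citations whose title and abstract contain a given word sequence at least once; the tokenisation, the n-gram length, and the function names are illustrative assumptions rather than the study's documented procedure. The default cut-off of 0.05 corresponds to the 5% threshold used in this study.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the set of distinct n-word phrases in a text (lowercased)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def document_frequency(records, n=2):
    """Proportion of records in which each n-word phrase occurs at least once."""
    if not records:
        return {}
    counts = Counter()
    for text in records:
        counts.update(ngrams(text, n))  # a set, so one count per record
    return {phrase: c / len(records) for phrase, c in counts.items()}


def high_frequency(records, n=2, cutoff=0.05):
    """Phrases whose document frequency meets the cut-off, most frequent first."""
    freq = document_frequency(records, n)
    selected = [p for p, f in freq.items() if f >= cutoff]
    return sorted(selected, key=lambda p: (-freq[p], p))


if __name__ == "__main__":
    # Hypothetical title/abstract strings, invented for illustration.
    records = [
        "Cardiac failure in the elderly",
        "Outcomes of cardiac failure",
        "Renal insufficiency",
    ]
    print(high_frequency(records, n=2, cutoff=0.5))
```

Counting each phrase once per citation, rather than once per occurrence, prevents a single long abstract from dominating the word pool.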
This study has several potential limitations. Firstly, we chose to exclude citations without abstracts from the analysis. This decision was based on the assumption that an imbalance between the number of title words and number of abstract words could skew the word pool for frequency analysis. Furthermore, search terms of high discriminatory power, beyond those already included in the filter, may be more likely to occur in the substantive abstract field than in the shorter title field. While it was beyond the scope of this study to investigate the significance of this decision, this methodological issue remains unresolved in the area of filter development. Secondly, the cut-off point of 5% for identifying ‘high frequency’ terms was chosen arbitrarily. Whilst it appears reasonable, it may have inadvertently eliminated some highly specific natural language alternatives.
The purpose of this study was to explicitly acknowledge PubMed’s unique content and provide a systematic, reproducible method for accounting for this content in translating a search filter from OvidSP Medline. It was not our aim to develop an additional high sensitivity/recall ‘search filter’ for capturing this content, but rather an empirically tested extension of an already validated filter that works in tandem with it to incrementally improve retrieval across the entire PubMed system. Although it was outside the scope of this study, a future study might extend the methodology to ‘validating’ this additional component in the traditional search filter development sense, using a ‘gold standard’ set of relevant and non-relevant citations. This approach would make it possible to provide the standard metrics of search performance such as sensitivity, specificity, and precision.
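For reference, the standard metrics named above can be computed from a gold standard set as in the following sketch. The function name `search_metrics` and the citation identifiers are hypothetical, and the figures it prints are not results from this study.

```python
def search_metrics(retrieved, relevant, all_ids):
    """Sensitivity, specificity and precision of a search against a gold standard.

    retrieved: IDs returned by the search
    relevant:  IDs judged relevant in the gold standard
    all_ids:   every ID in the gold standard (relevant and non-relevant)
    A metric is None when its denominator is zero.
    """
    retrieved, relevant, all_ids = set(retrieved), set(relevant), set(all_ids)
    tp = len(retrieved & relevant)             # relevant and retrieved
    fp = len(retrieved - relevant)             # retrieved but not relevant
    fn = len(relevant - retrieved)             # relevant but missed
    tn = len(all_ids - retrieved - relevant)   # correctly left out

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical gold standard of ten citations, four of them relevant.
    print(search_metrics(retrieved={1, 2, 3, 5},
                         relevant={1, 2, 3, 4},
                         all_ids=range(1, 11)))
```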
A future study might also investigate the effect of including search statements that incorporate the AND Boolean operator in order to increase retrieval in the Lost Set. We only included phrase constructs in our search strategy, which impose an adjacency condition on search terms, e.g. ‘left ventricular systolic dysfunction’ or ‘LV dysfunction’. The AND operator might serve to broaden the search without too great a cost to search precision, e.g. Left AND (systolic OR diastolic OR LV) AND ventricular AND dysfunction. The use of truncation may further improve retrieval (e.g. ventric* retrieves ventricular, ventricle, and ventricles).
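The difference between a phrase construct, an AND combination, and truncation can be illustrated with a small local simulation. This is a sketch of the matching logic only; it does not reproduce PubMed's search engine, and the function names and sample titles are invented for illustration.

```python
import re


def tokens(text):
    """Lowercased word tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_phrase(text, phrase):
    """True if the words of the phrase occur adjacently and in order."""
    words, target = tokens(text), tokens(phrase)
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches_all(text, terms):
    """True if every term occurs somewhere in the text (Boolean AND)."""
    words = set(tokens(text))
    return all(term.lower() in words for term in terms)


def matches_stem(text, stem):
    """True if any word starts with the stem (truncation, e.g. ventric*)."""
    return any(word.startswith(stem.lower()) for word in tokens(text))


if __name__ == "__main__":
    # Hypothetical titles, invented for illustration.
    title = "Left ventricular function and systolic dysfunction after infarction"
    print(matches_phrase(title, "left ventricular systolic dysfunction"))
    print(matches_all(title, ["left", "ventricular", "systolic", "dysfunction"]))
    print(matches_stem("Remodelling of the ventricle", "ventric"))
```

In this simulation the sample title fails the phrase test because the four words are not adjacent, yet it passes the AND test. This is the broadening effect described above, and it is also where any loss of precision would arise.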
Although this present study focuses on the technical aspects of filter translation, it may have benefited from greater clinician input, particularly in the area of natural language term selection. Natural language terms were tested based on a numerical measure of their importance (frequency) rather than a clinical judgement of their significance to the topic of heart failure. Furthermore, introducing additional terms into a search can increase the risk of retrieving irrelevant citations. A formal post-hoc assessment of the relevance of the citations retrieved by each textword in the supplementary search strategy may have provided further information on their suitability for inclusion.
Notwithstanding the above, this research has demonstrated that whilst an OvidSP Medline to PubMed search filter translation may provide equivalent retrieval of indexed articles, retrieval in PubMed can be extended to non-indexed articles. An additional textword-only version of the search filter, developed to retrieve PubMed’s unique content, can be combined with the translated version to create a PubMed search that ‘filters’ the entire PubMed system, and not just a subset thereof, to focus on a topic of interest. The final result then offers searchers appealing benefits such as ease of access, timeliness of citations, and more extensive coverage.