Background
Methods
Identifying components for reproducible research
- Hypothetically, if we conduct a methodological or meta-analytic review of the reproducibility of current practices within biomedical sciences, what information do we need to gather from the literature?
- How do the broad steps across the research life cycle gathered through current reproducibility research in other fields scale and manifest within the biomedical sciences?
Empirical reproducibility framework development for biomedical research
Axes of Research Reproducibility | Example | Categories |
---|---|---|
Transparency is the robust write up or description of research, such that it is clear and explicit. | All data collection processes are described clearly within publication methods and metadata. | data collection, data cleaning/preparation, data integration, data analysis, data sharing, code (cleaning, integration, analysis), data, software, documentation |
Accessibility is a multi-faceted term encompassing both sharing and discoverability. Shared information such as a research dataset or analysis code must be discoverable, in a form that people can use, and available. Discoverability is defined as being in a location that enables the finding of the data and supplemental materials. | A query script used in data collection procedures is shared in a freely accessible and easily discoverable database. |
Testing the face validity of framework items
Internal review of framework items
Identification of published studies for testing
Inter-rater reliability
Results
Components for reproducible research
Face validity of framework variables
RepeAT Framework Variable | Cohen’s Kappa | Kappa Bounds (upper–lower) | var Rater 1 | var Rater 2 | Percent Agreement |
---|---|---|---|---|---|
Publication states database(s) source(s) of data? | 0.320 | (0.580–0.060) | 0.095 | 0.250 | 70.6 |
Does the publication clearly state process(es) for validating data mined via NLP and/or queried from a database? | 0.440 | (0.860–0.019) | 0.182 | 0.069 | 85.7 |
Does the author state any clear process documented for accounting for missing data? | 0.520 | (0.890–0.140) | 0.115 | 0.261 | 83.3 |
Does the research involve natural language processing or text mining? | 0.870 | (1.100–0.630) | 0.134 | 0.107 | 97.1 |
Does the author indicate the software used to develop the analysis code? | 0.880 | (1.000–0.710) | 0.236 | 0.243 | 94.1 |
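
The agreement statistics in the table above can be illustrated with a minimal sketch. It assumes two raters each coded the same publications on one categorical item, and it uses the standard unweighted Cohen's kappa; the function names and the example ratings are hypothetical and are not taken from the study data. The sketch does not reproduce the kappa bounds, since the source does not state how they were computed.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Share of items (in percent) on which both raters gave the same code."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return 100.0 * matches / len(rater1)


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    # Observed proportion of agreement.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(counts1) | set(counts2)
    p_expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings for one Yes/No framework item across four publications.
rater1 = ["Yes", "Yes", "No", "No"]
rater2 = ["Yes", "No", "No", "No"]
agreement = percent_agreement(rater1, rater2)
kappa = cohens_kappa(rater1, rater2)
```

Because kappa subtracts the agreement expected by chance, an item can show high percent agreement and still have a modest kappa when one response dominates, which is the pattern in the first rows of the table.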
Reproducibility framework for biomedical research
Reproducibility Category | Major Concepts |
---|---|
Research Design and Aim | Recording administrative and study information |
Database and Data Collection Methods | Clarifying study data source(s) and methods of collection |
Data Mining and Data Cleaning | Describing process for cleaning, merging, and validating data |
Data Analysis | Clarifying methods and materials for data analysis |
Data Sharing and Documentation | Making relevant research data and documentation shared, accessible, and intelligible |
Publication Overview and Bibliographic Information (21 items) | |
Article Title | Text |
DOI | Text |
Is the research hypothesis-driven or hypothesis-generating? | Hypothesis Driven; Hypothesis Generating; Unclear |
Database and Data Collection (63 items) | |
Publication states database(s) source(s) of data? | Yes/No |
ᵇ Publication states database(s) source(s) of data in the following location: | Not Stated; Supplementary materials; Body of Text |
Query methodology | Manual extraction; Digital extraction through query interface; Digital extraction through honest broker; Not Applicable/Not Stated |
ᵇ Does the shared query script for database contain comments and/or notations for ease of reproducibility? | Yes/No |
Methods: Data Mining and Cleaning (19 items) | |
Does the research involve natural language processing or text mining? | Yes/No |
ᵇ Please list all software applications used for text mining: Please enter all that apply separated by a semi-colon | Text |
ᵇ Is the text mining software application proprietary or open? If multiple applications were used, please select all options that apply. | 1. Proprietary; 2. Mixed; 3. Open |
Methods: Data Analysis (15 items) | |
Does the author state analysis methodology and process? | Yes/No |
Does the author indicate the software used to develop the analysis code? | Yes/No |
ᵇ Is the analysis software proprietary or open? | Proprietary; Open |
Data Sharing and Data Documentation (36 items) | |
Is the finalized dataset shared? | Yes/No |
ᵇ Where is the finalized dataset shared? | Affiliated Research Center Website; Author’s Institution or Department Website; Data Registry; Journal or Publication’s Website; GitHub; Other |
Is there a clear process for requesting the data? | Yes/No |
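
As an illustration of how items like those listed above might be encoded for data entry, the following is a minimal sketch. It assumes each item pairs a question with either a fixed set of response options or free text; the `FrameworkItem` class and its `is_valid` method are hypothetical and are not part of the published framework or any released tooling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FrameworkItem:
    """One framework variable: a question plus its allowed responses.

    An empty `options` tuple marks a free-text item (hypothetical convention).
    """
    question: str
    options: Tuple[str, ...] = ()

    def is_valid(self, response: str) -> bool:
        # Free-text items accept any non-blank response.
        if not self.options:
            return bool(response.strip())
        # Categorical items accept only one of the listed options.
        return response in self.options


# Two items copied from the table above; the encoding itself is an assumption.
dataset_shared = FrameworkItem("Is the finalized dataset shared?", ("Yes", "No"))
doi = FrameworkItem("DOI")
```

Restricting categorical items to their listed options at entry time keeps the two raters' codes directly comparable when agreement is later assessed.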