Introduction
What is big in big data?
Other unique features of big data
Big data in general medicine and psychiatry
Description | Primary finding | Number of subjects (n) | Data source | References |
---|---|---|---|---|
Create actuarial suicide risk algorithm to predict suicide in the 12 months after inpatient hospitalization for psychiatric disorder | 52.9 % of posthospitalization suicides occurred after the 5 % of hospitalizations with the highest predicted suicide risk | 40,820 soldiers hospitalized for psychiatric disorders; 421 predictors | 38 Army and DoD administrative data systems | Kessler et al. (2015) |
Explore prevalence of substance use disorders (SUD) among psychiatric patients in a large university health system | 24.9 % of patients had SUD; SUD associated with more inpatient and emergency care | 40,999 psychiatric patients aged 18–64 years who sought treatment between 2000 and 2010 | EMR-based psychiatry registry | Wu et al. (2013) |
Ongoing study of cognitive impairment using neuroimaging and genetics | Neuroimaging phenotypes were significantly associated with progression of dementia | 808 patients over age 65, including 200 with Alzheimer’s disease | 20 derived neuroimaging markers plus 20 SNPs | Weiner et al. (2012) |
Examine use of psychotropic drugs by patients without psychiatric diagnosis | 58 % of those prescribed a psychiatric medication in 2009 had no psychiatric diagnosis | 5,132,789 individuals who received a prescription for psychotropic medication | Private medication claims database | Wiechers et al. (2013) |
Analyze prescribing of psychotropic drugs by specialty | 59 % written by general practitioners, 23 % by psychiatrists, 17 % by other physicians and providers | 472 million prescriptions for psychotropic drugs | IMS database of 70 % of US retail pharmacy transactions for 2006–2007 | Mark et al. (2009) |
Compare risk of dementia in those aged 55 or older with traumatic brain injury (TBI) versus non-TBI trauma (NTT) | TBI increased risk for dementia over NTT | 51,799 patients with trauma, of which 31.5 % had TBI | CA statewide administrative health database of ER and inpatient visits | Gardner et al. (2014) |
Use machine learning to predict suicidal behavior from text in the EMR | Model obtained high specificity but low sensitivity, with PPV of 41 % | 250,000 US veterans of the 1991 Gulf War | Clinical records | Ben-Ari and Hammond (2015) |
Investigate association between maternal and paternal age and risk of autism | Both increasing maternal age and increasing paternal age were independently associated with increased risk of autism | 7,550,026 singleton births in CA, 1989–2002; 23,311 with autism | Developmental services administrative data, birth certificate data | Grether et al. (2009) |
Use natural language processing (NLP) to classify current mood state to identify treatment resistant depression | NLP models better than those relying on billing data alone | 127,504 patients with diagnosis of major depression | EMR and billing data from outpatient psychiatry practices affiliated with large hospital | Perlis et al. (2012) |
Analyze impact of Medicaid prior authorization for atypical antipsychotics on prevalence of schizophrenia among prison inmates | Prior authorization associated with greater prevalence of mental illness in inmates | 16,844 inmates | Nationally representative sample from Census Bureau | Goldman et al. (2014) |
Investigate incidence of severe psychiatric disorders following hospital contact for head injury | Increased risk of schizophrenia, depression, bipolar disorder and organic mental disorders following head injuries | 113,906 people who had suffered head injuries, and were born between 1977 and 2000 | Danish psychiatric central register | Orlovska et al. (2014) |
Integrate depression screening, prescription fulfillment and EMR to improve care in primary care (PC) | Integration improved diagnosis and management of depression in PC | 61,464 patients in PC in 14 clinical organizations | EMR, plus 4,900 PHQ-9 questionnaires, plus fulfillment data for 55 % of patients | Valuck et al. (2012) |
Analyze whether SSRI/SNRI use prior to ICU admission increased mortality risk | Increased hospital mortality among ICU patients taking an SSRI/SNRI before admission | 14,709 patients, 2,471 of whom were taking an SSRI/SNRI | Multiparameter Intelligent Monitoring in Intensive Care database (data from EMR) | Ghassemi et al. (2014) |
Evaluate safety of antipsychotic (AP) medication use in nursing homes | Dose-dependent increased risks of serious medical events, such as myocardial infarction, stroke, infection and hip fracture, within 180 days of initiating AP treatment | 83,959 Medicaid-eligible residents aged ≥65 who initiated AP use after nursing home admission | Medicare and Medicaid claims from 45 states | Huybrechts et al. (2012) |
Evaluate use of EMR to assist with phenotyping in bipolar disorder (BP) | Semiautomated data mining of the EMR may assist with phenotyping of patients and controls | 52,235 patients with at least one diagnosis of BP or mania, spanning 20 years | EMR, billing and inpatient pharmacy data | Castro et al. (2015) |
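Several rows above summarize model performance by how strongly outcomes concentrate among the highest-risk cases, for example the finding by Kessler et al. (2015) that 52.9 % of suicides followed the 5 % of hospitalizations with the highest predicted risk. As a minimal sketch of that calculation, assuming only a list of predicted risk scores and a matching list of 0/1 outcomes (the function name and toy data are hypothetical and are not taken from any of the cited studies), the share of events captured in the top fraction can be computed as follows:

```python
from typing import Sequence


def share_of_events_in_top(
    scores: Sequence[float],
    events: Sequence[int],
    top_fraction: float = 0.05,
) -> float:
    """Return the fraction of all events (outcome == 1) that fall among the
    top_fraction of records ranked by predicted risk.

    Hypothetical helper for illustration; ties in score keep input order
    because Python's sort is stable, including with reverse=True.
    """
    if len(scores) != len(events):
        raise ValueError("scores and events must have the same length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total_events = sum(events)
    if total_events == 0:
        raise ValueError("no events to capture")

    # Number of records in the high-risk stratum (at least one).
    n_top = max(1, round(len(scores) * top_fraction))

    # Record indices ordered from highest to lowest predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Count events among the top-ranked records only.
    captured = sum(events[i] for i in ranked[:n_top])
    return captured / total_events


# Toy example: 20 records, 3 events; the top 10 % is the 2 highest-scoring records.
scores = [i / 20 for i in range(20)]
events = [0] * 20
events[19] = events[18] = events[3] = 1
print(share_of_events_in_top(scores, events, top_fraction=0.10))
```

The captured share is the sensitivity of a rule that flags the top fraction as high risk. When events are rare, most flagged records can still be non-events, so this figure says little on its own about positive predictive value.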
Quality issues with big data
Analytical challenges for big data
Study description | Issue | Errors found | Patient source | References |
---|---|---|---|---|
Examine relationship between illness severity and quantity of data in EMR | Data sufficiency | Setting minimal data requirements for inclusion in a study cohort created bias toward selection of sicker patients | EMR records from 10,000 patients who received anesthetic services | Rusanov et al. (2014) |
Investigate patterns in lab tests for potential impact on use in modeling EMR data | Context for interpreting lab test results | Frequency of lab tests confounded by scheduled visits, such as every 3 months | EMR records from 14,141 patients | Pivovarov et al. (2014) |
Repeat prior study of pneumonia severity index to demonstrate bias in EMR retrospective research | (a) Diagnostic consistency; (b) a small number of cases can have a large impact on outcome | (a) Adding constraints to improve consistency of the diagnostic cohort significantly changed the sample (decreased its size); (b) very sick patients who die quickly in the ER will not have symptoms entered into the EMR, affecting mortality rates | EMR records from 46,642 patients with indication of pneumonia | Hripcsak et al. (2011) |
Investigate concordance of diagnosis of PTSD in EMR with diagnosis determined by SCID interview | Diagnostic accuracy | Over 25 % of EMR diagnoses of PTSD in veterans were incorrect; diagnoses were most likely to be accurate for those with the least and the most severe symptoms | Sample of 1,649 veterans | Holowka et al. (2014) |
Evaluate diagnosis of schizophrenia in EMR compared with chart review by psychiatrist | Diagnostic accuracy | Prevalence of schizophrenia was 14 % by coding, dropping to 1.8 % with manual review. Coding most accurate (74 %) for those with four or more coding labels | 819 veterans in a pain clinic | Jasser et al. (2007) |
Review whether written informed consent introduces selection bias in prospective observational studies using data from EMR | Written informed consent | Significant differences between participants and non-participants, but the direction of effect was inconsistent | Review of 1,650 citations; 17 studies included, with 69 % of 161,604 eligible patients giving consent | Kho et al. (2009) |
Analyze whether underlying health of seniors affects the reduction in risk of death and hospitalization associated with influenza vaccine | Selective prescribing of preventative measures | Greatest reduction in risk occurred before influenza season, indicating preferential receipt of vaccine by healthy seniors | 72,527 people ≥65 years not residing in nursing homes, using plan administrative data | Jackson et al. (2006) |
Investigate surprising protective effects attributed to preventative medications by examining association between statin use and motor vehicle and workplace accidents | Healthy-adherer bias (adherent patients more health seeking) | Adherent statin users significantly less likely to be involved in motor vehicle and workplace accidents, an example of unmeasurable confounding in the dataset | 141,086 patients taking statins for prevention | Dormuth et al. (2009) |
Passive case-finding for Alzheimer’s disease and dementia using medical records | Research center population not generalizable | Research center population younger, with more severe disease and more education than the general population | 5,233 patients over age 70 | Knopman et al. (2011) |
Explore selection bias when comparing outcomes from cancer therapy using observational data in SEER database | Severity of illness, self-rated health, comorbidities | Improbable results; adjustment techniques such as propensity scores were insufficient; some outcome measures were caused by the treatments themselves | 53,952 patients with prostate cancer in three therapy groups | Giordano et al. (2008) |
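Several studies in this table validate coded EMR diagnoses against a reference standard, such as a SCID interview (Holowka et al. 2014) or psychiatrist chart review (Jasser et al. 2007). A minimal sketch of that comparison, assuming one boolean per patient for the coded diagnosis and one for the reference standard (the names and toy counts below are hypothetical and do not reproduce either study), computes positive predictive value, sensitivity and the two prevalence estimates:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CodeValidation:
    ppv: float                   # share of coded cases confirmed by the reference standard
    sensitivity: float           # share of reference cases that also carry the code
    coded_prevalence: float      # prevalence estimated from codes alone
    reference_prevalence: float  # prevalence estimated from the reference standard


def validate_codes(coded: Sequence[bool], reference: Sequence[bool]) -> CodeValidation:
    """Compare a coded diagnosis with a reference-standard diagnosis, patient by patient.

    Hypothetical helper for illustration; returns NaN for a ratio whose
    denominator is zero.
    """
    if len(coded) != len(reference) or len(coded) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(coded)
    true_pos = sum(1 for c, r in zip(coded, reference) if c and r)
    n_coded = sum(1 for c in coded if c)
    n_ref = sum(1 for r in reference if r)
    nan = float("nan")
    return CodeValidation(
        ppv=true_pos / n_coded if n_coded else nan,
        sensitivity=true_pos / n_ref if n_ref else nan,
        coded_prevalence=n_coded / n,
        reference_prevalence=n_ref / n,
    )


# Toy cohort of 100 patients: 14 carry the code, review confirms only 2.
coded = [True] * 14 + [False] * 86
reference = [True] * 2 + [False] * 98
print(validate_codes(coded, reference))
```

In this toy cohort the code-based prevalence is 14 % while the reviewed prevalence is 2 %, and only 1 in 7 coded patients is a true case, so a study cohort selected on codes alone would consist mostly of non-cases.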