The eICU Collaborative Research Database is a freely available, multi-center intensive care unit (ICU) database with high-granularity data for over 200,000 admissions, designed to support machine learning, decision support, and clinical research.
- The eICU Collaborative Research Database (eICU-CRD) contains deidentified data from over 200,000 ICU admissions across 208 US hospitals, including vital signs, diagnoses, and treatments.
- Data were sourced from the Philips eICU teleICU program and archived via the eICU Research Institute, then processed by MIT's Laboratory for Computational Physiology.
- The database builds upon MIMIC-III by providing multi-center data, enabling studies of practice variation and generalizability that single-center data cannot support.
- APACHE IV severity scores and mortality predictions are included for most patients, derived from physiologic measurements and comorbidities documented in the first 24 hours.
- Access requires completion of a human subjects research training course and signing a data use agreement that prohibits reidentification and mandates code sharing for publications.
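The APACHE IV mortality predictions are often compared with observed outcomes, for example as a standardized mortality ratio (observed deaths divided by the sum of predicted probabilities) per hospital. That kind of comparison suits a multi-center dataset. The Python sketch below is illustrative only. It assumes stay-level rows that already join APACHE results to the patient table on `patientunitstayid`, and it uses field names (`hospitalid`, `predictedhospitalmortality`, `actualhospitalmortality`) as we understand the eICU-CRD schema. It also assumes that a negative predicted value marks a stay without a valid prediction. Verify all of these assumptions against the official documentation. The rows themselves are invented.

```python
from collections import defaultdict

# Invented stay-level rows. Field names are assumed to mirror eICU-CRD
# columns after joining APACHE results to the patient table.
rows = [
    {"hospitalid": 1, "predictedhospitalmortality": 0.25, "actualhospitalmortality": "ALIVE"},
    {"hospitalid": 1, "predictedhospitalmortality": 0.25, "actualhospitalmortality": "ALIVE"},
    {"hospitalid": 1, "predictedhospitalmortality": 0.50, "actualhospitalmortality": "EXPIRED"},
    {"hospitalid": 2, "predictedhospitalmortality": 0.25, "actualhospitalmortality": "ALIVE"},
    {"hospitalid": 2, "predictedhospitalmortality": 0.25, "actualhospitalmortality": "EXPIRED"},
    # Assumed convention: a negative value means no valid APACHE prediction.
    {"hospitalid": 2, "predictedhospitalmortality": -1, "actualhospitalmortality": "EXPIRED"},
]


def smr_by_hospital(rows):
    """Return {hospitalid: observed deaths / expected deaths}.

    Stays with a missing or negative prediction are skipped entirely,
    so they count toward neither the observed nor the expected total.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for r in rows:
        p = r["predictedhospitalmortality"]
        if p is None or p < 0:
            continue
        expected[r["hospitalid"]] += p
        observed[r["hospitalid"]] += r["actualhospitalmortality"] == "EXPIRED"
    return {h: observed[h] / expected[h] for h in expected if expected[h] > 0}


print(smr_by_hospital(rows))  # {1: 1.0, 2: 2.0}
```

Dropping stays without a prediction from both the numerator and the denominator keeps the ratio internally consistent. It may still bias comparisons if missingness differs between hospitals.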
Background and Significance
Intensive care units generate large amounts of monitoring data, but archiving these data for research is challenging because of disparate systems and the need for robust deidentification. TeleICU programs such as Philips' eICU program provide a centralized platform for remote monitoring and, as a by-product, archive comprehensive data.
The Laboratory for Computational Physiology partnered with the eICU Research Institute to create the eICU Collaborative Research Database (eICU-CRD), extending the concept of MIMIC-III from a single hospital to multiple centers. This multi-center design allows researchers to explore variability across institutions and improve generalizability of findings.
Database Structure and Deidentification
The database is distributed as CSV files organized into tables linked by a patient tracking hierarchy: each patient has a unique ID, multiple hospitalizations, and multiple unit stays. All tables are deidentified to HIPAA safe harbor standards, with removal of protected health information, random assignment of identifiers, and deletion of free-text fields containing potential personal data.
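The patient tracking hierarchy (patient → hospitalization → unit stay) can be rebuilt from a patient-level CSV file. The Python sketch below uses only the standard library. It assumes the three identifier columns are named `uniquepid`, `patienthealthsystemstayid`, and `patientunitstayid`, as we understand the eICU-CRD patient table; the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows. Column names are assumed to follow the
# eICU-CRD patient table.
SAMPLE_PATIENT_CSV = """uniquepid,patienthealthsystemstayid,patientunitstayid
002-1,100,1000
002-1,100,1001
002-1,101,1002
002-2,102,1003
"""


def build_hierarchy(csv_text):
    """Map each patient to their hospitalizations, and each hospitalization to its unit stays."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        tree[row["uniquepid"]][int(row["patienthealthsystemstayid"])].append(
            int(row["patientunitstayid"])
        )
    # Convert to plain dicts with sorted stay lists for predictable output.
    return {
        pid: {hosp: sorted(stays) for hosp, stays in hosps.items()}
        for pid, hosps in tree.items()
    }


hierarchy = build_hierarchy(SAMPLE_PATIENT_CSV)
print(hierarchy)
```

Here patient `002-1` has two hospitalizations. The first of them includes two unit stays.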
The schema is denormalized so that tables can be accessed independently, and a re-identification risk assessment was performed and certified. The dataset includes 200,859 unit encounters for 139,367 unique patients admitted between 2014 and 2015.
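The two totals count different levels of the hierarchy. The published figures imply roughly 1.44 unit stays per patient on average. The Python sketch below shows how both counts could be derived from (patient ID, unit stay ID) pairs; the pairs are invented, and the column roles are assumed as in the patient table.

```python
# Published totals quoted in the text.
UNIT_STAYS = 200_859
UNIQUE_PATIENTS = 139_367


def count_encounters(pairs):
    """Return (number of unit stays, number of unique patients) from
    (uniquepid, patientunitstayid) pairs."""
    stays = {stay for _, stay in pairs}
    patients = {pid for pid, _ in pairs}
    return len(stays), len(patients)


stays, patients = count_encounters([("002-1", 1000), ("002-1", 1001), ("002-2", 1003)])
print(stays, patients)  # 3 2
print(f"{UNIT_STAYS / UNIQUE_PATIENTS:.2f} unit stays per patient")  # 1.44 unit stays per patient
```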
Data Content
Data are organized into categories: administrative (hospital and patient demographics), APACHE IV (physiologic parameters and mortality predictions), care plan (structured communication tools), and care documentation (18 tables covering medications, laboratory results, vital signs, nursing assessments, ventilator settings, and more).
Key tables include periodic vital signs (e.g., heart rate stored as 5-minute medians), aperiodic vital signs (e.g., non-invasive blood pressure), continuous infusions, intake/output, microbiology, and active treatments. Data in the periodic and aperiodic vital sign tables are collected automatically from bedside monitors without human verification, unlike the nurse-validated flowsheet data.
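Storing 5-minute medians means each stored value summarizes a window of higher-frequency monitor samples. The Python sketch below shows the aggregation in principle. It assumes 1-minute heart rate samples keyed by an offset in minutes from unit admission (the offset convention is modeled on the eICU `observationoffset` column, an assumption here). This is not the vendor's actual processing pipeline, and the readings are invented.

```python
from collections import defaultdict
from statistics import median


def five_minute_medians(readings, window=5):
    """Collapse (offset_minutes, value) readings into per-window medians.

    Returns a sorted list of (window_start_offset, median_value).
    Readings whose value is None are ignored.
    """
    bins = defaultdict(list)
    for offset, value in readings:
        if value is None:
            continue
        bins[(offset // window) * window].append(value)
    return [(start, median(vals)) for start, vals in sorted(bins.items())]


# Invented 1-minute heart rate samples; 140 mimics a brief artifact.
hr = [(0, 80), (1, 82), (2, 140), (3, 81), (4, 79),
      (5, 90), (6, None), (7, 88), (8, 92), (9, 91)]
print(five_minute_medians(hr))  # [(0, 81), (5, 90.5)]
```

Unlike a mean, the median keeps a single spurious spike (the 140 above) from distorting the window's value. That robustness matters for unverified monitor data.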
Access and Usage
Data are available through PhysioNet to users who complete a CITI human subjects research course and sign a data use agreement. The repository includes documentation, code repositories, and Jupyter Notebooks with example analyses. Future updates will follow semantic versioning: major version changes for schema alterations, and minor version changes for new tables or corrections.
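Under the versioning policy described above, analysis code only needs to check whether the major version changed to know that the schema may have shifted. The Python sketch below assumes dotted numeric version strings such as `2.0` or `2.0.1`. The function names are hypothetical.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR' or 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def schema_may_have_changed(old, new):
    """Under the stated policy, only a major version bump signals schema alterations."""
    return parse_version(new)[0] > parse_version(old)[0]


print(schema_may_have_changed("1.2", "2.0"))    # True
print(schema_may_have_changed("2.0", "2.0.1"))  # False
```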
The authors emphasize that the data were collected for clinical care, not research, so users must be aware of potential inconsistencies. A public issue tracker supports community collaboration and error reporting.
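One practical response to this caveat is to screen automatically collected values for physiologically implausible readings before analysis. The Python sketch below flags such rows. The column names `heartrate` and `sao2` are modeled on the periodic vital sign table as we understand it. The bounds are hypothetical placeholders, not values from the eICU-CRD documentation, and should be chosen for each study.

```python
# Hypothetical plausibility bounds (inclusive); not from eICU-CRD documentation.
PLAUSIBLE_RANGES = {"heartrate": (20, 300), "sao2": (50, 100)}


def flag_implausible(rows, ranges=PLAUSIBLE_RANGES):
    """Return the rows where any non-missing value falls outside its range."""
    flagged = []
    for row in rows:
        for column, (low, high) in ranges.items():
            value = row.get(column)
            if value is not None and not (low <= value <= high):
                flagged.append(row)
                break  # one out-of-range value is enough to flag the row
    return flagged


rows = [
    {"heartrate": 80, "sao2": 98},
    {"heartrate": 0, "sao2": 97},     # likely a lead-off or artifact reading
    {"heartrate": 75, "sao2": None},  # missing values are not flagged
    {"heartrate": 90, "sao2": 120},   # saturation above 100% is impossible
]
print(flag_implausible(rows))
```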