Published summary

eicu-data-paper/main.tex at master · MIT-LCP/eicu-data-paper

Source github.com/MIT-LCP/eicu-data-paper/blob/master/main.tex Published May 19, 2026

The eICU Collaborative Research Database is a freely available multi-center intensive care unit (ICU) database with high-granularity data for over 200,000 admissions, designed to support applications such as machine learning, decision support, and clinical research.

Background and Significance

Intensive care units generate large amounts of monitoring data, but archiving these data for research is challenging: bedside systems are disparate, and the data must be robustly deidentified. The teleICU model, such as Philips' eICU program, eases this problem because remote care teams work from a centralized platform that archives comprehensive data as part of routine operation.

The Laboratory for Computational Physiology partnered with the eICU Research Institute to create the eICU Collaborative Research Database (eICU-CRD), extending the concept of MIMIC-III from a single hospital to multiple centers. The multi-center design lets researchers explore variability across institutions and assess how well findings generalize.

Database Structure and Deidentification

The database is distributed as CSV files organized into tables linked by a patient tracking hierarchy: each patient has a unique identifier and may have multiple hospitalizations, each of which may include multiple unit stays. All tables are deidentified to the HIPAA Safe Harbor standard: protected health information is removed, identifiers are randomly assigned, and free-text fields that could contain personal data are deleted.
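The sketch below illustrates how random identifier assignment can coexist with the patient tracking hierarchy: each original identifier is mapped to one random surrogate per level, so links between patients, hospitalizations, and unit stays survive. It is not the pipeline the authors used. The helper names `pseudonymize` and `build_hierarchy`, the surrogate range, and the toy rows are all hypothetical; in the released tables the three levels correspond to columns such as `uniquepid`, `patienthealthsystemstayid`, and `patientunitstayid` (check the eICU-CRD documentation for the version you use).

```python
import random
from collections import defaultdict


def pseudonymize(rows, seed=0):
    """Replace each identifier with a random surrogate, one mapping per level.

    Hypothetical helper: an original ID always receives the same surrogate,
    so the patient -> hospitalization -> unit stay links are preserved.
    """
    rng = random.Random(seed)
    mappings = [{}, {}, {}]      # original ID -> surrogate, per level
    used = [set(), set(), set()]  # surrogates already handed out, per level

    def surrogate(level, original):
        table = mappings[level]
        if original not in table:
            # Draw random integers until one is unused at this level.
            while True:
                candidate = rng.randrange(100_000, 1_000_000)
                if candidate not in used[level]:
                    used[level].add(candidate)
                    table[original] = candidate
                    break
        return table[original]

    return [tuple(surrogate(level, value) for level, value in enumerate(row))
            for row in rows]


def build_hierarchy(rows):
    """Group unit stays under hospitalizations under patients."""
    tree = defaultdict(lambda: defaultdict(list))
    for patient, hospital_stay, unit_stay in rows:
        tree[patient][hospital_stay].append(unit_stay)
    return tree


# Toy (patient, hospitalization, unit stay) rows; NOT eICU-CRD data.
rows = [
    ("MRN-A", "VISIT-1", "ICU-1"),
    ("MRN-A", "VISIT-1", "ICU-2"),
    ("MRN-A", "VISIT-2", "ICU-3"),
    ("MRN-B", "VISIT-3", "ICU-4"),
]
deidentified = pseudonymize(rows, seed=42)
tree = build_hierarchy(deidentified)
print(len(tree), "patients")  # 2 patients
```

Keeping a separate mapping per level mirrors the hierarchy: a surrogate only needs to be unique among identifiers of the same kind.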

The schema is denormalized so that tables can be used independently, and a certified re-identification risk assessment was performed. The dataset includes 200,859 unit encounters for 139,367 unique patients admitted between 2014 and 2015.
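Because the hierarchy allows several unit stays per patient, the two reported counts imply that many patients appear more than once. The short calculation below uses only the figures stated above; an analysis that treats every unit stay as an independent observation would count these patients repeatedly.

```python
# Counts as reported for the eICU-CRD release described above.
UNIT_ENCOUNTERS = 200_859
UNIQUE_PATIENTS = 139_367

# Average number of ICU unit stays per patient.
stays_per_patient = UNIT_ENCOUNTERS / UNIQUE_PATIENTS

# Every patient has at least one stay, so the remainder are later stays,
# whether in the same hospitalization or a different one.
additional_stays = UNIT_ENCOUNTERS - UNIQUE_PATIENTS

print(f"{stays_per_patient:.2f} unit stays per patient")        # 1.44
print(f"{additional_stays:,} stays beyond each patient's first")  # 61,492
```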

Data Content

Data are organized into categories: administrative (hospital and patient demographics), APACHE IV (physiologic parameters and mortality predictions), care plan (structured communication tools), and care documentation (18 tables covering medications, laboratory results, vital signs, nursing assessments, ventilator settings, and more).

Key tables include periodic vital signs (e.g., heart rate stored as 5-minute median values), aperiodic vital signs (e.g., non-invasive blood pressure), continuous infusions, intake/output, microbiology, and active treatments. Data in the periodic and aperiodic tables are collected automatically from bedside monitors and are not verified by staff, unlike the nurse-validated flowsheet data.
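To make "5-minute medians" concrete, the sketch below aggregates timestamped readings into 5-minute windows and keeps the median of each. It assumes one raw reading per minute with offsets in minutes; the raw sampling rate and the exact upstream aggregation are not described here, and `five_minute_medians` is a hypothetical helper, not part of any eICU-CRD tooling.

```python
from statistics import median


def five_minute_medians(samples, window=5):
    """Aggregate (offset_minutes, value) samples into per-window medians.

    Returns a dict mapping each window's start offset to the median of the
    values whose offset falls in [start, start + window).
    """
    buckets = {}
    for offset, value in samples:
        start = (offset // window) * window  # floor to the window start
        buckets.setdefault(start, []).append(value)
    return {start: median(values) for start, values in sorted(buckets.items())}


# Hypothetical heart-rate readings, one per minute; 150 is a monitor artifact.
samples = [(0, 80), (1, 82), (2, 150), (3, 81), (4, 79),
           (5, 90), (6, 91), (7, 89), (8, 92), (9, 88)]
print(five_minute_medians(samples))  # {0: 81, 5: 90}
```

A median rather than a mean keeps a single spurious spike from shifting the stored value, which matters for data that reach the database without human verification.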

Access and Usage

Data are available through PhysioNet after completing a CITI course in human subjects research and signing a data use agreement. Supporting resources include documentation, code repositories, and Jupyter Notebooks with example analyses. Future updates will follow semantic versioning: major version increments for schema changes, and minor increments for new tables or corrections.
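The versioning rule as summarized above can be expressed as a small function. The change labels and version strings are hypothetical examples, and the rule encodes only this summary's description of the policy. Standard semantic versioning also has a patch component; this sketch tracks only the two levels the text mentions.

```python
# Hypothetical change labels mapped to the rule summarized above:
# schema change -> major increment; new table or correction -> minor increment.
MAJOR_CHANGES = {"schema_change"}
MINOR_CHANGES = {"new_table", "correction"}


def next_version(current: str, change: str) -> str:
    """Return the version string that follows `current` for a given change."""
    major, minor = (int(part) for part in current.split(".")[:2])
    if change in MAJOR_CHANGES:
        return f"{major + 1}.0"  # schema changes reset the minor number
    if change in MINOR_CHANGES:
        return f"{major}.{minor + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(next_version("2.0", "new_table"))      # 2.1
print(next_version("2.1", "schema_change"))  # 3.0
```

A major increment signals that existing queries may break, whereas a minor increment should leave code written against the earlier schema working.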

The authors emphasize that the data were collected for clinical care rather than research, so users should expect potential inconsistencies. A public issue tracker supports community collaboration and error reporting.
