A Multi-Dataset Analysis and Dynamic Risk Stratification
Martina Falasca¹ · Agostino Aiezzo² · Silvia Angeletti³ · Letizia Chiodo⁶ · Simonetta Filippi⁶
Mario Garofano⁴ · Alessandro Loppini⁶ · Maria Vittoria Ristori³ · Adolfo Santoro² · Silvia Spoto⁵ · Silvia Scarpetta¹
Sepsis remains a time-critical condition whose prognosis deteriorates markedly with each hour of diagnostic delay, underscoring the imperative for robust early-warning systems capable of identifying prodromal physiological derangements. In the present work, we introduce an integrated computational framework that synthesises exploratory phenotypic analysis with temporally aware deep learning and permutation-based interpretability methods, yielding a comprehensive toolkit for early detection and dynamic risk stratification. The framework is validated across two complementary cohorts: SepsisExp, a purpose-collected dataset with expert-adjudicated ground-truth labels, and MIMIC-III, the de facto benchmark for retrospective intensive care research. The phenotypic exploration phase employs rigorous interquartile-range-based outlier filtration, bivariate and partial correlation mapping, and ts-KMeans clustering to delineate heterogeneous clinical trajectories preceding sepsis onset. Building upon this characterisation, Long Short-Term Memory (LSTM) architectures are trained on multivariate time-series windows spanning the 24 hours prior to onset, with the final h hours systematically excluded to simulate realistic deployment latency across prediction horizons h ∈ {1, 2, 3, 6, 9, 12}. To bridge the interpretability gap between high-capacity neural models and bedside decision-making, we apply PermFit, a permutation-based feature-importance algorithm that quantifies each biomarker's marginal contribution by measuring the degradation in predictive performance under random shuffling. The resulting importance trajectories reveal a clinically coherent temporal shift: haemodynamic and respiratory variables dominate predictions at short horizons, while biochemical and inflammatory indices acquire prominence as the prediction window extends.
By rendering complex model behaviour transparent and clinically interpretable, the proposed methodology establishes a principled bridge between high-performance computation and actionable decision support in time-critical settings.
Interquartile-range-based outlier analysis revealed substantial dispersion across respiratory, metabolic, inflammatory, and tissue-perfusion biomarkers. Bivariate correlation mapping identified a strong positive association between minute ventilation and respiratory rate (r = 0.72), and a weak-to-moderate inverse relationship between heart rate and stroke volume (r = −0.28). Clustering experiments employing K-Means, DBSCAN, and agglomerative hierarchical methods separated septic from non-septic patients more sharply in the binary formulation than in the multi-class, severity-stratified groupings.
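The outlier filtration and bivariate correlation steps described above can be illustrated with a minimal sketch. It assumes the conventional 1.5 × IQR fences and Pearson's r, neither of which is stated explicitly in the text; the function names `iqr_mask` and `pearson_r` and all data are hypothetical and synthetic, not drawn from SepsisExp or MIMIC-III.

```python
import numpy as np

def iqr_mask(values, k=1.5):
    """Boolean mask that is True for values inside the IQR fences.

    k = 1.5 is the conventional multiplier and is an assumption here.
    """
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values >= lower) & (values <= upper)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic illustration only: two correlated respiratory variables.
rng = np.random.default_rng(0)
resp_rate = rng.normal(18.0, 4.0, size=500)
minute_vent = 0.4 * resp_rate + rng.normal(0.0, 1.0, size=500)
resp_rate[:3] = [90.0, 120.0, -5.0]  # inject implausible readings

# Keep only rows where both variables fall inside their fences.
keep = iqr_mask(resp_rate) & iqr_mask(minute_vent)
r = pearson_r(resp_rate[keep], minute_vent[keep])
print(f"kept {keep.sum()} of {keep.size} rows, r = {r:.2f}")
```

Filtering before computing r matters because a handful of implausible readings can distort a correlation estimate far more than their number suggests.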
A Long Short-Term Memory architecture processes multivariate time-series clinical data drawn from the 24-hour window preceding sepsis onset. The model is evaluated across six prediction horizons (h ∈ {1, 2, 3, 6, 9, 12} hours before onset), with the terminal h hours systematically excluded to emulate realistic clinical deployment constraints. Training was performed on both SepsisExp and MIMIC-III under stratified k-fold cross-validation, with hyperparameters optimised via grid search over learning rate, hidden-unit dimensionality, and dropout regularisation strength.
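The horizon-dependent truncation of the input window can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes one observation per hour (the sampling resolution is not given in the text), and the function name `truncate_window` and the five-feature example are hypothetical.

```python
import numpy as np

HORIZONS = (1, 2, 3, 6, 9, 12)  # prediction horizons in hours, as listed in the text
WINDOW_HOURS = 24               # length of the pre-onset window, as stated in the text

def truncate_window(window, horizon):
    """Drop the final `horizon` hourly steps from a (time, features) window.

    Assumes hourly sampling, so one row corresponds to one hour.
    """
    if window.shape[0] != WINDOW_HOURS:
        raise ValueError("expected one row per hour of the 24-hour window")
    if not 0 < horizon < WINDOW_HOURS:
        raise ValueError("horizon must lie strictly inside the window")
    return window[: WINDOW_HOURS - horizon]

# Hypothetical patient: 24 hourly rows, 5 features.
window = np.arange(WINDOW_HOURS * 5, dtype=float).reshape(WINDOW_HOURS, 5)
inputs = {h: truncate_window(window, h) for h in HORIZONS}
for h, x in inputs.items():
    print(f"h = {h:2d} h: model sees {x.shape[0]} hourly steps")
```

Under this reading, a model evaluated at h = 12 sees only the first 12 hours of the window, which is why longer horizons are the harder prediction task.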
PermFit assesses the marginal contribution of each input variable by quantifying the degradation in a pre-specified performance metric following random permutation of that feature's values, thereby breaking its association with the target while preserving the marginal distribution. The resulting importance trajectories exhibit a clinically coherent temporal progression: haemodynamic and respiratory variables dominate at prediction horizons proximal to onset (h = 1–3), whereas biochemical and inflammatory markers, consistent with their slower response kinetics, acquire greater relative importance at extended horizons (h = 6–12).
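The permutation principle described above can be illustrated with a generic permutation-importance sketch. It is not the PermFit implementation, which may include further steps not described here. The names `permutation_importance`, `toy_predict`, and `accuracy` are hypothetical, and a trivial rule stands in for the trained LSTM.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correct predictions (the pre-specified metric in this sketch)."""
    return float(np.mean(y_true == y_pred))

def permutation_importance(predict, X, y, metric, n_repeats=20, seed=0):
    """Mean drop in `metric` when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its association with y
            # while preserving its marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - metric(y, predict(X_perm))
    return drops / n_repeats

# Synthetic illustration: the label depends on feature 0 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)

def toy_predict(X):
    # Stand-in for a trained model; it ignores features 1 and 2.
    return (X[:, 0] > 0).astype(int)

importance = permutation_importance(toy_predict, X, y, accuracy)
print(importance.round(3))
```

For time-series inputs of shape (patients, time, features), one plausible adaptation is to shuffle a feature's whole trajectory across patients. Repeating the procedure at each horizon h would then give one importance value per feature per horizon, which is the kind of trajectory the text describes.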
"Haemodynamic signals speak first.
Biochemistry answers last."
As the prediction window extends from 1 to 12 hours before sepsis onset, the dominant biomarker class undergoes a systematic transition from real-time physiological signals (heart rate, respiratory rate, blood pressure, and oxygen saturation) to slower-responding inflammatory and metabolic markers, including white blood cell count, lactate, creatinine, bilirubin, and C-reactive protein. This temporal progression is consistent with the established pathophysiology of septic deterioration, wherein haemodynamic decompensation precedes the full elaboration of the systemic inflammatory response.
SepsisExp: an expert-annotated clinical dataset with gold-standard sepsis labels derived through adjudicated physician review. It serves as the primary training and validation corpus and comprises multivariate time-series observations spanning 24-hour pre-onset windows across multiple severity classes.
MIMIC-III: the Medical Information Mart for Intensive Care III, the de facto open-access benchmark for ICU clinical research, encompassing de-identified health data from more than 40,000 critical care patients. It is employed for cross-dataset validation to assess the generalisability of the framework beyond the primary training distribution.
The authors gratefully acknowledge the support of the Italian National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, funded by the European Union, NextGenerationEU, through the Research Programme "National Centre for HPC, Big Data and Quantum Computing", Project CN00000013, Spoke 6 (Cascade Funding Spoke 6, Project "PRISM, Predictive Risk Identification of Sepsis with Machine Learning").