Machine Learning for Multiple Sclerosis Classification and Disability Prediction using Clinical and MRI Data
Description
This dataset was generated to investigate whether integrating demographic, clinical, and magnetic resonance imaging (MRI) features can improve the classification of multiple sclerosis (MS) patients, distinguish disease phenotypes, and predict disability severity. The underlying research hypothesis is that machine learning (ML) models trained on multimodal data can characterize the disease more accurately than traditional statistical approaches. The dataset includes data from 1554 patients with MS and 520 healthy controls (HC), collected within the Italian Neuroimaging Network Initiative. For each participant, demographic information (e.g., age, sex), clinical assessments, and brain MRI scans were acquired. Clinical disability was quantified using the Expanded Disability Status Scale (EDSS) score. MRI acquisition included T2-weighted and 3D T1-weighted sequences, from which quantitative imaging features were derived. These features included total and regional T2 lesion volumes (LV), as well as normalized volumetric measures of cortical and subcortical grey matter (GM), white matter, cerebellum, and brainstem. All imaging-derived variables were preprocessed and harmonized across sites. ML models applied to these data (support vector machines, multi-layer perceptron networks, Random Forest, and Gradient Boosting) performed well in disease classification, phenotype classification, and EDSS prediction. Specifically, classification accuracy for MS vs HC ranged from 89% to 96%, while phenotype classification reached approximately 92% accuracy. Predicted EDSS scores agreed strongly with observed scores (intra-class correlation coefficients between 0.7456 and 0.76). Feature importance analyses (using SHAP values) indicated that T2 LV and regional GM volumes (particularly in the brainstem, cerebellum, thalamus, and cortex) were among the most influential variables for the classification and prediction tasks.
Demographic variables such as age and sex, along with clinical disability scores, also contributed significantly to model performance. This dataset provides a robust, multimodal resource for advancing ML approaches to MS classification and prognosis.
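The agreement between predicted and observed EDSS scores is reported above as an intra-class correlation coefficient (ICC). The description does not state which ICC form was used, so the following is only a minimal sketch that assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name icc_2_1 and the example scores are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def icc_2_1(observed, predicted):
    """ICC(2,1) between two paired score lists (ASSUMED form, see lead-in).

    Rows are participants; the two "raters" are the observed and the
    predicted EDSS score.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")

    n = len(observed)   # number of participants
    k = 2               # number of "raters" (observed, predicted)
    rows = list(zip(observed, predicted))
    grand = mean(observed + predicted) if isinstance(observed, list) \
        else mean(list(observed) + list(predicted))

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (observed, predicted))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom


if __name__ == "__main__":
    # Hypothetical scores for illustration only (not from the dataset)
    observed_edss = [1.0, 2.0, 3.5, 6.0, 4.0]
    predicted_edss = [1.5, 2.0, 3.0, 5.5, 4.5]
    print(round(icc_2_1(observed_edss, predicted_edss), 3))
```

Because this form measures absolute agreement, a model whose predictions are systematically shifted from the observed scores is penalized even when its predictions are perfectly correlated with them.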
Funders
- European Union - Next Generation EU - NRRP M6C2 Investment 2.1; Grant ID: PNRR-MAD-2022-12376530, CUP master C43C22001290007
