Input Data

The MEDiml package accepts two input data formats: NIfTI and DICOM. Each format has its own conventions, which are described in the following sections. We recommend processing your dataset so that it respects them.

DICOM

Image

Every DICOM file contains a header and a body. The header holds the metadata of the image, including information about the scan, and the body holds the image itself. The header fields that matter most to our package are the following:

Patient ID: Primary identifier for the patient, stored in the (0010,0020) PatientID field of the header. This field should not contain any underscores. For compatibility with other MEDomics packages, we recommend the format 'study-institution-numericID', for example 'STS-McGill-001'. The same value is used in the CSV file of the dataset under the PatientID column.

Series description: Stored in the (0008,103E) Series Description field of the DICOM header. It describes the series and usually indicates the modality or sequence used. This field must be renamed so that it is identical for every scan of the same sequence or modality, for example 'T1' for all T1-weighted MRI scans and 'T2' for all T2-weighted MRI scans. It is referred to in the CSV file of the dataset as ImagingScanName.
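The Patient ID recommendation above can be checked before a dataset is processed. The following is a minimal sketch, not part of MEDiml: the function name check_patient_id and the regular expression are hypothetical, and the pattern assumes that 'study-institution-numericID' means three hyphen-separated parts with a digits-only last part.

```python
import re

# Assumed reading of 'study-institution-numericID': three hyphen-separated
# parts, the last one made of digits only. Illustrative, not taken from MEDiml.
PATIENT_ID_PATTERN = re.compile(r"^[A-Za-z0-9]+-[A-Za-z0-9]+-\d+$")


def check_patient_id(patient_id):
    """Return a list of problems found in a patient ID (empty if none)."""
    problems = []
    if "_" in patient_id:
        problems.append("contains an underscore")
    if not PATIENT_ID_PATTERN.match(patient_id):
        problems.append("does not look like 'study-institution-numericID'")
    return problems


print(check_patient_id("STS-McGill-001"))  # prints []
print(check_patient_id("STS_McGill_001"))  # reports both problems
```

Returning a list of problems rather than a boolean makes it easier to report every issue for a patient at once.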

RTstruct

RTstruct files define the areas of significance and hold information about each region of interest (ROI). Each RTstruct file is associated with its imaging volume through the (0020,000E) Series Instance UID or the (0020,0052) Frame of Reference UID found in the file’s header. The MEDiml package recommends the following:

Patient ID: Same conventions and recommendations as the DICOM image.

Series description: Same conventions and recommendations as the DICOM image.

ROI name: Only found in DICOM RTstruct files. Each element (each ROI) of the (3006,0020) Structure Set ROI Sequence list of the DICOM header has a (3006,0026) ROI Name attribute, which is the name given to that region of interest (ROI). MEDiml has no conventions for this field, but we recommend renaming each ROI in a simple and logical way so that the ROIs are easy to tell apart. It is very important to keep track of all the ROIs in your dataset, since they need to be specified in the CSV file of the dataset under the ROIName column to be used later in your radiomics analysis.
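To keep track of ROI names, it can help to list them for each RTstruct file. The sketch below is an illustration and not MEDiml code: list_roi_names is a hypothetical helper that works on any object exposing the StructureSetROISequence and ROIName attributes. A dataset returned by pydicom's dcmread exposes these keywords for an RTstruct file; here a stand-in object with made-up ROI names is used so that the example runs without any DICOM file.

```python
from types import SimpleNamespace


def list_roi_names(rtstruct):
    """Collect the (3006,0026) ROI Name of every item in the
    (3006,0020) Structure Set ROI Sequence."""
    return [roi.ROIName for roi in rtstruct.StructureSetROISequence]


# Stand-in for a parsed RTstruct header; the ROI names are hypothetical.
# With pydicom installed, pydicom.dcmread(path) could supply the real object.
fake_rtstruct = SimpleNamespace(
    StructureSetROISequence=[
        SimpleNamespace(ROIName="GTV_Mass"),
        SimpleNamespace(ROIName="GTV_Edema"),
    ]
)

print(list_roi_names(fake_rtstruct))  # prints ['GTV_Mass', 'GTV_Edema']
```

The resulting names are the values that would go in the ROIName column of the CSV file.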

NIfTI

The NIfTI format is a simple format that only contains the image itself. Unlike DICOM, the NIfTI format does not contain any information about the regions of interest (ROI), so this information needs to be provided in separate files. For MEDiml to read NIfTI scan files, they need to be placed in the same folder and named as follows:

'PatientID__SeriesDescription(ROILabel).Modality.nii.gz': The image itself. For example: 'STS-McGill-001__T1(GTV).MRscan.nii.gz'.
'PatientID__SeriesDescription(ROIname).ROI.nii.gz': The ROI or the mask of the image. This file should contain a binary mask of the ROI. For example: 'STS-McGill-001__T1(GTV_Mass).ROI.nii.gz'.
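The naming scheme above can be checked with a regular expression. This is a minimal sketch under stated assumptions, not the parser MEDiml uses: parse_nifti_name and the pattern are hypothetical, and the pattern assumes that the patient ID contains no underscore and that neither the series description nor the ROI label contains parentheses.

```python
import re

# Illustrative pattern for 'PatientID__SeriesDescription(ROI).Kind.nii.gz',
# where Kind is the modality part (e.g. 'MRscan') or 'ROI' for a mask.
NIFTI_NAME = re.compile(
    r"^(?P<patient_id>[^_]+)__"
    r"(?P<series_description>[^()]+)"
    r"\((?P<roi>[^()]+)\)\."
    r"(?P<kind>[^.]+)\.nii\.gz$"
)


def parse_nifti_name(filename):
    """Split a file name into its parts, or return None if it does not match."""
    match = NIFTI_NAME.match(filename)
    return match.groupdict() if match else None


print(parse_nifti_name("STS-McGill-001__T1(GTV).MRscan.nii.gz"))
print(parse_nifti_name("STS-McGill-001__T1(GTV_Mass).ROI.nii.gz"))
```

A file name that returns None does not follow the scheme and is worth renaming before the data is read.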

The following figure sums up the MEDiml logic in reading data for both formats:

If these conventions are followed, the DataManager class will be able to read the data and create the MEDscan objects used in the radiomics analysis with no further intervention from the user. To help meet them, the MEDiml package can automatically update the fields of all the DICOM files, as long as the dataset is organized in the following way:

dataset_folder
├── Patient ID 1
│   ├── ImagingScanName 1
│   │   ├── DICOM files
│   │   └── ...
│   └── ImagingScanName 2
│       ├── DICOM files
│       └── ...
├── Patient ID 2
│   ├── ImagingScanName 1
│   │   ├── DICOM files
│   │   └── ...
│   └── ImagingScanName 2
│       ├── DICOM files
│       └── ...
└── ...

For example:

dataset_folder
├── STS-McGill-001
│   ├── T1
│   │   ├── *.dcm
│   │   └── ...
│   └── PET
│       ├── *.dcm
│       └── ...
├── STS-McGill-002
│   ├── T2FS
│   │   ├── *.dcm
│   │   └── ...
│   └── CT
│       ├── *.dcm
│       └── ...
└── ...

Then run the following command:

python scripts/process_dataset.py --path-dataset path/to/your/dataset/folder
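Before running the command, it can be worth confirming that the folder really follows the patient/scan layout shown above. The sketch below is independent of scripts/process_dataset.py, whose internals are not described here. The helper list_scans is hypothetical, and it assumes that the DICOM files carry a .dcm extension, as in the example tree.

```python
from pathlib import Path


def list_scans(dataset_folder):
    """Yield (patient_id, imaging_scan_name, n_dcm_files) for each
    <dataset>/<patient>/<scan> folder, in sorted order."""
    root = Path(dataset_folder)
    for patient_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for scan_dir in sorted(s for s in patient_dir.iterdir() if s.is_dir()):
            # Count only files directly inside the scan folder.
            n_files = sum(1 for _ in scan_dir.glob("*.dcm"))
            yield patient_dir.name, scan_dir.name, n_files


# Example use (the path is a placeholder):
# for patient_id, scan_name, n_files in list_scans("path/to/your/dataset/folder"):
#     print(patient_id, scan_name, n_files)
```

A scan folder reported with zero files, or a patient folder whose name contains an underscore, points to a part of the dataset that does not yet follow the conventions.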

Note

Future work will include the automatic pre-processing of datasets according to the package conventions.