Getting started

Before installing AUDIT on your local computer, we recommend checking out the publicly deployed app on the Streamlit Cloud server at https://auditapp.streamlit.app/.

Users can familiarize themselves with AUDIT through this tool without needing to install anything. When users are ready to make extensive use of it and analyze their own datasets, they will need to follow the steps outlined below.


1. Installation

The installation of AUDIT can be done in different ways depending on the user's needs. For users who plan to use AUDIT regularly in their projects, we recommend the "For Developers" option, as it allows for easy extension of the library with new functionalities. For users who do not want to have all the code on their local machine, there is the "For Standard Users" option.

The "For Developers" option is not necessarily more complex and, in fact, is the one recommended by the authors.

1.1. For standard users - Using pip

Install the latest AUDIT version directly from PyPI (when available) with the following command:

pip install auditapp

This is the simplest method if you just want to use the library without modifying the source code. However, configuration files and project structure still need to be set up.
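
If you want to confirm that the package was installed correctly, a quick check from Python is enough. This is only a sanity check, and it assumes the distribution is published under the name auditapp, as in the command above:

from importlib.metadata import version

# Prints the installed AUDIT version; raises PackageNotFoundError if the install failed
print(version("auditapp"))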


1.2. For developers - Using AUDIT repository

For development or if you need access to the latest updates, install AUDIT from our repository. If you do not use Poetry for dependency management, you should choose option 1.2.1. However, if you are familiar with Poetry, select option 1.2.2.

1.2.1. Without using Poetry for dependency management

  1. Create an isolated environment (recommended for avoiding dependency conflicts):

    conda create -n audit_env python=3.10
    conda activate audit_env
    
  2. Clone the repository:

    git clone git@github.com:caumente/AUDIT.git
    cd AUDIT
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    

1.2.2. Using Poetry for dependency management

Poetry is a dependency manager that simplifies library management and environment creation. Follow these steps:

  1. Ensure Poetry is installed in your environment.

  2. Clone AUDIT's repository:

    git clone git@github.com:caumente/AUDIT.git
    cd AUDIT

  3. Install the dependencies:

    poetry install

  4. Activate the virtual environment:

    poetry shell

1.3. Conclusion

The authors recommend following the developer option, as it provides greater flexibility once users are familiar with AUDIT. Additionally, our repository includes example cases, a predefined project structure, and example outputs, which simplifies interaction with the AUDIT app from the very first use.


2. Project structure and guidelines

AUDIT supports various project structures, but it is highly recommended to adhere to the default structure for clarity, ease of use, and ensuring correct functionality. This structure is intuitive, modular, and designed to facilitate working with datasets, configurations, outputs, and logs.

We will guide you through the recommended structure and explain the purpose of each directory in the context of the AUDIT Python library.


2.1. Datasets directory (datasets/)

The datasets directory is the cornerstone of the project, containing all datasets used for both training and testing models. Each dataset is organized into subdirectories, making it straightforward to store images, ground truth segmentations, and predictions generated by different models.

your_project/
├── datasets/
│   ├── dataset_1/
│   │   ├── dataset_1_images/
│   │   │   ├── dataset_1_case_1/
│   │   │   │   ├── dataset_1_case_1_t1.nii.gz
│   │   │   │   ├── dataset_1_case_1_t1ce.nii.gz
│   │   │   │   ├── dataset_1_case_1_t2.nii.gz
│   │   │   │   ├── dataset_1_case_1_flair.nii.gz
│   │   │   │   ├── dataset_1_case_1_seg.nii.gz
│   │   │   ├── dataset_1_case_2/
│   │   │   ......
│   │   ├── dataset_1_seg/
│   │   │   ├── model_1/
│   │   │   │   ├── dataset_1_case_1/
│   │   │   │   │   ├── dataset_1_case_1_pred.nii.gz
│   │   │   .....
│   │   │   ├── model_2/
│   │   │   │   ├── dataset_1_case_1/
│   │   │   │   │   ├── dataset_1_case_1_pred.nii.gz
│   │   │   .....
│   ├── dataset_2/
...

Explanation of Components

  • dataset_1/: Each dataset is stored in its own directory (e.g., dataset_1, dataset_2, etc.).
  • dataset_1_images/: This folder contains all image data for the dataset. Subfolders represent individual cases (e.g., dataset_1_case_1), and each case includes its respective sequences, such as T1, T1ce, T2, FLAIR, and segmentation (ground truth).
  • dataset_1_seg/: This directory stores predictions made on the dataset by different models. Each model has its own subdirectory (e.g., model_1, model_2), and within each, predictions for every case are organized similarly to the ground truth images. The suffix _pred is a reserved word in the AUDIT library for model predictions.

This design supports multi-center and multi-model comparisons by storing predictions from several models alongside the original data. Please find an example of this folder at the following link: dummy dataset folder.
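
If you are assembling this layout by hand, a short script can scaffold the empty directories before you copy in your NIfTI files. The sketch below only reproduces the naming convention described above; the dataset, case, and model names are placeholders to replace with your own:

from pathlib import Path

root = Path("your_project/datasets")

# Placeholder names; replace with your own datasets, cases, and models
datasets = {"dataset_1": {"cases": ["dataset_1_case_1", "dataset_1_case_2"],
                          "models": ["model_1", "model_2"]}}

for ds_name, ds in datasets.items():
    images_dir = root / ds_name / f"{ds_name}_images"
    seg_dir = root / ds_name / f"{ds_name}_seg"
    for case in ds["cases"]:
        # One folder per case, holding the T1, T1ce, T2, FLAIR, and ground-truth files
        (images_dir / case).mkdir(parents=True, exist_ok=True)
    for model in ds["models"]:
        for case in ds["cases"]:
            # One folder per model and case, holding the *_pred.nii.gz prediction
            (seg_dir / model / case).mkdir(parents=True, exist_ok=True)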


2.2. Configuration directory (config/)

The config directory houses all configuration files required to run feature and metric extraction as well as the web app.

your_project/
├── config/
│   ├── feature_extraction.yaml
│   ├── metric_extraction.yaml
│   ├── app.yaml

Each of these config files is described in detail in Section 3 (Configuration). Please find an example of each config file at the following link: dummy config files.


2.3. Outputs directory (outputs/)

The outputs directory is where all results generated by the project are stored. This includes extracted features and metrics calculated during model evaluation.

your_project/
├── outputs/
│   ├── features/
│   ├── metrics/

Explanation of Components

  • features/: Stores the features extracted from the dataset during processing. These could include case-specific features or summary statistics.
  • metrics/: Contains evaluation results, such as segmentation metrics (e.g., Dice scores) or model performance comparisons.


2.4. Logs directory (logs/)

The logs directory contains logs generated during project execution. These logs are critical for debugging, monitoring progress, and keeping a historical record of runs.

your_project/
├── logs/
│   ├── features/
│   ├── metric/

Explanation of Components

  • features/: Logs related to feature extraction, such as execution times, errors encountered during processing, or debugging information.
  • metric/: Logs generated during metric evaluation, including potential issues with predictions.


2.5. Conclusion

By following this recommended structure, you can ensure seamless integration with the AUDIT Python library, enabling efficient data management, model evaluation, and reproducibility. This structure is both modular and extensible, making it adaptable to projects of varying complexity.

In addition, to help you set up this project structure, especially the datasets directory, AUDIT provides methods for organizing and standardizing datasets.

The outputs/ and logs/ directories will be automatically created the first time AUDIT is executed, so there is no need to create them explicitly.

Users should note that, as mentioned in Section 1, using the developer option does not require creating the project structure manually, as it is already implemented by default.


3. Configuration

AUDIT uses configuration files to define paths, settings, and parameters. These files are located in the audit/configs/ directory. However, if you are not following the developer setup, these files must be created manually.

  • Feature extraction: Configure MRI features and datasets in feature_extraction.yml.
  • Metric extraction: Define evaluation metrics and paths in metric_extraction.yml.
  • App settings: Customize web app options in app.yml.

Make sure to adjust all paths and settings according to your environment. Please find an example of each config file at the following link: dummy config files.
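
Since these are plain YAML files, you can load and sanity-check them before launching an extraction. The snippet below is not part of AUDIT's API; it simply reads the feature extraction config with PyYAML (assumed to be available) and prints the keys discussed in the next subsection, assuming the config/ layout from Section 2.2:

import yaml  # PyYAML

with open("config/feature_extraction.yaml") as f:
    config = yaml.safe_load(f)

print(config["data_paths"])   # dataset name -> path to the dataset folder
print(config["output_path"])  # directory where extracted features will be saved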

3.1. Example of feature extraction config file

This configuration file is used to define the settings for feature extraction in the AUDIT library. Each key and its usage is explained below:

  • data_paths: Specifies the paths to the directories containing MRI datasets. It is a dictionary where each key represents a dataset name (e.g., BraTS, UCSF), and the value is the file path to the dataset folder.
  • labels: Maps region names (e.g., tumor labels) to their numeric values. This mapping is used to identify regions in segmentation maps.
    • BKG: Background (non-tumor regions).
    • EDE: Edema.
    • ENH: Enhancing tumor.
    • NEC: Necrotic tumor tissue.
  • features: Lists the types of features to be extracted from the MRI datasets. Each key is a feature type, and its value (true or false) enables or disables that feature. Key Options:
    • statistical: Extract basic statistical properties (e.g., mean, variance) of the MRI intensity values.
    • texture: Compute texture-based features (e.g., entropy, contrast).
    • spatial: Analyze spatial properties (e.g., brain location, spatial resolution).
    • tumor: Extract tumor-specific features (e.g., tumor volume, tumor location).
  • longitudinal: Configures settings for longitudinal studies, allowing analysis of changes in subjects over time. Each dataset can have a unique configuration.
    • pattern: A delimiter (e.g., _, -, or /) used to split filenames.
    • longitudinal_id: The position (0-based index) in the split filename where the subject ID is located.
    • time_point: The position (0-based index) in the split filename indicating the time point (e.g., pre-treatment, post-treatment).
  • output_path: Specifies the directory where extracted features will be saved.

Below is a complete configuration file, demonstrating how these keys are used together:

# Paths to all your datasets
data_paths:
  BraTS: '/home/user/AUDIT/datasets/BraTS/BraTS_images'
  UCSF: '/home/user/AUDIT/datasets/UCSF/UCSF_images'

# Mapping of labels to their numeric values
labels:
  BKG: 0
  EDE: 3
  ENH: 1
  NEC: 2

# List of features to extract
features:
  statistical: true
  texture: false
  spatial: true
  tumor: true

# Longitudinal settings (if longitudinal data is available)
longitudinal:
  UCSF:
    pattern: "_"            # Pattern used for splitting filename
    longitudinal_id: 1      # Index position for the subject ID after splitting the filename
    time_point: 2           # Index position for the time point after splitting the filename


# Path where extracted features will be saved
output_path: '/home/user/AUDIT/outputs/features'
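
The longitudinal block above is applied as a plain filename split. The following sketch (not AUDIT's own parser) shows how a hypothetical UCSF filename would be interpreted with pattern "_", longitudinal_id 1, and time_point 2:

# Hypothetical filename; the real naming scheme depends on your dataset
filename = "UCSF_0001_1_t1.nii.gz"

pattern = "_"        # longitudinal.UCSF.pattern
longitudinal_id = 1  # index of the subject ID after splitting
time_point = 2       # index of the time point after splitting

parts = filename.split(pattern)
subject_id, tp = parts[longitudinal_id], parts[time_point]
print(subject_id, tp)  # -> 0001 1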

3.2. Example of metric extraction config file

This configuration file is used to define the settings for metric extraction in the AUDIT library. Each key and its usage is explained below:

  • data_path: Specifies the path to the directory containing the raw MRI dataset to be evaluated.
  • model_predictions_paths: Specifies the paths to the directories containing model predictions. It is a dictionary where each key represents a model name (e.g., nn-UNet, SegResNet), and the value is the corresponding path.
  • labels: Maps region names (e.g., tumor labels) to their numeric values. This mapping is used to identify regions in segmentation maps.
    • BKG: Background (non-tumor regions).
    • EDE: Edema.
    • ENH: Enhancing tumor.
    • NEC: Necrotic tumor tissue.
  • metrics: Lists the metrics to be computed to evaluate the model predictions. Each key represents a metric, and its value (true or false) enables or disables the computation of that metric. Available options include: dice, jacc, accu, prec, sens, spec, haus, size.
  • package: Specifies the library used to compute the metrics. AUDIT will be used by default.
  • calculate_stats: A flag that determines whether additional statistical information (e.g., mean, variance) is computed for the evaluation. Only available when using the pymia library.
  • output_path: Specifies the directory where the computed metrics will be saved after evaluation.
  • filename: Defines the filename prefix for saving the output metrics.

Below is a complete configuration file, demonstrating how these keys are used together:

# Path to the raw dataset
data_path: '/home/user/AUDIT/datasets/BraTS/BraTS_images'

# Paths to model predictions
model_predictions_paths:
  nnUnet: '/home/user/AUDIT/datasets/BraTS/BraTS_seg/nnUnet'

# Mapping of labels to their numeric values
labels:
  BKG: 0
  EDE: 3
  ENH: 1
  NEC: 2

# List of metrics to compute
metrics:
  dice: true
  jacc: true
  accu: true
  prec: true
  sens: true
  spec: true
  haus: true
  size: true

# Library used for computing all the metrics
package: custom
calculate_stats: false

# Path where output metrics will be saved
output_path: '/home/user/AUDIT/outputs/metrics'

# Filename for the extracted information
filename: 'BraTS'
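
For reference, the dice and jacc metrics listed above reduce to simple overlap ratios between a predicted and a ground-truth mask. The sketch below computes both for a single label on two NumPy label maps; it is a generic illustration of the definitions, not AUDIT's internal implementation:

import numpy as np

def dice_and_jaccard(ground_truth, prediction, label):
    """Overlap between the voxels of a given label in two segmentation maps."""
    gt = (ground_truth == label)
    pred = (prediction == label)
    intersection = np.logical_and(gt, pred).sum()
    dice = 2 * intersection / (gt.sum() + pred.sum())
    jaccard = intersection / np.logical_or(gt, pred).sum()
    return dice, jaccard

# Toy 1D example using label 1 (ENH in the mapping above)
gt = np.array([0, 1, 1, 1, 0])
pred = np.array([0, 1, 1, 0, 0])
print(dice_and_jaccard(gt, pred, label=1))  # -> (0.8, 0.666...)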

3.3. Example of web app config file

This configuration file defines the settings used by the AUDIT web app to locate datasets, extracted features, metrics, and model predictions. Each key is explained below:

  • labels: Maps region names (e.g., tumor labels) to their numeric values. This mapping is used to identify regions in segmentation maps.
    • BKG: Background (non-tumor regions).
    • EDE: Edema.
    • ENH: Enhancing tumor.
    • NEC: Necrotic tumor tissue.
  • datasets_path: Defines the root path where the raw datasets are stored.
  • features_path: Defines the root path where the feature extraction results are saved.
  • metrics_path: Defines the root path where the metric extraction results are saved.
  • raw_datasets: Specifies paths to directories containing the raw MRI datasets. Each key represents a dataset name (e.g., BraTS, UCSF), and the value is the file path to the respective dataset folder.
  • features: Specifies paths to CSV files where the extracted feature information is saved for each dataset. Each key represents a dataset name, and the value is the file path to the corresponding feature extraction CSV file.
  • metrics: Specifies paths to CSV files where metric extraction information is saved for each dataset. Similar to the features section, each key represents a dataset name, and the value is the path to the corresponding metrics CSV file.
  • predictions: Specifies the paths for model predictions for different datasets. Each dataset name (e.g., BraTS_SSA) maps to a dictionary containing model names (e.g., nnUnet, SegResNet) as keys, and the file paths to their respective segmentation predictions as values.

Below is a complete configuration file, demonstrating how these keys are used together:

# Mapping of labels to their numeric values
labels:
  BKG: 0
  EDE: 3
  ENH: 1
  NEC: 2

# Root path for datasets, features extracted, and metrics extracted
datasets_path: '/home/user/AUDIT/datasets'
features_path: '/home/user/AUDIT/outputs/features'
metrics_path: '/home/user/AUDIT/outputs/metrics'

# Paths for raw datasets
raw_datasets:
  BraTS: "${datasets_path}/BraTS/BraTS_images"
  BraTS_SSA: "${datasets_path}/BraTS_SSA/BraTS_SSA_images"
  UCSF: "${datasets_path}/UCSF/UCSF_images"

# Paths for feature extraction CSV files
features:
  BraTS: "${features_path}/extracted_information_BraTS.csv"
  BraTS_SSA: "${features_path}/extracted_information_BraTS_SSA.csv"
  UCSF: "${features_path}/extracted_information_UCSF.csv"

# Paths for metric extraction CSV files
metrics:
  BraTS_SSA: "${metrics_path}/extracted_information_BraTS.csv"
  UCSF: "${metrics_path}/extracted_information_UCSF.csv"

# Paths for models predictions
predictions:
  BraTS_SSA:
    nnUnet: "${datasets_path}/BraTS_SSA/BraTS_SSA_seg/nnUnet"
    SegResNet: "${datasets_path}/BraTS_SSA/BraTS_SSA_seg/SegResNet"
  UCSF:
    SegResNet: "${datasets_path}/UCSF/UCSF_seg/SegResNet"

4. Run AUDIT Backend

The backend of AUDIT is responsible for calculating the metrics specified in configuration files and extracting features from magnetic resonance imaging (MRI) data. Depending on the installation method—either via the AUDIT repository (for developers) or through pip (for standard usage)—the library is designed to execute these processes via command-line commands.

4.1. For standard users

auditapp feature-extraction --config path/to/your/feature_extraction/config/file.yaml
auditapp metric-extraction --config path/to/your/metric_extraction/config/file.yaml

4.2. For developers

If you are using the developer mode of AUDIT (installed via the repository), you can directly run the feature extraction and metric extraction modules as follows:

python src/audit/feature_extraction.py --config path/to/your/feature_extraction/config/file.yml
python src/audit/metric_extraction.py --config path/to/your/metric_extraction/config/file.yml

In developer mode, specifying the --config parameter is optional. Instead, you can edit the default configuration files provided by the library to suit your needs. These files are located in the AUDIT/src/audit/configs folder. Simply modify the configuration files (feature_extraction.yml and metric_extraction.yml) to match your requirements before running the commands. Additionally, AUDIT ships with default config files that allow users to run the app out of the box, although some functionalities are then limited.

Logs and output files will be saved in the directories specified in the configuration files (by default, the logs/ and outputs/ folders).
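
Both extractions write plain CSV files to the configured output directories, so you can inspect the results with any tool. Below is a minimal check with pandas (assuming it is installed); the path and filename follow the extracted_information_<dataset>.csv pattern used in the example configs, so adjust them to your own run:

import pandas as pd

# Adjust the path and filename to match your output_path and dataset name
features = pd.read_csv("/home/user/AUDIT/outputs/features/extracted_information_BraTS.csv")
print(features.shape)   # number of cases x number of extracted features
print(features.head())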


5. Run AUDIT App

The AUDIT web app provides an interactive interface for exploring your data and visualizing metrics. Start the app with:

5.1. For standard users

auditapp run-app --config path/to/your/app/config/file.yml

5.2. For developers

python src/audit/app/launcher.py --config path/to/your/app/config/file.yml

In developer mode, specifying the --config parameter is optional. Instead, you can edit the default configuration file provided by the library to suit your needs. This file is located in the AUDIT/src/audit/configs folder. Simply modify the configuration file (app.yml) to match your requirements before running the command.

This will open the app in your default web browser at http://localhost:8501/. Use the dashboards to:

  • Explore univariate and multivariate data distributions.
  • Compare model performance across datasets.
  • Analyze trends in longitudinal data.

6. Additional Tips

For ITK-SNAP integration, ensure it is installed and configured correctly. Use the logs/ folder to monitor execution details for debugging.

You're all set to start using AUDIT! Dive into your MRI data, evaluate AI models, and gain deeper insights with the help of AUDIT’s powerful tools. For further details, check out the other sections of the documentation.