Evaluation Metrics

In calcium imaging denoising, models must accurately remove noise from image sequences while preserving the integrity of the underlying signal.

In particular, since this kind of data contains both spatial and temporal information, a good denoising model must preserve both the spatial and the temporal structure of the signal.

For this reason, we designed the evaluation of this challenge around these two aspects: each metric has a spatial (s) and a temporal (t) variant, which are then merged into a spatio-temporal (st) metric.


Spatio-Temporal SNR


Given an image stack (video) of size T x H x W, where T is the number of frames, and H, W are the spatial dimensions of each frame:

The spatial Signal-to-Noise Ratio (sSNR) is the widely used Signal-to-Noise Ratio (SNR) metric computed on each frame, where y is the ground-truth (i.e., clean) image and x is the output of the denoising algorithm.
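The per-frame computation can be sketched as follows, assuming the standard decibel form SNR = 10 · log10(‖y‖² / ‖y − x‖²) and assuming that the per-frame values are averaged over the T frames, as the Metrics Overview table suggests. The function names `snr_db` and `spatial_snr` are hypothetical, not taken from the challenge code.

```python
import numpy as np


def snr_db(y: np.ndarray, x: np.ndarray) -> float:
    """Standard SNR in decibels: 10 * log10(||y||^2 / ||y - x||^2).

    y is the ground-truth (clean) signal and x the denoised estimate.
    """
    y = y.astype(np.float64)
    x = x.astype(np.float64)
    signal_power = np.sum(y ** 2)
    noise_power = np.sum((y - x) ** 2)
    return float(10.0 * np.log10(signal_power / noise_power))


def spatial_snr(y: np.ndarray, x: np.ndarray) -> float:
    """sSNR sketch: SNR of each H x W frame, averaged over the T frames."""
    if y.shape != x.shape or y.ndim != 3:
        raise ValueError("expected two stacks of identical shape (T, H, W)")
    per_frame = [snr_db(y[t], x[t]) for t in range(y.shape[0])]
    return float(np.mean(per_frame))
```

For instance, a stack whose pixels are all 1.0, denoised to 0.9, scores 20 dB on every frame, so its sSNR is 20 dB.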

The temporal Signal-to-Noise Ratio (tSNR) is the same SNR metric computed on the temporally resolved signal (a trace of T values) at each spatial location (i, j).
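The temporal variant differs only in which axis is reduced. A minimal sketch, assuming the standard decibel SNR, 10 · log10(‖y‖² / ‖y − x‖²), evaluated on each pixel's length-T trace and averaged over the H × W grid (per the Metrics Overview table); `temporal_snr` is a hypothetical name.

```python
import numpy as np


def temporal_snr(y: np.ndarray, x: np.ndarray) -> float:
    """tSNR sketch: SNR of the length-T trace at each pixel (i, j),
    averaged over the H x W spatial grid.

    y is the ground-truth stack and x the denoised stack, both shaped (T, H, W).
    """
    if y.shape != x.shape or y.ndim != 3:
        raise ValueError("expected two stacks of identical shape (T, H, W)")
    y = y.astype(np.float64)
    x = x.astype(np.float64)
    # Summing over axis 0 (time) yields one signal and one noise power per pixel.
    signal_power = np.sum(y ** 2, axis=0)       # shape (H, W)
    noise_power = np.sum((y - x) ** 2, axis=0)  # shape (H, W)
    per_pixel_db = 10.0 * np.log10(signal_power / noise_power)
    return float(np.mean(per_pixel_db))
```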

Lastly, the spatio-temporal Signal-to-Noise Ratio (stSNR) is a convex combination of the spatial and temporal SNR.
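The convex combination can be sketched as below. The weight `alpha` is a hypothetical parameter: its value is not stated here, so the default of 0.5 (an equal-weight average) is only an assumption.

```python
def spatio_temporal_snr(s_snr: float, t_snr: float, alpha: float = 0.5) -> float:
    """stSNR sketch: alpha * sSNR + (1 - alpha) * tSNR.

    alpha is a hypothetical weight; a convex combination requires 0 <= alpha <= 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    return alpha * s_snr + (1.0 - alpha) * t_snr
```

Restricting `alpha` to [0, 1] guarantees the result always lies between the spatial and the temporal score.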


Final Score


The leaderboard of each task uses stSNR as the final evaluation score, averaged over the F files in that leaderboard's dataset.
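A minimal sketch of the final score, assuming a plain unweighted mean of the per-file stSNR values; `leaderboard_score` is a hypothetical name.

```python
from statistics import fmean


def leaderboard_score(per_file_stsnr: list[float]) -> float:
    """Final score sketch: unweighted mean of stSNR over the F leaderboard files."""
    if not per_file_stsnr:
        raise ValueError("a leaderboard needs at least one file")
    return fmean(per_file_stsnr)
```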


Additional Metrics


Although they do not contribute to the final ranking score, participants can also inspect other metrics. These are combined spatio-temporally in the same way as above, but use a different base metric in place of SNR.

For metrics that depend on a data range, we used the difference between the 97th and 3rd percentiles of the full ground-truth stack as the range. This makes the range robust to outliers and keeps the spatial and temporal metrics comparable.
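The percentile-based data range can be sketched with NumPy as below, pooling every value of the T × H × W ground-truth stack before taking percentiles. `robust_data_range` is a hypothetical helper, and NumPy's default linear interpolation between samples is an assumption.

```python
import numpy as np


def robust_data_range(gt_stack: np.ndarray, low: float = 3.0, high: float = 97.0) -> float:
    """Data range as the spread between the 97th and 3rd percentiles of the
    full ground-truth stack (all T x H x W values pooled together)."""
    p_low, p_high = np.percentile(gt_stack, [low, high])
    return float(p_high - p_low)
```

A single extreme outlier barely moves this range, whereas a plain max − min range would be dominated by it.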

PSNR

PSNR (Peak Signal-to-Noise Ratio) is a widely used image-processing metric for quantifying the similarity between two images, measured in decibels (dB).

It is defined as PSNR = 10 · log10(MAX_I² / MSE), where MAX_I is the maximum possible pixel value of the image (for example, 255 when pixels are represented with 8 bits per sample) and MSE is the mean squared error between the original and reconstructed images. As stated above, we instead set MAX_I to the difference between the 97th and 3rd percentiles of the full ground-truth stack.

We used the scikit-image implementation (skimage.metrics.peak_signal_noise_ratio) in our evaluation code.
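A NumPy sketch of sPSNR and tPSNR under the choices described above: PSNR is computed per frame and per pixel trace, then averaged, with one shared data range taken from the 3rd-97th percentile spread of the ground truth. To run without extra dependencies, it reimplements the PSNR formula directly instead of calling scikit-image; `psnr_db` and `spatial_and_temporal_psnr` are hypothetical names.

```python
import numpy as np


def psnr_db(gt: np.ndarray, est: np.ndarray, data_range: float) -> float:
    """PSNR = 10 * log10(data_range^2 / MSE), the same formula that
    skimage.metrics.peak_signal_noise_ratio applies when data_range is given."""
    mse = np.mean((gt.astype(np.float64) - est.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))


def spatial_and_temporal_psnr(gt: np.ndarray, est: np.ndarray) -> tuple[float, float]:
    """Sketch of (sPSNR, tPSNR) for (T, H, W) stacks sharing one data range."""
    p3, p97 = np.percentile(gt, [3, 97])
    data_range = float(p97 - p3)
    T, H, W = gt.shape
    # Spatial: one PSNR per frame, averaged over frames.
    s_psnr = np.mean([psnr_db(gt[t], est[t], data_range) for t in range(T)])
    # Temporal: one PSNR per pixel trace, averaged over the spatial grid.
    t_psnr = np.mean([psnr_db(gt[:, i, j], est[:, i, j], data_range)
                      for i in range(H) for j in range(W)])
    return float(s_psnr), float(t_psnr)
```

Sharing one data range between the two variants is what keeps their values on a comparable scale.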

Scale-Invariant PSNR (SI-PSNR)

This metric builds on the scale-invariant SNR (SI-SNR) described in Luo, Yi, and Nima Mesgarani. "TasNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation." 2018.

SI-PSNR is invariant to the scale of the signals being compared: if one signal is a scaled version of the other, the score does not change. This addresses a limitation of traditional SNR and PSNR metrics, which are sensitive to changes in signal amplitude.

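Following the cited paper, the scale-invariant SNR is defined as:

$$\text{SI-SNR} = 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^2}{\lVert e_{\text{noise}} \rVert^2}$$

where:

$$s_{\text{target}} = \frac{\langle \hat{s}, s \rangle \, s}{\lVert s \rVert^2}, \qquad e_{\text{noise}} = \hat{s} - s_{\text{target}}$$

Here, $\hat{s}$ and $s$ are the estimated and target clean sources, respectively, and both are normalized to zero mean to ensure scale invariance.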

In our evaluation code, we use the scale-invariant PSNR implementation from the careamics package, modified to accept a custom data-range parameter for normalization.
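To illustrate the scale-invariance property, the sketch below implements the SI-SNR definition from the cited TasNet paper in NumPy: both signals are made zero-mean, the estimate is projected onto the target, and the residual is treated as noise. It is not the careamics SI-PSNR code, which additionally relies on a data range; `si_snr_db` is a hypothetical name.

```python
import numpy as np


def si_snr_db(estimate: np.ndarray, target: np.ndarray) -> float:
    """Scale-invariant SNR (Luo & Mesgarani, 2018) in decibels."""
    # astype copies, so the in-place mean subtraction does not touch the inputs.
    s_hat = estimate.astype(np.float64).ravel()
    s = target.astype(np.float64).ravel()
    s_hat -= s_hat.mean()
    s -= s.mean()
    # s_target = <s_hat, s> s / ||s||^2: component of the estimate along the target.
    s_target = (np.dot(s_hat, s) / np.dot(s, s)) * s
    # e_noise = s_hat - s_target: everything the target does not explain.
    e_noise = s_hat - s_target
    return float(10.0 * np.log10(np.dot(s_target, s_target) / np.dot(e_noise, e_noise)))
```

Scaling the estimate by a constant or adding an offset to it leaves the score unchanged, which is the property plain SNR and PSNR lack.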


Metrics Overview


| Metric       | Description                                          | Averaging domain  |
|--------------|------------------------------------------------------|-------------------|
| sSNR         | Spatial Signal-to-Noise Ratio                        | Over frames       |
| tSNR         | Temporal Signal-to-Noise Ratio                       | Over spatial grid |
| stSNR        | Spatio-temporal SNR (weighted average)               | Global            |
| sPSNR        | Spatial Peak SNR                                     | Over frames       |
| tPSNR        | Temporal Peak SNR                                    | Over spatial grid |
| stPSNR       | Spatio-temporal Peak SNR                             | Global            |
| sSI_PSNR     | Spatial Scale-Invariant PSNR                         | Over frames       |
| tSI_PSNR     | Temporal Scale-Invariant PSNR                        | Over spatial grid |
| stSI_PSNR    | Spatio-temporal Scale-Invariant PSNR                 | Global            |
| <metric>_std | Standard deviation (spatial/temporal only), per file | Dispersion        |

Note: Standard deviations are computed only per file. To inspect them, open your algorithm's result page and check the output JSON file.
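The per-file standard deviations can be sketched as the spread of the per-frame (spatial) or per-pixel (temporal) values from which the mean metric is computed. `mean_and_std` is a hypothetical helper, and using the population standard deviation (ddof=0) rather than the sample one is an assumption.

```python
import numpy as np
from numpy.typing import ArrayLike


def mean_and_std(per_unit_scores: ArrayLike) -> tuple[float, float]:
    """Summarize one file's per-frame (spatial) or per-pixel (temporal) scores.

    Returns the mean (the reported metric) and the population standard
    deviation (ddof=0, an assumption) as a measure of dispersion.
    """
    scores = np.asarray(per_unit_scores, dtype=np.float64)
    return float(scores.mean()), float(scores.std())
```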