View Issue Details

IDProjectCategoryView StatusLast Update
0003977Slicer4Core: Base Codepublic2018-03-02 11:07
Reporterlauren Assigned Tolauren  
PrioritynormalSeveritymajorReproducibilityalways
Status closedResolutionfixed 
Product VersionSlicer 4.4.0 
Target VersionFixed in VersionSlicer 4.5.0-1 
Summary0003977: Slicer diffusion tensor computation is not reproducible
Description

The Slicer DTI computation is not reproducible: each time it is run on the same dataset, different values are obtained. In some cases this causes white stripes of higher values in the baseline image, as we have seen with some new data. In other data, the differences are seen by subtracting the images as follows.

Steps to reproduce the bug, which happens in most datasets that we have tried, on both Linux from 9/2013 and MacOS nightly today.

  1. Get DTI tutorial data from here: http://www.slicer.org/slicerWiki/images/0/03/DiffusionMRI_tutorialData.zip. Screenshots also include more highly affected data which I can provide directly if needed.

  2. Load into Slicer.

  3. Calculate tensors in DWI to DTI estimation module. (The bug is present in both LS and WLS estimation modes.)

  4. Calculate tensors a second time from the same dataset, using Create New Volume options to make two new outputs (DTI and Baseline).

  5. Now, compare the two Baselines and the two DTI volumes to show they are not the same. First, go to Subtract Scalar Volumes module and subtract the two Baseline Volumes. On my Mac, there are scattered pixels of nonzero difference with the tutorial data. In other datasets, the differences we have seen range from +/-6 and can cause bright stripes in the Baseline Volume.

  6. Compare the two DTI volumes by measuring FA. Create FA from both DTI volumes in Diffusion Tensor Scalar Measurements, then subtract those two FA volumes as in step 5 above. In the tutorial dataset, the differences are small and more noticeable outside the brain. In other datasets, we saw MUCH larger differences.

Attached screenshots show the results of this test using today's nightly and two datasets, tutorial and another. (Later I hope to add a screenshot of the baseline bright stripe problem that happens in some of our newer data with this bug.)

TagsNo tags attached.

Activities

2015-03-18 09:06

 

2015-03-18 09:13

 

2015-03-18 09:13

 

2015-03-18 09:14

 

lauren

lauren

2015-03-18 09:14

developer   ~0012977

All the attached images are difference images that should be 0 if things were working correctly.

pieper

pieper

2015-03-20 06:00

administrator   ~0012978

Interesting - good thing you spotted this!

We should include Raul and possibly also Gordon in the investigation depending on what we find. I haven't look at the details of the mathematics of the tensor estimation implementation, but I'm pretty sure it is calling the teem code (equivalent of 'tend estim' if I recall correctly) but wrapped in a multithreaded VTK class.

The banding that you see is almost certainly related to the 'slabs' that are allocated to different threads (compute cores) when VTK parallelizes the calculation.

You might see this kind of thing when there's a random number and the different threads get different seeds. But I believe that DWI to DTI estimation is meant to be deterministic.

Instead I would think that it's some kind of thread safety issue in either teem or vtkTeem that causes some cross-talk between threads.

The easiest workaround to test would be turning threading off for the estimator.

Beyond that, for debugging I suggest making a simple test script that demonstrates the issue and can be used to confirm when it is fixed.

lauren

lauren

2015-03-20 06:13

developer   ~0012979

Thanks Steve. I thought it was a multithread bug at first also, but as we re-run it on the same dataset, the bands move apparently randomly, so it is still potentially involving multithread but if so I am surprised that the location of "edges" between thread compute regions are not obviously preserved across runs. This may be hard to tell visually though due to the random element.

Yogesh, Ofer, and I have tested more and found that the magnitude of the variability is very small with "traditional" b=1000 diffusion-weighted data, whereas with newer multishell data, the magnitude is much larger and these bands can be seen.

Another thing I've noticed is that the bands appear in the "baseline" computed from DTI estimation but not in the "baseline" computed from tensor mask estimation, so something different is happening in these two pipelines. Do you know if the mask module is multithreaded?

To summarize, yes I agree there is potentially a multithread component, but it is behaving unusually, and the magnitude of the variability seems to be highly related to the type of data input, which may help track things down somehow.

pieper

pieper

2015-03-20 09:36

administrator   ~0012980

I took a quick look at all the code and I don't see anything obvious in the vtkTeem or teem (ten) code.

There is a lot of floating point math done per-pixel, so it could be something odd with rounding.

lauren

lauren

2015-03-31 13:19

developer   ~0013003

Further testing. In b=1000 data the difference from one calculation to the next is probably insignificant. For example, in some voxels, the FA changed by 0.000001.

So this bug seems to primarily affect multishell data.

This DTI estimation issue is present in the release of Slicer 4.4.0 from 11-02-2014. So it may not be related to the teem upgrade that was incorporated into Slicer in approximately 1/2015.

lauren

lauren

2015-04-01 15:27

developer   ~0013006

I've found that the same bug behavior (tensor estimation is not reproducible) is present in Slicer 3.6.3 and 4.0.0, so it is not caused by vtk 6.0. I've learned from Julie that using tend outside of slicer gives different results from computing tensors inside Slicer (which also uses the teem library), so the vtk interface to teem/tend is the likely culprit at this point.

Since the issues are greatly magnified with multiple b-values including ones higher than 1000, the next place to look is the handling of the b-values inside the vtk classes that interface with teem. There are some comments in this code region about thread safety and computing a maximum b-value in a thread safe region, so I suspect something along these lines, especially due to the striping of the differences in (for example) baseline images computed repeatedly.

pieper

pieper

2015-04-02 03:47

administrator   ~0013007

Did you look at this block of code?

https://github.com/Slicer/Slicer/blob/master/Libs/vtkTeem/vtkTeemEstimateDiffusionTensor.cxx#L509-L525

It looks to me like this code is being called from the threads but it manipulating data that is common across threads. That is, it's called from SetTenContext, which is called from vtkTeemEstimateDiffusionTensorExecute, which is called from ThreadedExecute, but it's manipulating this->DiffusionGradients, which is a shared instance variable.

jcfr

jcfr

2015-04-21 07:02

administrator   ~0013012

What is the status on this ?

Was thinking to update teem version used in Slicer to include the changes that Hans and Raul recently committed to teem SVN. See https://github.com/Slicer/Slicer/pull/262

Let me know if that would interfere with your investigation ?

lauren

lauren

2015-04-21 07:28

developer   ~0013013

This is independent of teem updates, so it's great if you update Slicer to include this fix as well as teem version updates.

jcfr

jcfr

2015-04-21 07:41

administrator   ~0013014

Would be great if you could spend some time to add a test that:
(a) would fail without your patch [1]
(b) pass after your patch is included

To make things easier on your side, you could simply attach a batch file (or python file) implementing this test. This file could invoke a CLI two times in a row and then use cmake -E compare_files /path/to/image1 /path/to/image2 to compare the results.

[1] https://github.com/ljod/Slicer/commit/ca5080bd1c90cf7e941a2ec7703ed3a3800ce015?w=1

jcfr

jcfr

2015-05-11 20:04

administrator   ~0013038

Fix [1] integrated in r24240 [2]

[1] https://github.com/Slicer/Slicer/pull/277

[2] http://viewvc.slicer.org/viewvc.cgi/Slicer4?view=revision&revision=24240

jcfr

jcfr

2015-05-11 20:14

administrator   ~0013039

As a side note, using the input data "helix-DWI.nhdr" [1] and following the instruction. Subtracting the two FAs shows some difference. See screenshots "Slicer-24240_Issue_3977_FA_are_different_for_helix_dwi"

@Lauren: Is that expected ?

[1] https://github.com/Slicer/Slicer/blob/master/Libs/MRML/Core/Testing/TestData/helix-DWI.nhdr

2015-05-11 20:14

 

pieper

pieper

2015-05-12 03:46

administrator   ~0013044

The different FA bothers me - see my comment on pull request 277.

jcfr

jcfr

2015-05-13 00:40

administrator   ~0013045

A self test using @pieper approach has been added. See r24248

As a side note, using the Subtract module and setting the interpolation order to 0 (instead of the default value being 1), the difference of the two FAs is zero.

See https://github.com/Slicer/Slicer/pull/277#issuecomment-101553810

Related Changesets

Import 2017-06-07 23:51:09: master d6a6309d

2015-05-11 23:37:03

jcfr

Details Diff
BUG: Resolve multithread issue that caused errors with multiple b values.

Fixes 0003977

See http://www.na-mic.org/Bug/view.php?id=3977

From: Lauren J. O'Donnell <odonnell@bwh.harvard.edu>

git-svn-id: http://svn.slicer.org/Slicer4/trunk@24240 3bd1e089-480b-0410-8dfb-8563597acbee
mod - Libs/vtkTeem/vtkTeemEstimateDiffusionTensor.cxx Diff File
mod - Libs/vtkTeem/vtkTeemEstimateDiffusionTensor.h Diff File

Import 2017-06-07 23:51:09: master efa75968

2015-05-13 03:36:40

jcfr

Details Diff
ENH: Add self test DTINotReproducibleIssue3977. See issue 0003977

This test has been added to check that successive
run of the "DWI to DTI estimation" CLI module returns the same output.

Problem has been documented in issue 0003977. Commit r24240 fixes the problem
by updating vtkTeemEstimateDiffusionTensor class.

git-svn-id: http://svn.slicer.org/Slicer4/trunk@24248 3bd1e089-480b-0410-8dfb-8563597acbee
mod - Applications/SlicerApp/Testing/Python/CMakeLists.txt Diff File
add - Applications/SlicerApp/Testing/Python/DTINotReproducibleIssue3977.py Diff File

Issue History

Date Modified Username Field Change
2015-03-18 09:06 lauren New Issue
2015-03-18 09:06 lauren Status new => assigned
2015-03-18 09:06 lauren Assigned To => jcfr
2015-03-18 09:06 lauren File Added: FA_difference_other_data.png
2015-03-18 09:13 lauren File Added: baseline_difference_other_data.png
2015-03-18 09:13 lauren File Added: FA_difference_tutorial_data.png
2015-03-18 09:14 lauren File Added: BAseline_difference_tutorialdata.png
2015-03-18 09:14 lauren Note Added: 0012977
2015-03-20 06:00 pieper Note Added: 0012978
2015-03-20 06:13 lauren Note Added: 0012979
2015-03-20 09:36 pieper Note Added: 0012980
2015-03-31 13:19 lauren Note Added: 0013003
2015-04-01 15:27 lauren Note Added: 0013006
2015-04-02 03:47 pieper Note Added: 0013007
2015-04-02 03:48 pieper Assigned To jcfr => lauren
2015-04-02 03:48 pieper Status assigned => confirmed
2015-04-21 07:02 jcfr Note Added: 0013012
2015-04-21 07:28 lauren Note Added: 0013013
2015-04-21 07:41 jcfr Note Added: 0013014
2015-05-11 20:04 jcfr Note Added: 0013038
2015-05-11 20:04 jcfr Status confirmed => resolved
2015-05-11 20:04 jcfr Fixed in Version => Slicer 4.5.0-1
2015-05-11 20:04 jcfr Resolution open => fixed
2015-05-11 20:14 jcfr Note Added: 0013039
2015-05-11 20:14 jcfr File Added: Slicer-24240_Issue_3977_FA_are_different_for_helix_dwi.png
2015-05-12 03:46 pieper Note Added: 0013044
2015-05-13 00:40 jcfr Note Added: 0013045
2017-06-10 08:51 jcfr Changeset attached => Slicer master efa75968
2017-06-10 08:51 jcfr Changeset attached => Slicer master d6a6309d
2018-03-02 11:07 jcfr Status resolved => closed