View Issue Details

ID: 0002624
Project: Slicer4
Category: Core: Building (CMake, Superbuild)
View Status: public
Last Update: 2014-03-06 04:51
Reporter: jcfr
Assigned To: jcfr
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: fixed
Product Version:
Target Version: Slicer 4.2.0
Fixed in Version: Slicer 4.2.0
Summary: 0002624: X server is crashing on the Linux VMs
Description

Dear Sysadmin,

I successfully upgraded the version of Parallels on the factory :) That said, the X server is still crashing. I suspect this is because multiple tests involving a GL display are running in parallel. I will try disabling parallel testing to see if the issue persists.

Thanks
Jc

TagsNo tags attached.

Relationships

related to 0002367 (closed, jcfr): Fix ubuntu factory error "Xlib: extension "GLX" missing on display ":0.0"
related to 0002493 (closed, jcfr): Nightly extension builds are not available
related to 0002638 (closed, jcfr): Submit bug report to Parallel

Activities

jcfr

2012-10-09 06:49

administrator   ~0006457

Here is an update on the last factory experiment.

In summary, the X server crashed with only the Linux nightly running. This means the problem is not caused by an interaction between the tests running on the Linux VM and the tests running on the Mac OS X host or the Windows VM.

Next experiment that will be run today:

  • Re-run nightly tests serially (no parallel mode)

jcfr

2012-10-09 10:55

administrator   ~0006476

Hi Bill,

After running the tests sequentially twice in a row and logging the output, the crash appears to be systematic. In both cases:

Test "qMRMLLayoutManagerTest3" finished with "Fatal IO error: client killed"

and only the command for "qMRMLLayoutManagerTest4" is logged afterwards.

//-----------------------
[...]
142: qMRMLLayoutManagerTest3: Fatal IO error: client killed
[HANDLER_OUTPUT]
142/622 Test #142: qMRMLLayoutManagerTest3 ...................................................***Failed 3.52 sec
[HANDLER_VERBOSE_OUTPUT]
test 143

[HANDLER_OUTPUT]
Start 143: qMRMLLayoutManagerTest4

[HANDLER_VERBOSE_OUTPUT]

143: Test command: /home/kitware/Dashboards/Nightly/Slicer-build-64bits-QT4.7.4-PythonQt-With-Tcl-CLI-Release-nightly/Slicer-build/Slicer "--launcher-no-splash" "--launch" "/home/kitware/Dashboards/Nightly/Slicer-build-64bits-QT4.7.4-PythonQt-With-Tcl-CLI-Release-nightly/Slicer-build/bin/qMRMLWidgetsCxxTests" "qMRMLLayoutManagerTest4"

143: Test timeout computed to be: 900
[...]
//-----------------------

Looking at last night's log, where the tests ran in parallel, I noticed the following messages:

//-----------------------
[...]
159: Test timeout computed to be: 900
141: qMRMLLayoutManagerTest2: Fatal IO error: client killed
123: XIO: fatal IO error 104 (Connection reset by peer) on X server ":0.0"
123: after 7344 requests (7343 known processed) with 99 events remaining.
147: qMRMLModelInfoWidgetTest1: Fatal IO error: client killed
140: qMRMLLayoutManagerTest1: Fatal IO error: client killed
157: qMRMLNodeComboBoxTest6: cannot connect to X server :0
142: qMRMLLayoutManagerTest3: Fatal IO error: client killed
158: qMRMLNodeComboBoxTest7: cannot connect to X server :0
[...]
//-----------------------

Note: Running these two tests directly doesn't seem to cause any issue.
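Picking the X-related failures out of a long verbose CTest log by hand is tedious. The following is a minimal sketch, not part of the factory scripts, assuming the log lines keep the `<test number>: <test name>: <message>` shape shown above; the function name `x_failures` and the log file name in the usage comment are hypothetical. Lines where the message takes the place of the test name, such as the `123: XIO: fatal IO error 104` line, are not matched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a verbose CTest log on stdin and print
# "<test number> <test name>" once for every test that reported a lost
# ("Fatal IO error") or refused ("cannot connect to X server") X connection.
x_failures() {
  sed -nE 's/^([0-9]+): ([A-Za-z0-9_]+): (Fatal IO error|cannot connect to X server).*/\1 \2/p' \
    | sort -n -u   # order by test number and drop repeated reports
}

# Usage (hypothetical log file name):
#   x_failures < ctest-nightly.log
```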

Independently of my tests on the Linux VM, Sankhesh was just running a tutorial test involving the GPU, and the host itself froze. That said, I think this issue is probably unrelated to the one I am working on.

Regarding the Linux issue, I suspect the "prlvideo" driver, installed on Linux so that the guest integrates properly with the host OS through Parallels, is at fault. For this reason, I am thinking of contacting Parallels support. There are 30 days of support included with the version of Parallels we just acquired. What is the usual process to contact support? Can I go ahead directly, or should I go through a more formal process?

Thanks

Attached 2012-10-11 14:49: Xorg.0.log.old (99,707 bytes)

jcfr

2012-10-11 14:52

administrator   ~0006502

Last edited: 2012-10-11 14:57

The test causing trouble has been identified: qMRMLLayoutManagerTest3

Until a better solution is implemented, the associated factory dashboard driver scripts have been updated to exclude that test.
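The issue does not say how the driver scripts exclude the test. One way, sketched here under that caveat, is CTest's `-E` option, which skips tests whose names match a regular expression; the helper name `exclude_regex` is hypothetical. Anchoring the pattern with `^(...)$` avoids also excluding tests whose names merely contain the excluded name, such as a hypothetical qMRMLLayoutManagerTest30.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the given test names into one anchored CTest
# regular expression, e.g. "^(qMRMLLayoutManagerTest3)$".
exclude_regex() {
  local IFS='|'            # "$*" joins the arguments with the first IFS char
  printf '^(%s)$\n' "$*"
}

# Usage, from the Slicer build tree:
#   ctest -E "$(exclude_regex qMRMLLayoutManagerTest3)"
```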

I suspect that the prlvideo driver, which allows Parallels to access the host's video card, has issues when two tests dealing with multiple GL contexts are run in a row. Maybe not all resources have been freed yet?

The problem can be reproduced by running the test twice in a row:

$ ctest -R qMRMLLayoutManagerTest3; ctest -R qMRMLLayoutManagerTest3;
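To repeat the reproduction more than twice and see on which run it fails, the command can be wrapped in a loop. This is a minimal sketch with a hypothetical helper name, `run_repeatedly`; it relies only on `ctest` exiting with a non-zero status when a test fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times in a row, stopping at the
# first non-zero exit status and reporting which run failed.
run_repeatedly() {
  local n=$1 i=1 status
  shift
  while [ "$i" -le "$n" ]; do
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "run $i of $n failed with exit status $status" >&2
      return "$status"
    fi
    i=$((i + 1))
  done
  echo "all $n runs passed"
}

# Usage, from the Slicer build tree:
#   run_repeatedly 2 ctest -R qMRMLLayoutManagerTest3
```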

After the X server crashed, the associated log file was attached: Xorg.0.log.old

// --------------
(II) PRLCONTROL: GL PBuffer(this=0x2dec450, host_pbuf=00000013, glxId=4600013, drawId=4600010, GLXDrawablePtr=0x2df1d60, DrawablePtr=0x2d24620, tex_fmt=20DA, tex_tgt=20DD) created

[...] // Details available in file Xorg.0.log.old

(II) PRLCONTROL: GL PBuffer(this=0x2d27c80, host_pbuf=00000018, glxId=460006C, drawId=4600039, GLXDrawablePtr=0x2d21110, DrawablePtr=0x2d4a0e0, tex_fmt=20DA, tex_tgt=20DD) created
(II) PRLCONTROL: GL PBuffer(this=0x2d21800, host_pbuf=16 tex_tgt=20DD) destroyed

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x45fcc8]
1: /usr/bin/X (0x400000+0x5dfbd) [0x45dfbd]
2: /lib/libpthread.so.0 (0x7f8e0fd6b000+0xf8f0) [0x7f8e0fd7a8f0]
3: /usr/bin/X (FreeResource+0xae) [0x44fa5e]
4: /usr/bin/X (0x400000+0x41adb) [0x441adb]
5: /usr/bin/X (0x400000+0x4418c) [0x44418c]
6: /usr/bin/X (0x400000+0x261aa) [0x4261aa]
7: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f8e0ea63c4d]
8: /usr/bin/X (0x400000+0x25d59) [0x425d59]
Segmentation fault at address 0xa8

Caught signal 11 (Segmentation fault). Server aborting

Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
Please also check the log file at "/var/log/Xorg.0.log" for additional information.

(II) Parallels Mouse: Sliding Mouse disabling [0]
(II) Parallels Mouse: Device was switched off
(II) Parallels Mouse: Device was closed
(II) Parallels Mouse: Link to Open Tools Gate was closed
(II) UnloadModule: "prlmouse"
(II) Parallels Mouse: Module was deinitialized
(II) Power Button: Close
(II) UnloadModule: "evdev"
(II) Sleep Button: Close
(II) UnloadModule: "evdev"
(II) AT Translated Set 2 keyboard: Close
(II) UnloadModule: "evdev"
(II) Parallels Mouse: Device was switched off
(II) Parallels Mouse: Device was closed
(II) UnloadModule: "prlmouse"
(II) Parallels Mouse: Module was deinitialized
(II) Macintosh mouse button emulation: Close
(II) UnloadModule: "evdev"
(II) PRLCONTROL: Share states thread stopped
(II) PRLVIDEO(0): VGA state [3] was restored.
(II) PRLVIDEO: Dynamic resolution was disabled
(II) PRLVIDEO(0): VT was left.
ddxSigGiveUp: Closing log
// --------------

GDB was also attached to the X process to obtain a stack trace:

// -----------------------------
Continuing.

Program received signal SIGSEGV, Segmentation fault.
FreeResource (id=73400361, skipDeleteFuncType=0) at ../../dix/resource.c:542
542 ../../dix/resource.c: No such file or directory.
in ../../dix/resource.c
#0 FreeResource (id=73400361, skipDeleteFuncType=0)
at ../../dix/resource.c:542
cid = <value optimized out>
res = 0x79413a317a423b31
prev = 0x2318430
head = 0x244af08
#1 0x0000000000441adb in ProcDestroyWindow (client=0x249b320)
at ../../dix/dispatch.c:745
pWin = 0x24546c0
rc = 35396512
#2 0x000000000044418c in Dispatch () at ../../dix/dispatch.c:439
result = <value optimized out>
client = 0x249b320
nready = 0
start_tick = 20
#3 0x00000000004261aa in main (argc=8, argv=0x7ddce8,
envp=<value optimized out>) at ../../dix/main.c:285
i = 1
alwaysCheckForInput = {0, 1}
// ------------------------------------

jcfr

2012-10-11 15:06

administrator   ~0006504

Next steps are:

  • contact Parallels to discuss the issue
  • try to create a small test case that reproduces the problem

jcfr

2014-03-06 04:50

administrator   ~0010733

Closing resolved issues that have not been updated in more than 3 months

Issue History

Date Modified Username Field Change
2012-10-08 06:56 jcfr New Issue
2012-10-08 06:56 jcfr Status new => assigned
2012-10-08 06:56 jcfr Assigned To => jcfr
2012-10-08 06:56 jcfr Relationship added related to 0002367
2012-10-08 06:57 jcfr Target Version => Slicer 4.2.0 - coming release
2012-10-08 07:53 jcfr Relationship added related to 0002493
2012-10-09 06:49 jcfr Note Added: 0006457
2012-10-09 10:55 jcfr Note Added: 0006476
2012-10-11 14:49 jcfr File Added: Xorg.0.log.old
2012-10-11 14:52 jcfr Note Added: 0006502
2012-10-11 14:53 jcfr Note Edited: 0006502
2012-10-11 14:57 jcfr Note Edited: 0006502
2012-10-11 15:06 jcfr Note Added: 0006504
2012-10-12 13:34 jcfr Relationship added related to 0002638
2012-10-12 13:35 jcfr Status assigned => resolved
2012-10-12 13:35 jcfr Fixed in Version => Slicer 4.2.0 - coming release
2012-10-12 13:35 jcfr Resolution open => fixed
2014-03-06 04:50 jcfr Note Added: 0010733
2014-03-06 04:51 jcfr Status resolved => closed