Spinalcordtoolbox: launcher VERY slow when run from remote disk

Created on 26 Apr 2016  路  37Comments  路  Source: spinalcordtoolbox/spinalcordtoolbox

Implementing a solution in branch poq_issue_838_slow_exe

see video here:
https://drive.google.com/open?id=0Bx3A13n3Q_EAdVJMR0laRUZtZkE

HIGH

Most helpful comment

Other stuff from the slow mount:
Starting python is slow:

$ time python -c ''

real    0m0.864s
user    0m0.016s
sys 0m0.004s

Then

$ python -m cProfile -s cumulative  ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h

        228704 function calls (227107 primitive calls) in 4.756 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.027    0.027    4.757    4.757 sct_propseg.py:14(<module>)
        1    0.001    0.001    3.157    3.157 msct_parser.py:80(<module>)
       10    0.218    0.022    2.066    0.207 __init__.py:1(<module>)
        1    0.221    0.221    1.933    1.933 msct_types.py:17(<module>)
        2    0.057    0.028    1.301    0.651 __init__.py:10(<module>)
        1    0.057    0.057    1.298    1.298 optic.py:1(<module>)
        1    0.330    0.330    1.223    1.223 sct_utils.py:16(<module>)
        1    0.053    0.053    1.181    1.181 __init__.py:106(<module>)
        1    0.000    0.000    1.004    1.004 add_newdocs.py:10(<module>)
        1    0.000    0.000    0.943    0.943 type_check.py:3(<module>)
        1    0.055    0.055    0.662    0.662 __init__.py:7(<module>)
        2    0.136    0.068    0.532    0.266 __init__.py:88(<module>)

Its seems that conclusion are the same as Georges... I will do two things,

  • Move to /tmp ASAP (how fast can we do that is the question) in the code and check if it is faster.
  • Get a better profile

All 37 comments

How is the disk mounted, and have you tried to run other executable from the mounted disk?

Then, it is good practice to install executable on local disk. However I am surprised by how slow it is.

the disk was mounted with default Apple smb3 protocol, via VPN. I did not try other executable.
I am at Poly now, and the disk is mounted within the same VLAN, using the same protocol (smb3). It is MUCH faster, however, slightly slower than the local call.

I don't know samba and smb3 very well but I am not surprised. If the disk was mounted over a home internet connection network, then to the risq, then the neuropoly vpn... The disk latency must be pretty important.

From the Error you get at ctrl-C, it is the python imports that are slow. There is a lot of file io in that phase, and since all the python libraries are on the mounted system... I am not surprise that it is that slow and the only obvious way to get around that problem is to make sure that the SCT python/ directory is on a local disk, or at least on one with low latency.

To check the latency on my linux machine I just do iostat -x and look a the "await" output ,but I am not sure how to do that on a mac (http://apple.stackexchange.com/questions/52751/mac-os-x-disk-performance-monitoring).

ok so if there is no easy fix i guess we can remove the priority:high, and ask for users to install SCT on a local disk

might be causing issue #967

i increased the priority of this issue because with the new server we are now running processes on it, and SCT functions are slower to run. Also, this issue affects SCT users in other research labs who are using network home folders.

Here's a crude profiling of sct_propseg using time and remotely logged in to duke at the path /Volumes/sct_testing/small

time ls

real    0m0.038s
user    0m0.002s
sys     0m0.004s

time sct_propseg # no changes

real    0m15.497s
user    0m1.499s
sys     0m0.430s

time sct_propseg #  without the sct version printout

real    0m13.985s
user    0m1.499s
sys     0m0.400s

time sct_propseg # import all modules and functions and that's it.

real    0m14.370s
user    0m1.512s
sys     0m0.399s

time sct_propseg # no imports, only load the functions

real    0m3.432s
user    0m0.883s
sys     0m0.211s

time sct_propseg # import msct_parser only and functions

real    0m8.843s
user    0m1.111s
sys     0m0.307s

time sct_propseg # from spinalcordtoolbox.centerline import optic only and functions

real    0m13.443s
user    0m1.485s
sys     0m0.379s

time sct_propseg # import spinalcordtoolbox.centerline only and functions

real    0m3.248s
user    0m0.898s
sys     0m0.215s

The preliminary tests shows the latency is caused by:

  • the method of importing modules in the scripts. import foo versus from foo import bar
  • using git rev-parse is slow in general.
  • import msct_parser is slow to load.

Also to note that these number depend on the connection being open. In most OS, if the connection is not used for a long period of time, it is closed until the next request. example: the first call to ls -al is slower then the second, if the second call is done soon after.

bash-3.2$ time ls -al
...
real    0m6.543s
user    0m0.002s
sys     0m0.005s

bash-3.2$ time ls -al
total 656
...
real    0m0.812s
user    0m0.002s
sys     0m0.004s

Immediate suggestions

Use your home directory as the working directory, and used the full remote path to the files.

Short term suggestion

Profile the scripts modules and identify IO bottlenecks.

Long term suggestion

Build file system module to manage all of SCT's IO operations.

Thank you for these investigations @peristeri. Regarding the import, we had a discussion over a year ago with @poquirion and he suggested we go for "import xx as xx" instead of "from xx import xx". @poquirion any comments?

Regarding the short term solution: how do you suggest we implement it? Should we check in the bash caller if the user is on a mounted disk and spit a warning? Or should we automatically cd to the home dir and at the end of the process go back?

Profiling should be done first on the most common modules, such as msct_image and msct_parser. And follow up with profiling the most common commands (sct_label_utils, sct_propseg, ...)

Unfortunately, pycharm's profiler is only available in the professional version.

line_profile, https://github.com/rkern/line_profiler identifies the time taken for each line and frequency called. I will begin now profiling msct_parser as a proof of concept, since it is a heavy import.

I would add the remote drive instructions in the get started documentation.

I tried checking the pwd in the sct_launcher, but I was not able to find a reliable way identify if it is a remote drive. I will look into maybe using python's is_remote function but the solution is not a clean one.

thanks George. @poquirion @perone any ideas?

I've been looking into that a bit this morning and it seems that the problem is not specific to the sct.

I have mount a slow smb share on my laptop and when it is my working directory, vim ~/.bashrc takes forever to start.

Its probably slow every time a system call is done.

@peristeri concerning the profiling, is the sct installed on the remote disk or is it just your working directory.

if the sct is install on the remote, the slowliness is expected...

The problem is just that calling the sct from a remote (#PWD is /remote) is slow, even when no files are created on that remote.
Like I just posted, even opening a file in my home with vim is slow when I am on a slow remote.

As for the comment of what we should do, moving right in a /tmp directory when the sct is started might kill two bird with one stone.

  • The system call will be faster
  • It will create .tmp files in a mount that is not a smb share

We will need to do that at the begining of the tool, just after the logging instantiation, in this way we will minimise the amount of '/tmp' folder created to just one per system/bash call.

sct_propseg -h 

PWD=$HOME ==> fast

real    0m0.935s
user    0m0.828s
sys 0m0.056s

PWD=$SMB_SHARE == > slow

real    0m7.399s
user    0m0.900s
sys 0m0.120s

===========================================


PWD=$HOME ==> fast

real    0m0.764s
user    0m0.672s
sys 0m0.056s

PWD=$SMB_SHARE == > slow

real    0m3.184s
user    0m0.616s
sys 0m0.072s

The laucher account for about 40% on the slow share but for 80% on the fast one...

There is the "SCT_LOADED_ONCE=1" options that accelerate the sct_launcher call (the slow part of the launcher is only ran once, an all the sct call using the laucher are much faster.) Sourcing sct_laucher has the effect of loading all the slow stuff and setting SCT_LOADED_ONCE.

From the fast mount:

$  source  ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
real    0m0.005s
user    0m0.000s
sys 0m0.004s
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real    0m0.249s
user    0m0.224s
sys 0m0.016s

=================
From the slow mount

$  source  ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
real    0m0.087s
user    0m0.000s
sys 0m0.000s
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real    0m4.585s
user    0m0.256s
sys 0m0.064s

the sct_propseg -h call is still pretty slow, and its not coming from the launcher.

Other stuff from the slow mount:
Starting python is slow:

$ time python -c ''

real    0m0.864s
user    0m0.016s
sys 0m0.004s

Then

$ python -m cProfile -s cumulative  ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h

        228704 function calls (227107 primitive calls) in 4.756 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.027    0.027    4.757    4.757 sct_propseg.py:14(<module>)
        1    0.001    0.001    3.157    3.157 msct_parser.py:80(<module>)
       10    0.218    0.022    2.066    0.207 __init__.py:1(<module>)
        1    0.221    0.221    1.933    1.933 msct_types.py:17(<module>)
        2    0.057    0.028    1.301    0.651 __init__.py:10(<module>)
        1    0.057    0.057    1.298    1.298 optic.py:1(<module>)
        1    0.330    0.330    1.223    1.223 sct_utils.py:16(<module>)
        1    0.053    0.053    1.181    1.181 __init__.py:106(<module>)
        1    0.000    0.000    1.004    1.004 add_newdocs.py:10(<module>)
        1    0.000    0.000    0.943    0.943 type_check.py:3(<module>)
        1    0.055    0.055    0.662    0.662 __init__.py:7(<module>)
        2    0.136    0.068    0.532    0.266 __init__.py:88(<module>)

Its seems that conclusion are the same as Georges... I will do two things,

  • Move to /tmp ASAP (how fast can we do that is the question) in the code and check if it is faster.
  • Get a better profile

Added this in sct_laucher under the "loaded once" bloc:

  INIT_DIR=$PWD
  SCT_WORK_DIR=$(mktemp -d -t sct_tmp.$(date +%d%m%y).XXXXXX)
  cd $SCT_WORK_DIR

then, in the slow directory

$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real    0m1.182s
user    0m0.808s
sys 0m0.072s

C'est 莽a la magie!
http://www.youtube.com/watch?v=8JYktr14NBk&t=3m5s

However, I see a couple problems...

  • we will need to carry an env var (here $INIT_DIR) to set all the relative path in the sct and pray that it does not break. This would need to be set of -i and -o options (and -ofolder I think). Is there any other options?
  • make sure that sourcing sct_laucher does not take you in the tmp dir, since sourcing sct_laucher is a developer's move.

... In the end I am a bit uncomfortable doing the cd to /tmp in the launcher especially for what is means for resolution of relative paths. Is there unseens side effects.

I can create a method in sct_utils that will resolve all relative paths (and also ~ if they are present) using SCT_INIT_DIR if it is present. I can also put the condition to use SCT_INIT_DIR if and only of PWD is SCT_TMP_DIR.

And will need to change all the os.path.abspath call to the method.

After the discussion we just had about python looking in the current dir:
(And this is an erratum from the a deleted post!)

I still have not find an option to prevent python to prepend $PWD in sys.path. However, I ran a test doing

import sys
sys.path.pop(INDICE OF $PWD)

as propseg first step.

With $PWD in sys.path

time ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h > /dev/null 

real    0m3.840s
user    0m0.292s
sys 0m0.056s

Without $PWD in sys.path

time ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h > /dev/null 
real    0m1.448s
user    0m0.212s
sys 0m0.048s

The difference is not neglectable. However, the possition $PWD is not always the same, depending on how the sct is set...
form the doc:
https://docs.python.org/3/library/sys.html#sys.path

A not too disruptive way of removing $PWD is to do it at the top of sct_util

import os
import sys

cwd = os.getcwd()
if cwd in sys.path:
    sys.path.remove(cwd)
print ("\n".join(sys.path))

very interesting, how does the last timing you obtain compares to the local call?

Ok, I think I got a reasonable solution. I am just moving $PWD to the end of the search path. The speed is still pretty good when doing that.

Here, $PWD is the slow path

All paths in

real    0m4.120s
user    0m0.280s
sys 0m0.040s

removing slow path

real    0m1.438s
user    0m0.228s
sys 0m0.036s

putting slow path at the end of sys.path

real    0m1.947s
user    0m0.228s
sys 0m0.048s

But I also got real up to 3s from time to time.

Cd in /tmp on the other hand gives:

real    0m1.376s
user    0m1.020s
sys 0m0.060s

Put $PWD at the end of the search path, and add a note about that in the sct is probably the less disruptive way of doing this... We should also log a warning when doing so.

Removing the slow path

$ python -m cProfile -s cumulative  ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h
         228713 function calls (227116 primitive calls) in 3.321 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.777    0.777    3.322    3.322 sct_propseg.py:13(<module>)
        1    0.007    0.007    1.639    1.639 msct_parser.py:80(<module>)
       10    0.120    0.012    1.044    0.104 __init__.py:1(<module>)
        1    0.104    0.104    0.956    0.956 msct_types.py:17(<module>)
        2    0.027    0.013    0.751    0.376 __init__.py:10(<module>)
        1    0.025    0.025    0.751    0.751 optic.py:1(<module>)
        1    0.164    0.164    0.675    0.675 sct_utils.py:15(<module>)
        1    0.028    0.028    0.598    0.598 __init__.py:106(<module>)
        1    0.000    0.000    0.509    0.509 add_newdocs.py:10(<module>)
        1    0.000    0.000    0.477    0.477 type_check.py:3(<module>)
        1    0.001    0.001    0.360    0.360 analyze.py:83(<module>)
        1    0.026    0.026    0.323    0.323 __init__.py:7(<module>)
[...]

Putting it at the end

$ python -m cProfile -s cumulative  ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h
         228714 function calls (227117 primitive calls) in 2.922 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.031    0.031    2.923    2.923 sct_propseg.py:13(<module>)
        1    0.007    0.007    1.676    1.676 msct_parser.py:80(<module>)
       10    0.148    0.015    1.166    0.117 __init__.py:1(<module>)
        1    0.106    0.106    1.040    1.040 msct_types.py:17(<module>)
        2    0.059    0.030    1.026    0.513 __init__.py:10(<module>)
        1    0.028    0.028    0.995    0.995 optic.py:1(<module>)
        1    0.026    0.026    0.675    0.675 __init__.py:106(<module>)
        1    0.165    0.165    0.630    0.630 sct_utils.py:15(<module>)
        1    0.000    0.000    0.557    0.557 add_newdocs.py:10(<module>)
        1    0.000    0.000    0.523    0.523 type_check.py:3(<module>)
        1    0.001    0.001    0.411    0.411 analyze.py:83(<module>)
        1    0.026    0.026    0.340    0.340 __init__.py:7(<module>)

[...]

cool! can you also give comparative times with the call from /tmp ?

First, the "unwanted" behaviour is the same on all Python programs that would be stored on a slow/high-latency filesystem, this is the way Python works: sys.path contains the path of the executed scripts, and imports will be tried from there first (https://docs.python.org/3/library/sys.html#sys.path).

IMHO the cleanest solution is code-less, and is to (perhaps educate to) install SCT on one's local storage; what the PR does is at least contained within the scripts, so it's not too bad, but it's still ad-hoc, and partial, since SCT internal imports will still have to be performed on the slow network.

@zougloub: in this specific issue, SCT is installed on one's local storage. Then the user goes into a remote disk, runs SCT and experiences this latency.

Ah.

Well, here's some analysis. After checking that invoking Python directly didn't cause issues, the following experiment was done and the extra accesses to try and import in the working directory are caused by the launcher, which causes addition of the current working directory to sys.path:

# create a recognizable working directory
mkdir ~/my_sct_workdir && cd ~/my_sct_workdir
# create a useless script that does nothing
echo 'import sys; print("my path: %s" % sys.path)' > $SCT_DIR/scripts/sct_fake.py
# create a wrapper for it
ln -sf sct_propseg $SCT_DIR/bin/sct_fake
# trace direct script invocation
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone python -u -v $SCT_DIR/scripts/sct_fake.py 2>&1 ) > 0_fake_just_python.txt
# trace script invocation with bundled interpreter
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone $SCT_DIR/dev/python/bin/python -u -v $SCT_DIR/scripts/sct_fake.py 2>&1 ) > 1_fake_conda_python.txt
# trace script invocation with launcher, altered to do python -u -v
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone -f sct_fake 2>&1 ) > 2_fake_with_launcher

# check when the workdir is accessed in the 3 tests
for x in 0 1 2; do grep my_sct_workdir  ${x}_fake_* | awk '{ FS=":"; print $1 }' | wc -l; done
0
0
80

# check sys.path
grep "my path" ?_fake_*
0_fake_just_python.txt:my path: ['/home/cJ/sct_dev/scripts', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/home/cJ/.local/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages/geos', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib64/python2.7/site-packages/wx-3.0-gtk2']
1_fake_conda_python.txt:my path: ['/home/cJ/sct_dev/scripts', '/home/cJ/sct_dev/python/lib/python27.zip', '/home/cJ/sct_dev/python/lib/python2.7', '/home/cJ/sct_dev/python/lib/python2.7/plat-linux2', '/home/cJ/sct_dev/python/lib/python2.7/lib-tk', '/home/cJ/sct_dev/python/lib/python2.7/lib-old', '/home/cJ/sct_dev/python/lib/python2.7/lib-dynload', '/home/cJ/sct_dev/python/lib/python2.7/site-packages', '/home/cJ/sct_dev/python/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg']
2_fake_with_launcher:my path: ['/home/cJ/sct_dev/scripts', '/home/cJ/sct_dev/scripts', '/home/cJ/my_sct_workdir', '/home/cJ/sct_dev/python/lib/python27.zip', '/home/cJ/sct_dev/python/lib/python2.7', '/home/cJ/sct_dev/python/lib/python2.7/plat-linux2', '/home/cJ/sct_dev/python/lib/python2.7/lib-tk', '/home/cJ/sct_dev/python/lib/python2.7/lib-old', '/home/cJ/sct_dev/python/lib/python2.7/lib-dynload', '/home/cJ/sct_dev/python/lib/python2.7/site-packages', '/home/cJ/sct_dev/python/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg']

Will track down where this comes from...

Got it. This is caused by the launcher's add_to_path function which is appending a colon without checking, going from (nothing) to ${SCT_DIR}: , which is actually 2 entries: ${SCT_DIR} and ..

Since the scripts are actually located in $SCT_DIR/scripts, adding this folder to PYTHONPATH is useless. Thus the creation of PYTHONPATH could be forgone altogether.

I fixed the bug in add_to_path, and got rid of adding script to the PYTHONPATH

tried running sct_propseg on django using cf64e93d55cadd6ad3710dfbe7d847784ed57cbc, from a directory within duke, and there was no abnormal delay. So it seems the issue has now been fixed.
We need more confirmations from other lab members.

Was this page helpful?
0 / 5 - 0 ratings