Readthedocs.org: Illegal instruction error

Created on 6 Mar 2018 · 20 Comments · Source: readthedocs/readthedocs.org

Details

Expected Result

I regularly update the repo and the docs should rebuild.

Actual Result

For recent commits, the build fails with the following error log when Sphinx starts:

python /home/docs/checkouts/readthedocs.org/user_builds/zhusuan/envs/latest/bin/sphinx-build -T -E -b readthedocs -d _build/doctrees-readthedocs -D language=en . _build/html
Running Sphinx v1.7.1
Illegal instruction

The build works locally. On Read the Docs I've tried Sphinx 1.6.5 and 1.7.1, and both fail with the same error. Do you have any idea what the problem is?

replication Support

All 20 comments

I imported your project in my local instance and I wasn't able to reproduce this issue. We will need more research on this.

Thanks for taking a look. Is there anything I can help with? This has been bothering me for a while.

I tried to import your project on my local instance, but I got a memory limit error, so I think that is probably the issue. Related to #3613.

@humitos Do you have the default limits for memory and time on your local installation?

@stsewd I modified them in local_settings.py with these values:

DOCKER_LIMITS = {
    'memory': '2048m',
    'time': 3600,
}

@stsewd Can you paste the log here? It's strange that my project takes that much memory.
I set

ulimit -Sv 1000000 #1000m

to run make html. It doesn't seem to fail.

Looking at http://readthedocs.org/projects/zhusuan/builds/6828690/, the return code is 132, which means sphinx-build was killed by SIGILL. I don't rule out the OOM case, since Python 2 does a poor job of handling OOM errors, but this might be caused by an extension module compiled with a flag that is not supported by the CPU (or virtual CPU?) on our servers.
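For reference, the 132 can be decoded by hand: a shell reports a process killed by signal N as exit status 128 + N, and signal 4 is SIGILL on Linux. A minimal sketch of that arithmetic follows; the helper name describe_exit_status is made up for illustration.

```python
import signal


def describe_exit_status(status):
    """Return the signal name for a shell-style exit status above 128, else None."""
    if status <= 128:
        return None  # ordinary exit code, not a signal
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


print(describe_exit_status(132))
```

On a Linux host this should print SIGILL, which matches the "Illegal instruction" line in the build log.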

We probably need to check the versions of all dependencies with extension modules (numpy, tensorflow, matplotlib etc.) in http://readthedocs.org/projects/zhusuan/builds/6828690/ (first failed build) and http://readthedocs.org/projects/zhusuan/builds/6822461/ (last completed build).
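One way to do that comparison is to diff the pinned versions from the two builds, assuming each build's installed packages are available as pip freeze style text. The sketch below is only an illustration: the function names are hypothetical and the sample pins are placeholders, not values copied from either build log.

```python
def parse_freeze(text):
    """Map package name -> version from `pip freeze`-style lines (name==version)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable/URL installs
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def changed_pins(old_text, new_text):
    """Return {name: (old_version, new_version)} for packages whose pin differs."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Placeholder pins for illustration only; NOT copied from the real build logs.
last_good = "numpy==1.14.0\ntensorflow==1.4.0\n"
first_bad = "numpy==1.14.0\ntensorflow==1.6.0\n"
print(changed_pins(last_good, first_bad))
```

Only packages whose version changed (or that appear in just one build) are reported, which narrows down which extension module to suspect.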

@stsewd @berkerpeksag I just updated the dev branch and found the memory error when downloading tensorflow 1.6.0.

I did some experiments on the dev branch. When I set TF to 1.4.0 in the doc requirements file, everything works well (build page). When I changed it to 1.6.0, the build failed with a MemoryError (build page) or Illegal instruction (build page).

I'm now using <=1.4.0 as a temporary fix. Should I request more memory for my project or wait for a fix?

@thjashin I'm glad that you have your docs working! I'm not really sure whether more memory would make your problem go away; what @berkerpeksag mentions is also very valid (but the builds run inside a Docker container, so maybe it's a problem with the host?).

Looking at the build logs shared by @thjashin in https://github.com/rtfd/readthedocs.org/issues/3738#issuecomment-371385013, I think there are two different problems:

  1. Getting MemoryError when installing tensorflow 1.6.0 (it looks like it randomly fails with MemoryError)
  2. SIGILL after tensorflow 1.6.0 successfully installed (without getting MemoryError)

There are some reports about the second problem in tensorflow's issue tracker: https://github.com/tensorflow/tensorflow/issues/17373 (uses precompiled wheels like us), https://github.com/tensorflow/tensorflow/issues/17411 (same issue) and https://github.com/tensorflow/tensorflow/issues/17441. So I'm beginning to think that the cause of the problem is a buggy wheel distribution.
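If the wheel is the culprit, the likely trigger is an instruction set that the build host's CPU lacks. The TensorFlow 1.6.0 release notes say the prebuilt binaries use AVX instructions, so one quick check on a Linux host is to look for the avx flag in /proc/cpuinfo. This is a minimal sketch under that assumption. The function names are hypothetical, and it only inspects the flags Linux reports; it does not prove which instruction actually faulted.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx(cpuinfo_text):
    """True if any listed processor advertises the avx flag."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:  # Linux only
            print("AVX available:", supports_avx(fh.read()))
    except OSError:
        print("/proc/cpuinfo not readable; cannot check CPU flags here")
```

Running this inside the same container that runs sphinx-build would show whether the (virtual) CPU exposed to the build advertises AVX.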

I don't know what to do about the first problem, though. Perhaps we could just increase the memory limit (this needs to be discussed with the operations team) or implement a retry mechanism (but it's hard to tell whether a MemoryError was raised randomly).

@thjashin simple question: do you _really_ need TensorFlow to build your documentation? If it's not a strict requirement for building the docs, the better solution here is to not install it in the RTD environment. You can avoid installing it by using a docs/requirements.txt specific to RTD with only the packages needed to build your docs :)

@stsewd @berkerpeksag Thanks for the comments and pointers. I guess the SIGILL problem would be solved in future versions of TF. As for the MemoryError, how about the solution proposed by @stsewd ?

@humitos Maybe I'm doing this the wrong way, but since there are plenty of import tensorflow statements in my source code, is there any way to let sphinx.autodoc generate api docs from doc strings without installing TF?

@thjashin you could try mocking TF: https://docs.readthedocs.io/en/latest/faq.html#i-get-import-errors-on-libraries-that-depend-on-c-modules

@stsewd Cool, thanks. I shall try this.

is there any way to let sphinx.autodoc generate api docs from doc strings without installing TF?

If you are using autodoc, you need to install it.

Otherwise, supposing that you don't need autodoc, you can mock it as described in http://docs.readthedocs.io/en/latest/faq.html#i-get-import-errors-on-libraries-that-depend-on-c-modules
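The mocking approach amounts to placing stand-in objects in sys.modules before the project's own modules are imported. Here is a minimal sketch for docs/conf.py, assuming Python 3 (on Python 2 the mock package is installed separately and imported with from mock import MagicMock). The submodule list is a guess and would need to match whatever the project actually imports.

```python
import sys
from unittest.mock import MagicMock

# Assumed list: every dotted module path the code imports must be named here,
# because a mocked parent is not treated as a real package.
MOCK_MODULES = ["tensorflow", "tensorflow.contrib"]

for mod_name in MOCK_MODULES:
    sys.modules[mod_name] = MagicMock()

# From here on, `import tensorflow` returns the stand-in instead of loading
# the real compiled extension.
import tensorflow as tf

print(type(tf).__name__)
```

With the stand-ins in place, TensorFlow can be left out of the docs requirements file entirely, which sidesteps both the MemoryError during installation and the SIGILL at import time. Sphinx also provides an autodoc_mock_imports setting that serves a similar purpose and may be worth trying.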

@thjashin hey, I see your latest builds are passing, were you able to solve the issue?

@stsewd Thanks for asking. It passed because I'm using TF 1.4. I haven't got time to try the mock solution since I'm busy with other things these days. I will report in this thread once I try it.

Downgrading worked for me too. Thanks for reporting and for sharing the workaround!

Thanks for the information everyone! I see the project is building now, so perhaps the immediate error was resolved with these solutions. I would echo using mocking to anyone hitting a similar issue. Closing this for now, but speak up if this error is still a problem for you.
