Serving: File system polling cannot be deactivated

Created on 30 Mar 2019 · 7 Comments · Source: tensorflow/serving

Bug Report

System information

TensorFlow Serving installed from (source or binary): official Docker image tensorflow/serving
TensorFlow Serving version: 1.12.0

Describe the problem

By default, the filesystem is polled regularly to detect and load model changes. According to the code, setting --file_system_poll_wait_seconds=-1 should completely deactivate polling.
Instead, the server never starts serving the model and hangs on startup.

Exact Steps to Reproduce

tensorflow_model_server --file_system_poll_wait_seconds=-1

Expected behavior

Server starts up and serves the model but doesn't react to model changes on the filesystem.

Labels: awaiting response, bug

joekohlsdorf

All 7 comments

Just to make sure I'm reproducing this the same way you are:
how are you setting the file_system_poll_wait_seconds flag when you run the Docker image?

misterpeddy on 1 Apr 2019

I change the entrypoint and the command arguments.

Example:
docker run --rm --entrypoint=/usr/bin/tensorflow_model_server -e AWS_ACCESS_KEY_ID=x -e AWS_SECRET_ACCESS_KEY=x tensorflow/serving:1.12.0 --model_config_file=s3://x/model.yaml --file_system_poll_wait_seconds=-1

joekohlsdorf on 1 Apr 2019

Oh I see - this is the tracking bug filed from #1291.
Got it thanks - @hgadig can you close the other one now that this one is filed?

misterpeddy on 3 Apr 2019

Hi Joe,

According to the comment here: https://github.com/tensorflow/serving/blob/63d31a33b4f6faeb0764bb159d403f2b49061aed/tensorflow_serving/sources/storage_path/file_system_storage_path_source.proto#L68, disabling polling is supposed to be for testing only. I think what happens is that, with polling disabled, the server never tries to load the model during startup, so it hangs.

If you need the model to be loaded only once, could you please rename this issue as a feature request (FR) so that we can track it? Thanks.

nrobeR on 4 Apr 2019
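
The hypothesis in the comment above can be pictured with a toy model. This is only a sketch of the described behavior, not TensorFlow Serving source code; the function name `scans_within` and the "one scan at startup, then one per interval" schedule are assumptions made for illustration.

```shell
#!/bin/sh
# Toy model (hypothetical; not TensorFlow Serving source code).
# Prints how many filesystem scans would run within `window` seconds
# for a given poll interval in seconds.
scans_within() {
  interval=$1
  window=$2
  if [ "$interval" -lt 0 ]; then
    # Polling disabled: per the hypothesis, no scan ever runs,
    # not even an initial one, so no model is ever discovered.
    echo 0
    return 0
  fi
  if [ "$interval" -eq 0 ]; then
    # An interval of 0 is not covered by this sketch.
    echo "interval 0 is not modeled" >&2
    return 1
  fi
  # Assumed schedule: one scan at startup, then one every `interval` seconds.
  echo $(( 1 + window / interval ))
}

scans_within -1 60   # disabled polling
scans_within 30 60   # 30-second polling
```

Under this model, a server that waits for at least one scan before it starts serving would wait indefinitely whenever the result is 0, which matches the hang reported here.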

Can I also suggest that the default for file_system_poll_wait_seconds be reverted to the original 30 seconds instead of the current 1-second default? The default seems to have changed when the command-line argument was introduced: https://github.com/tensorflow/serving/pull/214#discussion_r84943476

This seemingly innocent change has cost @joekohlsdorf's company and ours upwards of thousands of dollars. The sensible default of 30 seconds would've brought that number down to $33.

I also don't think that there is any material advantage to picking up a new model within a single second of deploying it versus 30 seconds. I would imagine that models themselves have a meaningful release workflow that is measured in days if not weeks. That ~15 second delay in serving the new models will not affect anyone's bottom line but a $1,000+ S3 bill most certainly will!

Thank you for the great work on tensorflow-serving. I'm happy to submit a PR for the proposed revert to the original behavior.

rgabo on 25 Apr 2019
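
The cost argument above can be checked with simple arithmetic, assuming (as the comment implies) that the bill scales linearly with the number of polls. The function names are illustrative, and the $1,000 baseline is the figure quoted in the comment, not a measured value.

```shell
#!/bin/sh
# Number of polls issued per day at a given interval (seconds).
polls_per_day() {
  echo $(( 86400 / $1 ))
}

# Cost at a longer interval, scaled from a 1-second baseline.
# $1: baseline cost in whole dollars at 1-second polling
# $2: new poll interval in seconds
# Assumes cost is proportional to poll count; integer division.
scaled_cost() {
  echo $(( $1 / $2 ))
}

polls_per_day 1       # prints 86400
polls_per_day 30      # prints 2880
scaled_cost 1000 30   # prints 33
```

Going from 1 second to 30 seconds cuts the poll count by a factor of 30, which is where the $33 figure comes from.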

@rgabo, happy to review a PR to change the default poll interval from 1 second to 30 seconds.

netfs on 30 Apr 2019
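
Until the default changes, the interval can be set explicitly with the same flag used in the report. The sketch below only prints a command line and does not run Docker. It reuses the placeholder credentials and paths from the example earlier in the thread; the helper name `serving_cmd` is hypothetical, and rejecting non-positive values is a choice made here because -1 is reported to hang.

```shell
#!/bin/sh
# Print (do not run) a docker command line with an explicit poll interval.
# Placeholders (x, s3://x/model.yaml) are copied from the example in this thread.
serving_cmd() {
  interval=$1
  # Accept only strings made of digits; this also rejects "-1".
  case "$interval" in
    ''|*[!0-9]*)
      echo "interval must be a positive integer" >&2
      return 1
      ;;
  esac
  # Reject 0 as well, since this sketch only covers periodic polling.
  if [ "$interval" -le 0 ]; then
    echo "interval must be a positive integer" >&2
    return 1
  fi
  printf '%s\n' "docker run --rm --entrypoint=/usr/bin/tensorflow_model_server -e AWS_ACCESS_KEY_ID=x -e AWS_SECRET_ACCESS_KEY=x tensorflow/serving:1.12.0 --model_config_file=s3://x/model.yaml --file_system_poll_wait_seconds=$interval"
}

serving_cmd 30
```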

> I change the entrypoint and the command arguments.
>
> Example:
> docker run --rm --entrypoint=/usr/bin/tensorflow_model_server -e AWS_ACCESS_KEY_ID=x -e AWS_SECRET_ACCESS_KEY=x tensorflow/serving:1.12.0 --model_config_file=s3://x/model.yaml --file_system_poll_wait_seconds=-1

Out of curiosity: does providing a YAML-formatted config file work (ignoring the disabling aspect)?

netfs on 30 Apr 2019
