By default, the filesystem is polled regularly to detect and load model changes. According to the code, setting --file_system_poll_wait_seconds=-1 should completely deactivate polling.
What happens instead is that the server never starts serving the model; it just hangs on startup.
Command to reproduce:
tensorflow_model_server --file_system_poll_wait_seconds=-1
Expected behavior: the server starts up and serves the model but doesn't react to model changes on the filesystem.
Just to make sure I'm reproducing this the same way you are:
How are you setting file_system_poll_wait_seconds flag when you run the docker image?
I change the entrypoint and the command arguments.
Example:
docker run --rm --entrypoint=/usr/bin/tensorflow_model_server -e AWS_ACCESS_KEY_ID=x -e AWS_SECRET_ACCESS_KEY=x tensorflow/serving:1.12.0 --model_config_file=s3://x/model.yaml --file_system_poll_wait_seconds=-1
Oh I see - this is the tracking bug filed from #1291.
Got it, thanks. @hgadig, can you close the other one now that this one is filed?
Hi Joe,
According to the comment here: https://github.com/tensorflow/serving/blob/63d31a33b4f6faeb0764bb159d403f2b49061aed/tensorflow_serving/sources/storage_path/file_system_storage_path_source.proto#L68, a negative value (which disables polling) is supposed to be for testing only. I think what happens here is that when polling is disabled, the server never tries to load the model during startup, so it hangs.
If you need the model to be loaded only once, could you please rename this issue as a feature request (FR) so that we can track it? Thanks.
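The hypothesis above can be illustrated with a small toy model. This is only a sketch and not TensorFlow Serving code: `ToySource`, `start` and `server_becomes_ready` are hypothetical names, and the handling of zero and positive values is an assumption based on my reading of the linked proto comment.

```python
from dataclasses import dataclass, field


@dataclass
class ToySource:
    """Hypothetical stand-in for the filesystem source (not the real class)."""

    poll_wait_seconds: int
    loaded_versions: list = field(default_factory=list)

    def start(self, versions_on_disk):
        # Negative interval: polling is disabled entirely, so not even the
        # initial scan of the model directory happens.
        if self.poll_wait_seconds < 0:
            return
        # Zero or positive (assumed): the first poll runs right away.
        # A positive value would keep polling afterwards; omitted here.
        self.loaded_versions = list(versions_on_disk)


def server_becomes_ready(source):
    # The toy server only reports ready once at least one version is loaded.
    return len(source.loaded_versions) > 0


disabled = ToySource(poll_wait_seconds=-1)
disabled.start([1])
periodic = ToySource(poll_wait_seconds=30)
periodic.start([1])

print(server_becomes_ready(disabled))  # False: nothing was ever loaded
print(server_becomes_ready(periodic))  # True
```

In this toy model a server that waits for the first model version before serving would wait forever with -1, which matches the hang described in the report.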
Can I also suggest that the default for file_system_poll_wait_seconds be reverted to the original 30 seconds instead of the current 1 second? The default seems to have changed when the command-line argument was introduced: https://github.com/tensorflow/serving/pull/214#discussion_r84943476
This seemingly innocent change has cost @joekohlsdorf's company and ours thousands of dollars. With the sensible default of 30 seconds, a $1,000 bill would have been about $33.
I also don't think that there is any material advantage to picking up a new model within a single second of deploying it versus 30 seconds. I would imagine that models themselves have a meaningful release workflow that is measured in days if not weeks. That ~15 second delay in serving the new models will not affect anyone's bottom line but a $1,000+ S3 bill most certainly will!
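As a rough sketch of the arithmetic behind these numbers, assuming the storage bill is proportional to the number of polls (the function names are made up for illustration, and the $1,000 figure is the example from this comment, not a measured price):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(poll_wait_seconds):
    # Number of filesystem polls in one day at the given interval.
    return SECONDS_PER_DAY // poll_wait_seconds


def scaled_cost(cost_at_one_second, poll_wait_seconds):
    # Assumes cost scales linearly with the number of polls, so polling
    # every N seconds costs 1/N of polling every second.
    return cost_at_one_second / poll_wait_seconds


print(polls_per_day(1))              # 86400
print(polls_per_day(30))             # 2880
print(round(scaled_cost(1000, 30)))  # 33
```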
Thank you for the great work on tensorflow-serving. I'm happy to submit a PR for the proposed revert to the original behavior.
@rgabo happy to review a PR to change default poll interval from 1 sec to 30 secs.
Out of curiosity, does providing a YAML-formatted config file work (ignoring the disabling aspect)?