Serving: File system polling cannot be deactivated

Created on 30 Mar 2019 · 7 comments · Source: tensorflow/serving

Bug Report

System information

  • TensorFlow Serving installed from (source or binary): official Docker image tensorflow/serving
  • TensorFlow Serving version: 1.12.0

Describe the problem

By default, the filesystem is polled regularly to detect and load model changes. According to the code, setting --file_system_poll_wait_seconds=-1 should completely deactivate polling.
Instead, the server never starts serving the model; it just hangs on startup.

Exact Steps to Reproduce

  • tensorflow_model_server --file_system_poll_wait_seconds=-1

Expected behavior

Server starts up and serves the model but doesn't react to model changes on the filesystem.

Labels: awaiting response, bug

All 7 comments

Just to make sure I'm reproducing this the same way you are:
How are you setting file_system_poll_wait_seconds flag when you run the docker image?

I change the entrypoint and the command arguments.

Example:
docker run --rm --entrypoint=/usr/bin/tensorflow_model_server -e AWS_ACCESS_KEY_ID=x -e AWS_SECRET_ACCESS_KEY=x tensorflow/serving:1.12.0 --model_config_file=s3://x/model.yaml --file_system_poll_wait_seconds=-1

Oh I see - this is the tracking bug filed from #1291.
Got it, thanks. @hgadig, can you close the other one now that this one is filed?

Hi Joe,

According to the comment here: https://github.com/tensorflow/serving/blob/63d31a33b4f6faeb0764bb159d403f2b49061aed/tensorflow_serving/sources/storage_path/file_system_storage_path_source.proto#L68, disabling polling (a negative value) is intended for testing only. I think what happens here is that when polling is disabled, the server never tries to load the model during startup, so it hangs.

If you only need the model to be loaded once, could you please rename this issue as a feature request (FR) so that we can track it? Thanks.
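To make the hang easier to see, here is a simplified Python model of the three regimes described in that proto comment: a positive interval polls at startup and then periodically, zero polls exactly once, and a negative value never polls. This is a sketch of the documented semantics, not the actual C++ code in the file-system storage path source; the function names `poll_times` and `server_becomes_ready` are hypothetical, and the model assumes the server can only become ready after at least one poll has discovered a model version.

```python
from typing import List


def poll_times(poll_wait_seconds: int, horizon_seconds: int) -> List[int]:
    """Simulated times (in seconds) at which the filesystem is polled.

    Simplified model of the documented semantics, not the real implementation:
      > 0  -> poll at startup, then every poll_wait_seconds
      == 0 -> poll exactly once, at startup
      < 0  -> never poll (documented as testing-only)
    """
    if poll_wait_seconds < 0:
        return []
    if poll_wait_seconds == 0:
        return [0]
    return list(range(0, horizon_seconds + 1, poll_wait_seconds))


def server_becomes_ready(poll_wait_seconds: int, horizon_seconds: int = 60) -> bool:
    # Assumption: a model version can only be loaded after some poll finds it,
    # so with zero polls the server waits forever (the hang in this issue).
    return len(poll_times(poll_wait_seconds, horizon_seconds)) > 0


for wait in (30, 0, -1):
    print(wait, server_becomes_ready(wait))
```

Under this model, 0 rather than -1 appears to be the value that matches "load once at startup, never reload", which is close to the behavior requested in the issue.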

Can I also suggest that the default for file_system_poll_wait_seconds be reverted to the original 30 seconds instead of the current 1-second default? The default seems to have changed when the command-line argument was introduced: https://github.com/tensorflow/serving/pull/214#discussion_r84943476

This seemingly innocent change has cost @joekohlsdorf's company and ours upwards of $1,000. The more sensible default of 30 seconds would have brought that figure down to about $33.

I also don't think there is any material advantage to picking up a new model within one second of deploying it rather than within 30 seconds. I would imagine that models themselves follow a deliberate release workflow measured in days, if not weeks. An average delay of ~15 seconds before a new model is served will not affect anyone's bottom line, but a $1,000+ S3 bill most certainly will!
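The figures above can be checked with a little arithmetic: a 30x longer interval means 30x fewer polls, and if a model lands at a uniformly random time, the average wait until the next poll is half the interval. The sketch below assumes (hypothetically) that the S3 bill scales linearly with the number of polls, ignoring how many models or versions each poll lists; the $1,000 input is the bill mentioned in this thread, and the helper names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_period(poll_wait_seconds: int, days: int = 30) -> int:
    """Number of filesystem polls over `days` days for a positive interval."""
    return days * SECONDS_PER_DAY // poll_wait_seconds


def scaled_cost(observed_cost: float, old_interval: int, new_interval: int) -> float:
    """Scale an observed polling bill to a new interval.

    Assumption: cost is proportional to the number of polls.
    """
    return observed_cost * old_interval / new_interval


def mean_pickup_delay(poll_wait_seconds: int) -> float:
    """Average wait before a newly uploaded model is noticed."""
    return poll_wait_seconds / 2


polls_1s = polls_per_period(1)    # 2,592,000 polls in 30 days
polls_30s = polls_per_period(30)  # 86,400 polls in 30 days
print(polls_1s // polls_30s)                 # 30
print(round(scaled_cost(1000.0, 1, 30), 2))  # 33.33
print(mean_pickup_delay(30))                 # 15.0
```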

Thank you for the great work on tensorflow-serving. I'm happy to submit a PR for the proposed revert to the original behavior.

@rgabo, happy to review a PR to change the default poll interval from 1 sec to 30 secs.
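Until a new default ships, the interval can be set explicitly with the same flag instead of using -1. This Python sketch only assembles the `docker run` command quoted in this thread with a 30-second interval; the helper name `model_server_docker_argv` is hypothetical, and the image tag, S3 path, and placeholder credentials are copied from the thread.

```python
import shlex
from typing import Dict, List


def model_server_docker_argv(
    image: str,
    model_config_file: str,
    poll_wait_seconds: int,
    env: Dict[str, str],
) -> List[str]:
    """Build a `docker run` argument list mirroring the command in this thread."""
    argv = ["docker", "run", "--rm", "--entrypoint=/usr/bin/tensorflow_model_server"]
    for key, value in env.items():
        argv += ["-e", f"{key}={value}"]
    argv.append(image)
    # Everything after the image name is passed to tensorflow_model_server.
    argv += [
        f"--model_config_file={model_config_file}",
        f"--file_system_poll_wait_seconds={poll_wait_seconds}",
    ]
    return argv


argv = model_server_docker_argv(
    image="tensorflow/serving:1.12.0",
    model_config_file="s3://x/model.yaml",
    poll_wait_seconds=30,
    env={"AWS_ACCESS_KEY_ID": "x", "AWS_SECRET_ACCESS_KEY": "x"},
)
print(shlex.join(argv))  # requires Python 3.8+
```

Passing the list to `subprocess.run(argv)` would launch the container; building a list rather than a single string avoids shell-quoting issues with credentials.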


(Out of curiosity) does providing a YAML-formatted config file work, ignoring the disabling aspect?
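For context on that question: as far as I know, `--model_config_file` is read as a `ModelServerConfig` protobuf in text format, so YAML syntax would not parse regardless of the file name. A minimal Python sketch that renders such a text-format config follows; the helper name, model name, and base path are hypothetical, and it does not escape quotes inside names or paths.

```python
from typing import List, Tuple


def model_config_textproto(models: List[Tuple[str, str]]) -> str:
    """Render a ModelServerConfig in protobuf text format.

    `models` is a list of (name, base_path) pairs. Values are inserted
    verbatim, so they must not contain single quotes.
    """
    lines = ["model_config_list {"]
    for name, base_path in models:
        lines += [
            "  config {",
            f"    name: '{name}'",
            f"    base_path: '{base_path}'",
            "    model_platform: 'tensorflow'",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(model_config_textproto([("my_model", "s3://x/models/my_model")]))
```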

