Serving: Aspired servables/versions aren't refreshed following config updates

Created on 18 Dec 2019 · 4 Comments · Source: tensorflow/serving

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04, RHEL 8
  • TensorFlow Serving installed from (source or binary): Both
  • TensorFlow Serving version: 1.15.0

Describe the problem

Currently, when the configured model list is updated via a call to handleReloadConfigRequest, the request thread blocks until any newly added models become available.

Their availability, however, depends on the filesystem polling thread rescanning the filesystem at some periodic interval, so there is an arbitrary delay before the requested changes actually take effect and the RPC returns.

This problem may not be very noticeable with the default polling interval of 1 second, but it seems undesirable for longer intervals. In particular, it makes API-based dynamic reconfiguration incompatible with the --file_system_poll_wait_seconds=0 setting: in that case all handleReloadConfigRequest calls time out and do not take effect.
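To make the interaction concrete, here is a toy simulation of the behavior described above. It is not TensorFlow Serving code: the class ToyModelServer, its methods, and the treatment of a poll interval of 0 as "scan once at startup, then never again" are assumptions made purely for illustration.

```python
import threading


class ToyModelServer:
    """Toy model of the reported behavior (hypothetical, not TF Serving code)."""

    def __init__(self, poll_wait_seconds):
        self._configured = set()   # models named in the current config
        self._available = set()    # models the "poller" has picked up
        self._cond = threading.Condition()
        self._stop = threading.Event()
        self._poll_wait_seconds = poll_wait_seconds
        self._poll_once()          # single scan at startup
        if poll_wait_seconds > 0:  # assumption: 0 means no periodic rescans
            threading.Thread(target=self._poll_loop, daemon=True).start()

    def _poll_once(self):
        # Stand-in for a filesystem scan: everything configured becomes available.
        with self._cond:
            self._available = set(self._configured)
            self._cond.notify_all()

    def _poll_loop(self):
        # Rescan every poll_wait_seconds until stop() is called.
        while not self._stop.wait(self._poll_wait_seconds):
            self._poll_once()

    def reload_config(self, models, deadline_seconds):
        """Update the config, then block until the poller catches up.

        Returns True if all models became available before the deadline,
        False if the deadline expired first.
        """
        with self._cond:
            self._configured = set(models)
            return self._cond.wait_for(
                lambda: self._configured <= self._available,
                timeout=deadline_seconds,
            )

    def stop(self):
        self._stop.set()
```

With a positive interval, reload_config returns once the next scan runs, so the caller waits for up to one polling interval. With an interval of 0 no further scan ever happens, so the call can only end when its deadline expires.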

Exact Steps to Reproduce

  1. Start tensorflow_model_server with the --file_system_poll_wait_seconds=0 option and an empty initial config (no models).
  2. Call the handleReloadConfigRequest API with a ModelListConfig containing a (valid) new model. The call hangs indefinitely, or until the gRPC deadline expires.
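Step 2 could be driven from a small Python client along the following lines. This is a sketch, not the reporter's actual reproduction: it assumes the grpcio and tensorflow-serving-api packages are installed and that the server listens for gRPC on localhost:8500. The model name, base path, and helper function names are hypothetical.

```python
def model_config_text(name, base_path):
    """Return a ModelServerConfig with one model, in protobuf text format."""
    return (
        "model_config_list {\n"
        "  config {\n"
        f'    name: "{name}"\n'
        f'    base_path: "{base_path}"\n'
        '    model_platform: "tensorflow"\n'
        "  }\n"
        "}\n"
    )


def reload_config(target, config_text, deadline_seconds):
    """Send the config via HandleReloadConfigRequest with a gRPC deadline.

    Returns "ok", "deadline exceeded", or the server's error message.
    """
    # Imported lazily so that building the config text above does not
    # require the gRPC and TensorFlow Serving packages.
    import grpc
    from google.protobuf import text_format
    from tensorflow_serving.apis import model_management_pb2
    from tensorflow_serving.apis import model_service_pb2_grpc
    from tensorflow_serving.config import model_server_config_pb2

    config = model_server_config_pb2.ModelServerConfig()
    text_format.Parse(config_text, config)
    request = model_management_pb2.ReloadConfigRequest()
    request.config.CopyFrom(config)

    with grpc.insecure_channel(target) as channel:
        stub = model_service_pb2_grpc.ModelServiceStub(channel)
        try:
            response = stub.HandleReloadConfigRequest(
                request, timeout=deadline_seconds
            )
        except grpc.RpcError as err:
            if err.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
                return "deadline exceeded"
            raise
    # An error_code of 0 means OK in the response's status message.
    if response.status.error_code == 0:
        return "ok"
    return response.status.error_message
```

For example, reload_config("localhost:8500", model_config_text("my_model", "/models/my_model"), 10.0) sends the request with a 10-second deadline. Per the report, against a server started as in step 1 this call would be expected to end with the deadline expiring rather than with a successful reload.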

I have opened PR #1518 with a proposed fix.

Labels: awaiting response, bug

All 4 comments

Any interest in this fix? Might it make the next release?

Thanks for the contribution, @njhill! I will review and post my comments.

Please update this once the unit tests are in. Thanks again for the contribution!

@njhill, can you please respond to the above comment so that we can take the discussion forward? Thanks!
