Currently when the configured model list is updated via a call to handleReloadConfigRequest, the request thread blocks until any newly added models become available.
Their availability, however, depends on the filesystem polling thread rescanning the filesystem at its periodic interval, so the requested changes take effect, and the RPC returns, only after a delay of up to one full polling interval.
This problem may not be very noticeable with the default polling interval of 1 second, but it is undesirable for longer intervals. In particular, it makes API-based dynamic reconfiguration incompatible with the --file_system_poll_wait_seconds=0 setting: in that case all handleReloadConfigRequest calls time out and do not take effect.
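The blocking behaviour described above can be illustrated with a small toy model. This is only a sketch, not TensorFlow Serving code: the class ToyModelServer and its methods are hypothetical names, and it assumes that a poll interval of 0 means the filesystem is scanned once at startup and never again.

```python
import threading
import time


class ToyModelServer:
    """Toy stand-in for the behaviour in this issue; not TensorFlow Serving code."""

    def __init__(self, poll_wait_seconds):
        self._configured = set()  # models requested through reload_config
        self._available = set()   # models picked up by a filesystem scan
        self._cond = threading.Condition()
        self._stop = threading.Event()
        if poll_wait_seconds > 0:
            threading.Thread(
                target=self._poll_loop, args=(poll_wait_seconds,), daemon=True
            ).start()
        else:
            # Assumption: an interval of 0 means a single scan at startup only.
            self._scan()

    def _scan(self):
        # A scan makes every configured model available and wakes any waiters.
        with self._cond:
            self._available |= self._configured
            self._cond.notify_all()

    def _poll_loop(self, interval):
        # Event.wait returns False on timeout, so this scans once per interval
        # until shutdown() sets the stop event.
        while not self._stop.wait(interval):
            self._scan()

    def reload_config(self, models, deadline_seconds):
        """Block until all models are available; return False on deadline."""
        with self._cond:
            self._configured = set(models)
            return self._cond.wait_for(
                lambda: self._configured <= self._available,
                timeout=deadline_seconds,
            )

    def shutdown(self):
        self._stop.set()


# With periodic polling, the call returns only after the next scan.
polling = ToyModelServer(poll_wait_seconds=0.2)
start = time.monotonic()
ok = polling.reload_config({"new_model"}, deadline_seconds=2.0)
elapsed = time.monotonic() - start
polling.shutdown()

# With no periodic polling, nothing ever rescans, so the call hits its deadline.
no_polling = ToyModelServer(poll_wait_seconds=0)
timed_out = not no_polling.reload_config({"new_model"}, deadline_seconds=0.5)
```

In this sketch the request thread has no way to trigger a scan itself, which is why the wait is bounded only by the polling interval or the caller's deadline.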
To reproduce:
1. Start tensorflow_model_server with the --file_system_poll_wait_seconds=0 option and an empty initial config (no models).
2. Call the handleReloadConfigRequest API with a ModelListConfig containing a (valid) new model.

The call will hang indefinitely or until the gRPC deadline expires.

I have opened PR #1518 with a proposed fix.
Any interest in this fix? Might it make the next release?
Thanks for the contribution, @njhill! I will review and post my comments.
Please update this once the unit tests are in. Thanks again for the contribution!
@njhill Can you please respond to the above comment so that we can take the discussion forward? Thanks!