Serving: TensorFlow model server crashing with error: Check failed: c->in_use(); tf_serving_entrypoint.sh: line 3: 8 Aborted

Created on 8 Dec 2018 · 11 Comments · Source: tensorflow/serving

I am running an object detection model using the tensorflow/serving:latest-gpu Docker image and nvidia-docker on an Amazon Deep Learning AMI (EC2 P3 instance). The model server starts up fine. I then run a gRPC client that loops through several images and sends them to the server to fetch predictions. The predictions come back quickly and match what I expect, and the server runs at ~95% GPU utilization (memory usage stays below the limit).
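For readers who want to reproduce this setup, a client loop of the kind described might look like the sketch below. It is not the original poster's client: it swaps gRPC for the REST endpoint that the `--rest_api_port=8501` flag in this report's server command exposes, so that it needs only the Python standard library. The model name `my_model`, the host, and the `{"b64": ...}` input encoding (which assumes the model's serving signature accepts encoded image strings) are assumptions.

```python
import base64
import json
import urllib.request


def build_predict_request(image_bytes_list):
    """Build a JSON body in the row ("instances") format of the REST predict API.

    Assumption: the model takes encoded image bytes as a string tensor,
    so each image is sent as a {"b64": ...} object.
    """
    instances = [
        {"b64": base64.b64encode(image).decode("ascii")}
        for image in image_bytes_list
    ]
    return json.dumps({"instances": instances}).encode("utf-8")


def predict(image_bytes_list, host="localhost", port=8501,
            model_name="my_model", timeout=30.0):
    """POST one batch of images and return the decoded JSON response.

    host, port and model_name are placeholders, not values from the report.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    request = urllib.request.Request(
        url,
        data=build_predict_request(image_bytes_list),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Calling `predict([open(path, "rb").read()])` inside a loop over image paths would approximate the continuous load described here.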

However, the model server often crashes after serving predictions continuously for a while. The error it prints right before crashing is:

F external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:458] Check failed: c->in_use() && (c->bin_num == kInvalidBinNum) 
/usr/bin/tf_serving_entrypoint.sh: line 3: 8 Aborted tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

I have tried sending larger payloads from the client to the server and observed resource exhaustion errors, which makes sense since the GPU runs out of memory. But I cannot work out what exactly is causing the crash above.
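One way to keep individual payloads small on the client side is to split the image list into fixed-size batches before sending. The helper below is a minimal sketch; `chunked` and `max_batch` are hypothetical names, and nothing in this thread shows that smaller batches avoid the `Check failed` crash. It only bounds how much data a single request carries.

```python
def chunked(items, max_batch):
    """Yield consecutive slices of `items` with at most `max_batch` elements."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(items), max_batch):
        yield items[start:start + max_batch]
```

Each yielded slice can then be sent as its own request, so a large payload becomes several smaller ones.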

Can someone please help?

Thanks in advance.

All 11 comments

That appears to come from the memory allocator when trying to free memory it thinks has been freed (or is still in use for some reason). This could be a bug in code (say, a memory leak) or simply a side effect of running out of memory for other reasons.

There's not enough information here to debug anything further, though. This is deep in TensorFlow core logic, so if you can reproduce the issue, you might want to file an issue on the TensorFlow project.
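To make the failed condition concrete: `c->in_use()` in the log reads as an assertion that the chunk being released is currently marked as allocated. The toy class below illustrates that kind of invariant with a double free. It is an illustration only, not TensorFlow's BFC allocator, and every name in it is hypothetical.

```python
class ToyAllocator:
    """Toy allocator that tracks which chunk ids are currently in use.

    Illustration only; this is NOT how TensorFlow's allocator is implemented.
    """

    def __init__(self):
        self._in_use = set()
        self._next_id = 0

    def allocate(self):
        """Hand out a fresh chunk id and mark it as in use."""
        chunk_id = self._next_id
        self._next_id += 1
        self._in_use.add(chunk_id)
        return chunk_id

    def deallocate(self, chunk_id):
        """Release a chunk, refusing chunks that are not marked in use."""
        # Analogue of the failed check: freeing a chunk that is not in use
        # (for example, one that was already freed) is treated as fatal.
        if chunk_id not in self._in_use:
            raise RuntimeError(f"Check failed: chunk {chunk_id} is not in use")
        self._in_use.remove(chunk_id)
```

Allocating a chunk and calling `deallocate` on it twice raises on the second call, which is the toy equivalent of the abort in the log.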

@dipsatch your problem is definitely related to memory (whether in your own code or in TensorFlow itself). I had the same issue and saw that memory usage was at capacity just before the TF server crashed.
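To check whether memory is at capacity before a crash, GPU memory can be sampled while the client runs. The sketch below shells out to `nvidia-smi` with its CSV query flags and parses the result; it assumes `nvidia-smi` is on the PATH of the machine running the container, and the function names are hypothetical.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines of 'used, total' (MiB, no header, no units), one per GPU."""
    readings = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Return a list of (used_mib, total_mib) tuples, one per visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_memory_csv(result.stdout)
```

Logging `query_gpu_memory()` every few seconds alongside the client loop would show whether usage climbs toward the total before the server aborts.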

Closing this issue as no response was received from the user. Feel free to post updates (if any) and we will reopen the issue.

I'm having the same issue. Server is getting ~95% utilization and crashes after a few iterations of training. I'm using tf version 1.12.0.
The error I'm getting is F tensorflow/core/common_runtime/bfc_allocator.cc:458] Check failed: c->in_use() && (c->bin_num == kInvalidBinNum) Aborted

This exception might be related to issue #22581: https://github.com/tensorflow/tensorflow/issues/22581

I had the same issue and was able to solve it by pulling the most recent tf-nightly-gpu image (with v1.13.0). See the comments here.

I got the same issue with tf version 1.12.0. Does anyone know anything about this?

@YananJian Have you found a solution? I ran into the same problem and would appreciate your help. Thanks.

@zhouyuangan tf==1.9.0 will be ok!

@YananJian Have you found a solution? I ran into the same problem and would appreciate your help. Thanks.

Yes, tf==1.10.0 works.

I got the same issue while using tensorflow/serving:latest-gpu. I then tested tensorflow/serving:1.13-gpu with three streams and found that the problem is solved in that image.
