tfjs-automl/demo/object_detection gives no predictions

Created on 27 Aug 2020 · 34 comments · Source: tensorflow/tfjs

TensorFlow.js version

Not sure... is that the same as the version of tfjs-core that is used?
If I check the tfjs-automl package.json, it looks like it is using tfjs-core 1.2.8, but that seems strange as tfjs is at 2.x right?
Then again if I check the commits it looks like tfjs-automl was never upgraded to 2.x?

Browser version

Does not seem relevant. Latest Chrome.

Describe the problem or feature request

Predictions always return an empty array.

The reason I cloned this repo and tried the object_detection demo is that this is exactly what I was seeing with my own model: no predictions are returned, just an empty array. So I replaced my model with a 'known good' one from Google itself, but got the same result. Then, thinking I might have an error in my code somewhere, I decided to try the official example in this repo, which gives the same result. Empty array...

Code to reproduce the bug / link to feature request

Clone this repo (I am assuming you cloned to C:\ws\tfjs)
In a shell window, browse to C:\ws\tfjs\tfjs-automl\demo\object_detection
yarn
C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn
yarn install v1.12.3
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "win32" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 34.04s.
yarn watch
C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn watch
yarn run v1.12.3
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
| Building index.html...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

/ Building index.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

/ Building index.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

√  Built in 13.89s.
Open your browser

Open your browser at http://localhost:1234

Empty array is returned

After you open the page, it takes a few seconds while inference runs. After that, it should print JSON with the returned results below the image and draw boxes on top of the image around the detected objects, but instead it only prints [] and draws no boxes at all.

autoML bug

All 34 comments

@rthadur I wasn't able to reproduce this (I get the expected array and boxes on the screen); are you able to reproduce? The demo depends on already published versions of tfjs and tfjs-automl, so the behaviour described is surprising.

@tafsiri On what OS do you test?
I run the example on Windows.
And what steps did you follow? Same as me, just yarn and yarn watch?

Are there any ways for me to debug what is happening? Can I enable logging somehow? Or insert some code to check stuff?

Just spun up an Ubuntu 18 VM. Same result.

stijn@DESKTOP-3I7O7CL:~$ git clone git@github.com:tensorflow/tfjs.git
Cloning into 'tfjs'...
Enter passphrase for key '/home/stijn/.ssh/id_rsa':
remote: Enumerating objects: 268, done.
remote: Counting objects: 100% (268/268), done.
remote: Compressing objects: 100% (156/156), done.
remote: Total 49031 (delta 140), reused 194 (delta 106), pack-reused 48763
Receiving objects: 100% (49031/49031), 49.53 MiB | 3.54 MiB/s, done.
Resolving deltas: 100% (38881/38881), done.
Checking out files: 100% (2354/2354), done.
stijn@DESKTOP-3I7O7CL:~$ cd tfjs
stijn@DESKTOP-3I7O7CL:~/tfjs$ cd tfjs-automl/
stijn@DESKTOP-3I7O7CL:~/tfjs/tfjs-automl$ cd demo/
stijn@DESKTOP-3I7O7CL:~/tfjs/tfjs-automl/demo$ cd object_detection/
stijn@DESKTOP-3I7O7CL:~/tfjs/tfjs-automl/demo/object_detection$ yarn
yarn install v1.22.4
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
[4/4] Building fresh packages...
warning Your current version of Yarn is out of date. The latest version is "1.22.5", while you're on "1.22.4".
info To upgrade, run the following command:
$ curl --compressed -o- -L https://yarnpkg.com/install.sh | bash
Done in 28.26s.
stijn@DESKTOP-3I7O7CL:~/tfjs/tfjs-automl/demo/object_detection$ yarn watch
yarn run v1.22.4
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
⠸ Building index.html...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

⠦ Building tf-automl.esm.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

✨  Built in 13.60s.

Results:

On the webpage, below the image, after a few seconds, this appears:

[]

@Download I'm on macOS. Debugging APIs you could use include https://js.tensorflow.org/api/latest/#enableDebugMode and https://js.tensorflow.org/api/latest/#profile; these let you see which kernels get run and how many tensors are created.

I'd personally also look into debugging the input into the model to make sure it is as expected. I'd also try things like passing in a random tensor using the advanced API.

cc @dsmilkov for other debugging thoughts.
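The two debugging APIs mentioned above could be wired into the demo roughly like this. This is a hedged sketch, not code from the thread: `profileDetection` is my own hypothetical helper, and it assumes the global `tf` from the tfjs script tag plus an already loaded tfjs-automl `model`.

```javascript
// Hypothetical helper (not part of the demo): run one detection under
// debug mode and tf.profile to see allocations and kernel activity.
async function profileDetection(model, image, options) {
  tf.enableDebugMode(); // logs every kernel call; very slow, debug only
  const info = await tf.profile(() => model.detect(image, options));
  console.log('new bytes allocated:', info.newBytes);
  console.log('new tensors created:', info.newTensors);
  console.log('kernels run:', info.kernels.length);
  return info;
}
```

If the model produces no boxes, comparing the kernel list and tensor counts between a working and a non-working machine could help narrow down where the pipeline diverges.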

@tafsiri To enable debug mode, would I add

tf.enableDebugMode()

to tfjs-automl/demo/object_detection/index.js?

and am I correct that I would need to add an import statement to get tf?

Sorry if this sounds dumb, but could you spell out what code I should add where?

Yes on both counts (the import and the function call). You would also need to npm install @tensorflow/tfjs.

Though if we can reproduce we'll be in a better position to take a look. Adding a few other folk to this.

@tafsiri
Thanks. I will try this and let you know the output. Also, I am going to try it on a different machine (my Windows laptop) and see what happens there.

One question still. Did you do the exact same steps as I did? So:

  • Git clone this repo
  • cd into automl/demo/object_detection
  • yarn
  • yarn watch

Because I cannot understand how this could give different results.

Attempting to reproduce on my Windows laptop

C:\ws>git clone https://github.com/tensorflow/tfjs.git
Cloning into 'tfjs'...
remote: Enumerating objects: 284, done.
remote: Counting objects: 100% (284/284), done.
remote: Compressing objects: 100% (166/166), done.
remote: Total 49047 (delta 147), reused 203 (delta 111), pack-reused 48763
Receiving objects: 100% (49047/49047), 49.54 MiB | 3.54 MiB/s, done.
Resolving deltas: 100% (38888/38888), done.

C:\ws>cd tfjs\tfjs-automl\demo\object_detection

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn
yarn install v1.22.5
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "win32" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 74.76s.

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn watch
yarn run v1.22.5
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
| Building index.html...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

- Building index.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

/ Building index.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

√  Built in 20.58s.

Results

Also here it returns an empty array...

automl-object-detection-results

Conclusion

I have tried this now on three different machines (one of them virtual):

  • My Windows desktop
  • An Ubuntu 18 VM running on my Windows desktop
  • My Windows laptop

I also tried with two different versions of Chrome (the one on my laptop was outdated and I had not yet updated it) and with Firefox. All consistently give the same result: an empty array.

I also noticed other people reporting issues about getting an empty array as the result. See #3861.
So my conclusion is that something really is broken here.

@tafsiri
I am really wondering.... How do you reproduce?
Can you tell me the exact steps you followed?
Maybe I can try to do it your way.
Or did you do exactly the same?

  • Git clone
  • cd tfjs/tfjs-automl/demo/object_detection
  • yarn
  • yarn watch

I find it very hard to believe that it would give different results.
The only thing is, I never tried on a Mac...
Do you have access to another (non-Mac) machine, to eliminate the chance that it is related to the OS?

I will add debug info now as per your instructions and see what comes up...

Running in debug mode

I tried a few different ways of running the demo with debug mode enabled.

With @tensorflow/tfjs latest

First, I installed @tensorflow/tfjs in the object_detection demo.

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn add @tensorflow/tfjs
yarn add v1.12.3
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "win32" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning "@tensorflow/tfjs > @tensorflow/[email protected]" has unmet peer dependency "seedrandom@~2.4.3".
[4/4] Building fresh packages...
success Saved lockfile.
warning Your current version of Yarn is out of date. The latest version is "1.22.5", while you're on "1.12.3".
info To upgrade, run the following command:
$ curl --compressed -o- -L https://yarnpkg.com/install.sh | bash
success Saved 10 new dependencies.
info Direct dependencies
└─ @tensorflow/[email protected]
info All dependencies
├─ @tensorflow/[email protected]
├─ @types/[email protected]
├─ @types/[email protected]
├─ @types/[email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
└─ [email protected]
Done in 104.31s.

Next, I added an import statement and the line to enable debug mode to tfjs-automl\demo\object_detection\index.js:

import * as tf from '@tensorflow/tfjs';          // <-- added this on (empty) line 17
import * as automl from '@tensorflow/tfjs-automl';
tf.enableDebugMode();                            // <-- added this on (empty) line 19
const MODEL_URL =
    'https://storage.googleapis.com/tfjs-testing/tfjs-automl/object_detection/model.json';

Then, I ran yarn watch again:

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn watch
yarn run v1.12.3
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
√  Built in 4.67s.

This time, no results are given. The app seems to crash.
There is some output in the developer tools console:

engine.ts:229 webgl backend was already registered. Reusing existing backend factory.
registerBackend @ engine.ts:229
engine.ts:229 cpu backend was already registered. Reusing existing backend factory.
registerBackend @ engine.ts:229
environment.ts:55 Platform browser has already been set. Overwriting the platform with [object Object].
setPlatform @ environment.ts:55
flags.ts:27 Debugging mode is ON. The output of every math call will be downloaded to CPU and checked for NaNs. This significantly impacts performance.
(anonymous) @ flags.ts:27
tensor.ts:464 Uncaught (in promise) TypeError: ut(...).registerTensor is not a function
    at new t (tensor.ts:464)
    at Function.t.make (tensor.ts:483)
    at wn (tensor_ops.ts:112)
    at bn (tensor_ops.ts:58)
    at o (io_utils.ts:175)
    at Object.eh [as decodeWeights] (io_utils.ts:116)
    at e.<anonymous> (graph_model.ts:143)
    at tensor.ts:397
    at Object.next (tensor.ts:397)
    at o (tensor.ts:397)

This seems strange... But now I realize that maybe I installed the wrong version of @tensorflow/tfjs?

With @tensorflow/tfjs 1.2.8

So I am trying again, this time with @tensorflow/[email protected] (the same version as the @tensorflow/tfjs-core the demo was already using), to see if that helps.

I left the code changes described above the same, but uninstalled the latest version of tfjs and installed 1.2.8:

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn remove @tensorflow/tfjs
yarn remove v1.12.3
[1/2] Removing module @tensorflow/tfjs...
[2/2] Regenerating lockfile and installing missing dependencies...
info [email protected]: The platform "win32" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
success Uninstalled packages.
Done in 11.21s.

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn add @tensorflow/[email protected]
yarn add v1.12.3
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "win32" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning "@tensorflow/tfjs > @tensorflow/[email protected]" has unmet peer dependency "seedrandom@~2.4.3".
[4/4] Building fresh packages...

success Saved lockfile.
success Saved 11 new dependencies.
info Direct dependencies
└─ @tensorflow/[email protected]
info All dependencies
├─ @tensorflow/[email protected]
├─ @tensorflow/[email protected]
├─ @tensorflow/[email protected]
├─ @types/[email protected]
├─ @types/[email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
└─ [email protected]
Done in 17.11s.

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn watch
yarn run v1.12.3
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
\ Building index.html...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

- Building tf-data.esm.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

√  Built in 14.78s.

Again, the app crashes.
The logging printed to the developer console is slightly different:

flags.ts:27 Debugging mode is ON. The output of every math call will be downloaded to CPU and checked for NaNs. This significantly impacts performance.
(anonymous) @ flags.ts:27
t.set @ environment.ts:104
Fe @ globals.ts:49
parcelRequire.index.js.@tensorflow/tfjs @ index.js:19
newRequire @ object_detection.e31bb0bc.js:49
(anonymous) @ object_detection.e31bb0bc.js:81
(anonymous) @ object_detection.e31bb0bc.js:107
util.ts:107 Uncaught (in promise) Error: Element arr[0] should be a primitive, but is an array of 0 elements
    at f (util.ts:107)
    at t (tensor_util_env.ts:56)
    at t (tensor_util_env.ts:66)
    at en (tensor_util_env.ts:46)
    at bn (tensor_ops.ts:57)
    at o (io_utils.ts:175)
    at Object.eh [as decodeWeights] (io_utils.ts:116)
    at e.<anonymous> (graph_model.ts:143)
    at callbacks.ts:256
    at Object.next (callbacks.ts:256)
f @ util.ts:107
t @ tensor_util_env.ts:56
t @ tensor_util_env.ts:66
en @ tensor_util_env.ts:46
bn @ tensor_ops.ts:57
o @ io_utils.ts:175
eh @ io_utils.ts:116
(anonymous) @ graph_model.ts:143
(anonymous) @ callbacks.ts:256
(anonymous) @ callbacks.ts:256
o @ callbacks.ts:256
async function (async)
run @ index.js:24
parcelRequire.index.js.@tensorflow/tfjs @ index.js:71
newRequire @ object_detection.e31bb0bc.js:49
(anonymous) @ object_detection.e31bb0bc.js:81
(anonymous) @ object_detection.e31bb0bc.js:107

So I am running out of ideas here.

@tafsiri
Is there something else I can try?
And can you elaborate on how you tried to reproduce? Same steps?
Am I maybe doing something wrong in my attempts to debug?
Should I import tfjs after I import automl instead of before?

EDIT

I figured out that I don't actually have to install @tensorflow/tfjs after all, because enableDebugMode is exported from @tensorflow/tfjs-core, which was already installed. This means I don't have to change package.json. So I have now tried with the original dependencies:

tfjs-automl/demo/object_detection/package.json

{
  "dependencies": {
    "@tensorflow/tfjs-automl": "^1.0.0",
    "@tensorflow/tfjs-converter": "^1.2.8",
    "@tensorflow/tfjs-core": "^1.2.8"
  }
}

I added the import for tfjs-core and called enableDebugMode. I also added some logging:

tfjs-automl/demo/object_detection/index.js

import { enableDebugMode } from '@tensorflow/tfjs-core';  // <-- import
enableDebugMode();                                        // <-- call
import * as automl from '@tensorflow/tfjs-automl';
const MODEL_URL =
    'https://storage.googleapis.com/tfjs-testing/tfjs-automl/object_detection/model.json';

async function run() {
  console.info('loading model');                          // <-- logging
  const model = await automl.loadObjectDetection(MODEL_URL);
  const image = document.getElementById('salad');
  // These are the default options.
  const options = {score: 0.5, iou: 0.5, topk: 20};
  console.info('running predictions');                    // <-- logging
  const predictions = await model.detect(image, options);
  console.info('predictions', predictions);               // <-- logging
  // ...rest of the demo (drawing code) left unchanged
}

Results

Nothing is returned. The program crashes with these messages in the console:

Debugging mode is ON. The output of every math call will be downloaded to CPU and checked for NaNs. This significantly impacts performance. flags.ts:27:12
loading model index.js:24:10
Uncaught (in promise) Error: Element arr[0] should be a primitive, but is an array of 0 elements
    f util.ts:107
    t tensor_util_env.ts:57
    t tensor_util_env.ts:66
    en tensor_util_env.ts:40
    bn tensor_ops.ts:57
    o io_utils.ts:175
    eh io_utils.ts:116
    load graph_model.ts:143
    p object_detection.e31bb0bc.js:19410
    p object_detection.e31bb0bc.js:19422
    o object_detection.e31bb0bc.js:19311

If I comment out the call to enableDebugMode, I get empty array again and these messages in the console:

loading model index.js:24:10
running predictions index.js:29:10
predictions 
Array []
index.js:31:10

Codepen demonstrating the issue

I took the sample code from this documentation page about object detection and put it in this codepen. Same result. Empty array.

Oh, one thing: I replaced the URL of the model to load with the one from the automl/demo/object_detection example, because the script on that documentation page tries to load from a local URL that is not available on codepen.

This zip file also demonstrates the issue on my machine. Here I have downloaded the model files to a folder, together with an index.html containing the script from the documentation page.

object_detection.zip

Start an http-server in that folder as per the instructions on the page:

C:\ws\object_detection>http-server -p 8000
Starting up http-server, serving ./
Available on:
  http://192.168.2.12:8000
  http://127.0.0.1:8000
Hit CTRL-C to stop the server

@Download thanks for the codepen link. I tried it and still get working results.

Screen Shot 2020-08-29 at 8 51 30 PM

I did modify your codepen a bit to switch the backend to CPU: https://codepen.io/tafsiri/pen/zYqzaKr

Could you try that and let us know if it works (it might take a bit longer to return an answer)? If so, then it is probably a WebGL bug of some sort.

Yes! It works!

@tafsiri Thank you! I have now seen working predictions on my machine for the first time. Finally I have a way forward. You really made my day buddy! I am going to implement using the CPU for now. Later on I might add some code that attempts to do predictions using the GPU and if it succeeds, switch the backend back to GPU for those devices where it works.

I now have a consistently reproducing scenario for this issue with the GPU backend, so if you want me to try some things to narrow down the issue, just let me know. You can reach me at stijndewitt AT gmail DOT com.
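The GPU-first-with-CPU-fallback idea described above could be sketched roughly like this. This is my own hypothetical helper, not code from the thread; it assumes the global `tf` from the tfjs script tag and an already loaded tfjs-automl model, as in the codepens.

```javascript
// Hypothetical helper: try the WebGL backend first, and fall back to
// the CPU backend when inference throws or returns nothing.
async function detectWithFallback(model, image, options) {
  try {
    await tf.setBackend('webgl');
    const predictions = await model.detect(image, options);
    if (predictions.length > 0) {
      return predictions; // WebGL worked, keep using it
    }
  } catch (e) {
    console.warn('WebGL inference failed, retrying on CPU:', e);
  }
  await tf.setBackend('cpu'); // slower, but works on more hardware
  return model.detect(image, options);
}
```

One caveat: an empty array can also be a legitimate "no objects detected" result, so a real app might only fall back based on a known test image rather than on every empty result.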

Okay, that narrows it down a bit. Could you screenshot what you see at https://js.tensorflow.org/debug/ and add it to this issue?

Also, are you able to get us info on what graphics card/chipset you are running? I notice you mention you are using _virtual_ machines; if your setup prevents the VM from accessing the graphics card, you will not be able to use the WebGL backend. Have you tried this outside of a VM?

@tafsiri

Could you screenshot what you see here https://js.tensorflow.org/debug/ and add it to this issue.

image

Also are you able to get us info on what graphics card/chipset you are running?

According to Windows Device Manager it is "Intel(R) HD Graphics 4600"
That is my desktop. I can check my laptop as well if that is useful. And maybe you have/know a WebGL test page that I can screenshot for more details?

Have you tried this outside of a VM?
Yes, I tried on:

  • My Windows desktop, on the bare metal
  • My Windows laptop, on the bare metal
  • An Ubuntu 18 VM running on my Windows desktop

@rthadur Would you be able to try and reproduce this on windows (using the codepen link)?

@annxingyuan Any ideas of other things to check that would explain getting no results on WebGL but getting results on CPU?

@tafsiri Hmm, running the app in debug mode would be the best way to check, but it looks like it gets stuck on checking for shape consistency - I'm wondering whether the same error occurs in debug mode on the CPU backend? @Download - any chance you still have things set up to run in debug mode and could easily check whether you get the same shape consistency error on the CPU backend?

@annxingyuan I believe there is no error on CPU.

NM, I misunderstood what you meant. Will chat with you offline.

@tafsiri I tried on Windows on a loaner laptop; it works well with both the CPU and WebGL backends. I tried using this codepen example: https://codepen.io/tafsiri/pen/zYqzaKr

@annxingyuan
If by the 'shape consistency error' you mean Element arr[0] should be a primitive, but is an array of 0 elements: I get that on the GPU backend as well as on the CPU backend. I updated this codepen to show that:

https://codepen.io/StijnDeWitt/pen/poywgJV

tf.setBackend('cpu')
tf.enableDebugMode()

Result

Uncaught (in promise) Error: Element arr[0] should be a primitive, but is an array of 0 elements
    at gv (tfjs:17)
    at t (tfjs:17)
    at t (tfjs:17)
    at Vg (tfjs:17)
    at Gy (tfjs:17)
    at mN (tfjs:17)
    at t.e.loadSync (tfjs:17)
    at t.<anonymous> (tfjs:17)
    at u (tfjs:17)
    at Generator._invoke (tfjs:17)
gv @ tfjs:17
t @ tfjs:17
t @ tfjs:17
Vg @ tfjs:17
Gy @ tfjs:17
mN @ tfjs:17
e.loadSync @ tfjs:17
(anonymous) @ tfjs:17
u @ tfjs:17
(anonymous) @ tfjs:17
forEach.t.<computed> @ tfjs:17
Wm @ tfjs:17
o @ tfjs:17
async function (async)
run @ index.html?key=iFrameKey-15e063db-a94a-c56c-9648-913b5caf6c6b:28
(anonymous) @ index.html?key=iFrameKey-15e063db-a94a-c56c-9648-913b5caf6c6b:38

I think the fact that enabling debug mode gives this error is an indication that something is not working completely correctly.

@rthadur When you add tf.enableDebugMode() to your codepen, does it still work?

@Download yes it worked!

@rthadur Heh, I'm not sure what you mean by 'it worked'... As in you get no error? Or as in you can now reproduce?

I have two machines here exhibiting the problem. Admittedly, they are both old machines with integrated graphics...
Maybe you can add some extra debug log statements to the library just before the point where the error is thrown in my stack trace above, expose that test version as a codepen or something, and I can run it on one of those machines and let you know the results?

Or maybe you have some other idea to try and narrow this down?

We narrowed it down a little bit; there may be 2 bugs. To get around the first one, could you move the call to tf.enableDebugMode() to _after_ the model has loaded? This avoids the shape consistency check issue. I've done this in my codepen so you can try running that.

We suspect that somewhere in the pipeline, NaNs (or lots of zeroes) are being produced; debug mode will check operations for NaNs and report them on the console. Let us know what you see printed out. Thanks.
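For reference, the kind of numeric check debug mode performs can be sketched in plain JavaScript. This is an illustrative sketch, not the tfjs implementation: after each op, scan the output values and flag anything non-finite.

```javascript
// Illustrative sketch of what a debug-mode numeric check amounts to:
// scan an op's output values and collect any NaN / Infinity entries.
function findBadValues(values, opName) {
  const bad = [];
  for (let i = 0; i < values.length; i++) {
    if (Number.isNaN(values[i]) || !Number.isFinite(values[i])) {
      bad.push({ index: i, value: values[i] });
    }
  }
  if (bad.length > 0) {
    console.warn(`${opName} produced ${bad.length} non-finite value(s)`);
  }
  return bad;
}
```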

Also, if for some reason you still get the shape check errors after moving enableDebugMode below the model-load code, you can try adding tf.env().set('TENSORLIKE_CHECK_SHAPE_CONSISTENCY', false) as the first line of the program.

@tafsiri
I ran your codepen. It does not crash with the shape consistency error, but it still gives an empty array. It printed a lot of logging to the console, attached below:

tensorflow-issue-3858-logging.txt

cc @annxingyuan see linked profile above, no evidence of NaNs.

@Download one more suggestion. Could you add tf.ENV.set('WEBGL_PACK', false) to the top of the program (or run my codepen again as I've added that there) and also upload the logs from that.

Hi @tafsiri
Ran your codepen again. This time it gives predictions:

[
  {
    "box": {
      "left": -2.6272237300872803,
      "top": 7.801450788974762,
      "width": 309.0912103652954,
      "height": 276.35952830314636
    },
    "label": "Salad",
    "score": 0.9568929672241211
  },
  {
    "box": {
      "left": 104.73532229661942,
      "top": 25.768655352294445,
      "width": 73.24516028165817,
      "height": 52.02250275760889
    },
    "label": "Tomato",
    "score": 0.85658860206604
  }
]

I guess that's good news right?

tensorflow-issue-3858-logging-webgl-pack-false.txt
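As a side note for anyone consuming output like the array above: detect results are typically filtered by a score threshold before display. A minimal sketch, assuming the `box`/`label`/`score` field names shown in the output above:

```javascript
// Keep only detections at or above a confidence threshold, highest
// score first. Field names match the detect() output shown above.
function filterDetections(predictions, minScore = 0.5) {
  return predictions
    .filter(p => p.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```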

Nothing to do with the issue I guess, but I notice a negative value for the left field of the first box. Is that normal?
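Detection models can emit boxes that extend slightly past the image edges (hence a negative `left`), and a common post-processing step is to clamp the box to the image bounds. A sketch, assuming the box format shown above:

```javascript
// Clamp a predicted box to the image bounds. Boxes that extend past an
// edge (e.g. a negative `left`) are intersected with the image rectangle.
function clampBox(box, imgWidth, imgHeight) {
  const left = Math.max(0, box.left);
  const top = Math.max(0, box.top);
  const right = Math.min(imgWidth, box.left + box.width);
  const bottom = Math.min(imgHeight, box.top + box.height);
  return {
    left,
    top,
    width: Math.max(0, right - left),
    height: Math.max(0, bottom - top),
  };
}
```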

It is good news and I think does confirm that your hardware probably doesn't support WebGL as well as would be needed to execute this model under our default settings. Workarounds for those kinds of issues are quite difficult to do unless we can locally reproduce. Feel free to leave tf.env().set('WEBGL_PACK', false) in your code as you test with your actual use case (there may still be accuracy issues).

We did want to suggest trying out the WASM backend, it is generally much faster than the standard CPU backend (sometimes as fast as WebGL), and may be more consistent if you anticipate deploying to older hardware.
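One way to make that robust in production is a fallback chain: try backends in order of preference and use the first that initializes. The sketch below is generic; `initBackend` stands in for a call like `tf.setBackend(name)`, which in tfjs resolves to a boolean success flag.

```javascript
// Generic fallback pattern: try backends in order of preference and
// return the first one that initializes. `initBackend` is a stand-in
// for tf.setBackend(name), which resolves to true on success.
async function pickBackend(preferences, initBackend) {
  for (const name of preferences) {
    try {
      if (await initBackend(name)) return name;
    } catch (e) {
      // Initialization threw; try the next backend.
    }
  }
  throw new Error('No usable backend among: ' + preferences.join(', '));
}
```

For example, `pickBackend(['webgl', 'wasm', 'cpu'], tf.setBackend)` would land on WASM on a machine where WebGL initialization fails.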

Thanks for your patience and sending along debug info.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

Closing as stale. Please @mention us if this needs more attention.

@tafsiri
I don't think this issue has actually been resolved right?
Are you guys planning on improving the default WebGL-based backend?
My project ended so I haven't been working on it anymore. So I did not test the WASM backend.
There are multiple reports about object detection giving no results, and it eats up developer time when it happens. It took me 3 full days of investigation before I could resolve it. That's almost $2,500 of wasted money for my client.

I guess it is actually worse if it does work on the development machine... because then you will end up deploying to production, a certain percentage of users will just get no results, and the devs will be scratching their heads with no error message or other leads as to what is happening, and no way to reproduce. In its current state I would never use the WebGL engine in production for that reason. It should at least print an error message that the hardware is not up to par.

I have invested a lot of time in this issue: running reproduction scenarios, providing debug logs, and what not. To then see the issue closed as stale, even though the problem still exists, is a bit painful. I understand you need to get the issue off your work list, but the only real way of doing that is investigating and solving it.

Oh, one more thing. I know this issue is hardware related, but both machines that I own have this issue, and they are using standard Intel on-board graphics. I am betting that a significant percentage of users out there use similar hardware. Since this stuff runs on the client, it will fail on the client machine, without any error message. How comfortable would you be deploying a solution that will fail on x percent of users' machines without error messages or anything? Will those users end up calling your support desk? How much time will you spend on answering calls and investigations before eventually realizing that the only real fix is to switch to the CPU backend? If you, as TensorFlow developers, feel it is too hard to fix this issue, imagine how much harder it is for developers who know nothing about TensorFlow. It is nigh on impossible.

In my mind, this bug makes the WebGL backend worthless, because I cannot use it in production. So it seems to me that, for that reason, it is worthwhile to fix.

@Download Hello, yes you are right - this issue should not be closed. Our issues get automatically closed after a few days so thank you for the nudge. I also sincerely apologize for the experience you had. We will do our best to resolve this issue.
