Drake: geometry_test:test_unimplemented_rendering fails sporadically on Mac

Created on 19 Feb 2021  路  18Comments  路  Source: RobotLocomotion/drake

Calling what should be a pure virtual function(*) isn't raising an exception (sometimes). This issue seems very similar to #11424

On a local Catalina machine, I'm able to reproduce the issue with the following command:

bazel test --test_summary=short --runs_per_test=100 //bindings/pydrake:py/geometry_render_engine_subclass_test --test_arg=TestRenderEngineSubclass.test_unimplemented_rendering

EDIT(eric): Updated for #14724

It fails about 2% of the time.

Slack discussions:
https://drakedevelopers.slack.com/archives/C270MN28G/p1613504306108500
https://drakedevelopers.slack.com/archives/C270MN28G/p1612967089031600

(*) Although we're calling PYBIND11_OVERLOAD_PURE here the C++ function is not actually a pure virtual function, but the desired behavior is that it should throw an exception if not implemented. See #14300.

geometry general pydrake kitware bug

All 18 comments

Related to #14720

The issue discussed in #11424 has been fixed in upstream pybind11 with a different solution, see Pybind PR#2564.

I tried cherry-picking this fix onto Drake's pybind11 branch, but it resulted in a lot of errors.

Next, I tried creating a reproducer similar to Eric's mentioned here but was unable to reproduce on the current pybind11 master. This could be because the issue has been fixed or is difficult to reproduce - not conclusive one way or the other.

I think the next step should be to update Drake's pybind11 fork to latest commit (see #14225) and see if the issue is still reproducible on Mac.

FYI I'm briefly trying (again) to upgrade upstream.

Not brief, but pretty much close to done. Can you try to re-create the failure once #14225 lands? (or perhaps using that?)

On macsim, using #14225, I don't see the 2% failure rate.

$ git log -n 1 --oneline --no-decorate
608f682b56 workspace: Update pybind11 fork to latest commit
$ bazel test --test_summary=short --runs_per_test=100 //bindings/pydrake:py/geometry_render_engine_subclass_test --test_arg=TestRenderEngineSubclass.test_unimplemented_rendering
...
//bindings/pydrake:py/geometry_render_engine_subclass_test               PASSED in 2.9s
  Stats over 100 runs: max = 2.9s, min = 2.4s, avg = 2.7s, dev = 0.1s

Er, but even without that (i.e., on master), it still doesn't fail, even trying 1000 runs:

$ git log -n 1 --oneline --no-decorate
3d3dad3e7f [FEM] Add mesh utilities to support FEM demo (#14734)
...
//bindings/pydrake:py/geometry_render_engine_subclass_test               PASSED in 3.1s
  Stats over 1000 runs: max = 3.1s, min = 2.4s, avg = 2.7s, dev = 0.1s

Given previous incantations of this we've seen, most likely it's because the test got moved, or b/c it was flaky. Will revert #14724 (5a7203e) and try again.

$ git log -n 1 --oneline --no-decorate
3d3dad3e7f [FEM] Add mesh utilities to support FEM demo (#14734)
$ git revert --no-commit 5a7203e
$ bazel test --test_summary=short --runs_per_test=100 --flaky_test_attempts=1 \
    //bindings/pydrake:py/geometry_test \
    --test_arg=TestGeometry.test_unimplemented_rendering
...
//bindings/pydrake:py/geometry_test                                      PASSED in 3.2s
  Stats over 100 runs: max = 3.2s, min = 2.4s, avg = 2.8s, dev = 0.2s

Guess it's just a machine thing?

@BetsyMcPhail Yeah, def. need you to try this out on the machine you have.

Thanks for testing Eric. I somehow managed to break (fix?) my machine....

I started with latest master (includes #14724). Test passed. Not totally surprising since we've seen before how rearranging any related test code may make a difference.

% git log -n 1 --oneline --no-decorate
85deed3689 Add CLP to drake build. (#14729)
% bazel test --test_summary=short --runs_per_test=100 //bindings/pydrake:py/geometry_render_engine_subclass_test --test_arg=TestRenderEngineSubclass.test_unimplemented_rendering
...
//bindings/pydrake:py/geometry_render_engine_subclass_test               PASSED in 3.9s
  Stats over 100 runs: max = 3.9s, min = 2.4s, avg = 3.5s, dev = 0.3s

INFO: Build completed successfully, 104 total actions

Then I cherry-picked the commit from #14225, test still passes.

% git cherry-pick 608f682b56                                                        
[master c6e0de89fc] workspace: Update pybind11 fork to latest commit
 ...
% bazel test --test_summary=short --runs_per_test=100 //bindings/pydrake:py/geometry_render_engine_subclass_test --test_arg=TestRenderEngineSubclass.test_unimplemented_rendering
....
//bindings/pydrake:py/geometry_render_engine_subclass_test               PASSED in 5.3s
  Stats over 100 runs: max = 5.3s, min = 2.3s, avg = 3.7s, dev = 0.5s

INFO: Build completed successfully, 134 total actions

Finally, I went back to 8f12522 - before #14225 and #14724, the SHA I was using previously. This required running the setup script to go back to Python 3.8.

Test _still_ passes...

% git checkout 8f12522
...
% setup/mac/install_prereqs.sh  
...
% bazel test --test_summary=short --runs_per_test=100 \
    //bindings/pydrake:py/geometry_test \
    --test_arg=TestGeometry.test_unimplemented_rendering

...
//bindings/pydrake:py/geometry_test                                      PASSED in 7.2s
  Stats over 100 runs: max = 7.2s, min = 3.0s, avg = 5.1s, dev = 0.7s

INFO: Build completed successfully, 101 total actions

Gotcha; I guess it's no longer locally reproducible; my vote is, once #14225 lands, we proactively close this out.

We re-open it if it becomes a problem.

Gotcha; I guess it's no longer locally reproducible; my vote is, once #14225 lands, we proactively close this out.

We re-open it if it becomes a problem.

Sounds good to me

Filed #14748

Revert the revert, I think.

Hey all! Recently, #14855 removed a lot of the C++ being bound by this flaky test. Subsequent to that, I realized there was some documentation on this test that still referenced the deprecation state. As I cleaned it up, I hypothesized that it's no longer flaky. So, I've corrected the text and removed the flaky tag in #14870. It passed in one mac build (which is nothing when confronted with the "flaky" tag).

So, my question to you all is this: how confident do we have to be that the completion of the deprecation is sufficient to kill the flakiness before just submitting a PR to pull the test back in to where it belongs?

EDIT(eric):
https://github.com/RobotLocomotion/drake/blob/9a62c61e6691192f06634f57347e230e5461c354/bindings/pydrake/test/geometry_render_engine_subclass_test.py#L1-L24

Sweet! (And great catch!)
I'm like 95% confident? Options I'm cool with:

  1. Keep it as-is. We just close the issue, and re-open if someone ends up actually caring about this?
  2. Remove flaky tag on the split-out test.
  3. Revert the revert of the revert. Then wait to see if we need to revert the revert of the revert of the revert :grimacing:

Well, the most logical thing to do is to fix the documentation, update the note indicating that it may be possible to move it and then to move it in its own PR. That way, if it needs to be reverted, nothing that deals with correctness gets undone.

I've updated #14870 to just fix the documentation. I'll apply a follow up PR (which can easily be independently reverted) to move the test back into geometry_py.cc.

So, my question to you all is this: how confident do we have to be that the completion of the deprecation is sufficient to kill the flakiness before just submitting a PR to pull the test back in to where it belongs?

0% confidence that the problem is fixed, for the record, but if it is an easy revert I really don't mind too much.

Was this page helpful?
0 / 5 - 0 ratings