Trying to verify Jamie's fix #10628 for Mac Unicode troubles, I needed to build Drake on a Mac. We have a not-too-old Mac in a drawer here for that purpose. It had been used successfully last year to build Drake, so it already had a git clone. I updated that clone from upstream, pulled Jamie's PR, and ran install_prereqs, which complained about the macOS version. So I updated to Mojave and retried. I got many more complaints, including failure to compile some of the externals due to "old style casts". After numerous missteps I got through install_prereqs. I switched to fresh master and tried bazel build //.... Lots of failures ... long story (two days) ... but eventually I noticed that the externals it was complaining about were in /private/var/tmp/_bazel_sherm. Although I had tried nuking all the drake/bazel-* directories, I didn't know about that one. So I deleted it and all the bazel directories, ran bazel clean --expunge, then ran install_prereqs, followed by a build.
Other than lots of warnings from ranlib about libraries and object files with "no symbols" it seems to be building now.
So I _think_ my problems had to do with old cruft lying around that the build system couldn't figure out how to get rid of. It would be a great help if the build system could defend itself against such cruft -- the errors that result are inscrutable, at least to a Mac novice like me.
drake/bazel-* are symlinks, so nuking those was a no-op. For the rest, bazel clean --expunge should really have been sufficient (and was probably necessary after the OS and/or Xcode update). We can't really advise people to delete /private/var/tmp/_bazel_<user> without the risk of deleting builds of other, unrelated bazel projects.
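To illustrate why removing the bazel-* entries does not free anything, here is a minimal Python sketch. It assumes a POSIX system and uses a throwaway temporary directory in place of a real workspace and output base; the helper name remove_symlinks_only and the directory names are hypothetical, not part of Bazel or Drake.

```python
import os
import tempfile


def remove_symlinks_only(workspace, prefix="bazel-"):
    """Remove symlinks named prefix* in workspace.

    Returns the resolved targets of the removed links. The targets
    themselves are never touched, which mirrors what happens when
    rm -rf is pointed at a symlink rather than at a real directory.
    """
    targets = []
    for name in sorted(os.listdir(workspace)):
        path = os.path.join(workspace, name)
        if name.startswith(prefix) and os.path.islink(path):
            targets.append(os.path.realpath(path))
            os.unlink(path)  # deletes the link only
    return targets


with tempfile.TemporaryDirectory() as root:
    # Stand-ins for the real output base and the source checkout.
    output_base = os.path.join(root, "output_base")
    workspace = os.path.join(root, "workspace")
    os.makedirs(os.path.join(output_base, "external"))
    os.makedirs(workspace)
    os.symlink(output_base, os.path.join(workspace, "bazel-out"))

    left_behind = remove_symlinks_only(workspace)
    links_gone = os.listdir(workspace) == []
    cache_intact = os.path.isdir(os.path.join(output_base, "external"))
```

After the run, links_gone and cache_intact are both true: the workspace is empty, but the stale external directory in the stand-in output base is still there. In a real checkout, bazel info output_base prints where that directory actually lives.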
Maybe just modify the install-from-source-on-Mac instructions to mention bazel clean --expunge? I don't know whether the tmp/_bazel_sherm removal was required since I did the expunge at the same time.
I'm embarrassed that I didn't realize that rm -rf bazel-* was a no-op! Sure felt good to do it :)
I would actually suspect the problem was less about the macOS upgrade, and more that the immediately prior build was done using an ancient version of Bazel from a year ago. There was a particular transition of the Bazel outputBase layout nine(?) months ago where a newer bazel would hork if the outputBase came from too many versions prior, unless you had been upgrading and rebuilding regularly in the meantime. So the advice would be "If you ever upgrade Bazel to skip ahead more than four minor versions at once, you should bazel clean --expunge to clear your cache". Or perhaps more generally, "If you haven't compiled Drake in more than six months, do a bazel clean --expunge first."
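The "more than four minor versions" rule of thumb above could be sketched as a small check. This is only an illustration of the heuristic as stated here, not something Bazel or Drake provides: the function names, the default threshold argument, and the choice to treat any major-version change as requiring an expunge are all assumptions, and the version strings in the usage lines are hypothetical.

```python
def parse_major_minor(version):
    """Return (major, minor) from a string like '0.22.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def should_expunge(old_version, new_version, max_minor_skip=4):
    """Heuristic: expunge when skipping more than max_minor_skip minor versions.

    Assumption (not from the thread): a change of major version always
    counts as too big a jump.
    """
    old_major, old_minor = parse_major_minor(old_version)
    new_major, new_minor = parse_major_minor(new_version)
    if new_major != old_major:
        return True
    return (new_minor - old_minor) > max_minor_skip


# Hypothetical version pairs, for illustration only.
big_jump = should_expunge("0.15.0", "0.22.0")    # skips 7 minor versions
small_jump = should_expunge("0.20.0", "0.22.0")  # skips 2 minor versions
```

Here big_jump is true and small_jump is false. A jump of exactly four minor versions is not flagged, matching the "more than four" wording.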
I think it was probably both. Either is going to cause toolchain issues on Mac.
TIL "hork" !! Thanks, Jeremy.
Just to follow up, is this actionable (e.g. adding a tip in the docs), or closable as-is?
Yes, needs a doc update.
@sherm1 Maybe you can post this on StackOverflow as a kind of knowledge base? It doesn't seem to happen often enough to warrant being part of the mainline documentation. (Or, just close it.)
I wouldn't know how to characterize the problem in a way that a user would be likely to find the solution via stackoverflow search. If the problem occurs again we will hopefully get a user stackoverflow question describing what they encountered. I think that would make a better entry in the knowledge base. Closing.