Drake: Model Storage Solution - Separate Repositories that are Pulled in as Submodules

Created on 25 Aug 2016 · 22 Comments · Source: RobotLocomotion/drake

_The following is a proposal for a major change in how Drake stores models that are of general use. Its purpose is to gauge the level of buy-in from the greater community of Drake users and developers. Please view this as a solicitation for comments._

Problem Definition

A long-term problem we've continuously faced is how to properly store models (e.g., URDF, SDF, texture images, mesh files, and other associated files like model.config). Currently, they are mostly stored in drake-distro/drake/examples/, though some are also in drake-distro/drake/systems/. Even worse, some unit tests in drake-distro/drake/systems/ refer to models in drake-distro/drake/examples/.

Another problem is that models bloat Drake's main repository, especially if they contain large textures and meshes. GitHub limits a repository to 1GB. Thus, adding models to Drake's repository will result in this limit being hit sooner.

Finally, models are resources that may be used by processes outside of Drake. For example, other ROS nodes or Gazebo may need to use the models as well. Thus, placing them within Drake imposes a needless burden on those who only want to use the models.

Proposed Solution

Let's move models that may be of _general use_ into separate repositories. These repositories can then be pulled into drake-distro/externals/ as git submodules. I'm using the plural form of "repositories" since I believe a single repository will not be enough given the 1GB limit.

Options for locating these models from within Drake's code include providing it as a command line input parameter, via an environment variable, or some other fancier search-based method. We could also follow Gazebo's lead and simply download them directly from a web server.
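As a rough illustration of the first two options plus a simple search fallback, here is a minimal Python sketch. The function name `find_model`, the environment variable `DRAKE_MODEL_PATH`, and the precedence order are all assumptions made for this example; none of them come from Drake itself.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def find_model(name: str,
               cli_dir: Optional[str] = None,
               env_var: str = "DRAKE_MODEL_PATH",  # hypothetical variable name
               search_dirs: Iterable[str] = ()) -> Path:
    """Return the first existing path to `name`, trying each source in order."""
    candidates = []
    # 1. A directory given explicitly, e.g. from a command-line parameter.
    if cli_dir:
        candidates.append(cli_dir)
    # 2. Directories listed in an environment variable, split like PATH.
    env_value = os.environ.get(env_var, "")
    candidates.extend(p for p in env_value.split(os.pathsep) if p)
    # 3. Fallback directories supplied by the caller.
    candidates.extend(search_dirs)

    for directory in candidates:
        path = Path(directory) / name
        if path.exists():
            return path
    raise FileNotFoundError(f"model {name!r} not found in {candidates}")
```

With this ordering, an explicit command-line directory overrides the environment variable, which in turn overrides the built-in search list. Whether that precedence is the right one is a design question the proposal leaves open.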

Note that models only used by a particular unit test or example can still remain in the main Drake repository. The relocation of models to a different repository only pertains to those of _general use_.

Previous Proposals


All 22 comments

This seems correct to me for general-use models (tables, chairs, grasping targets, etc.) and probably also correct for models with project-wide applicability (specific arms, hands, etc.) that are used for both complex module tests and for applications.

Obviously drake's dependency on such an external module or modules would have to be versioned, as tests might be quite brittle with respect to models.

Using Git LFS?

@david-german-tri can you remind us why we decided not to use Git LFS earlier this year?

When we tested GitHub LFS, we found that it was extremely flaky. Large uploads or downloads would usually hang for no apparent reason.

Interesting. We used it on another project and it worked pretty well. Did you speak to GitHub about it?

No, we just gave up and moved on (it was a bit of a tangent for the project we were working on at the time).

https://github.com/RobotLocomotion/drake/issues/1471 is related in terms of developing an acceptable way to _find_ model files from within Drake's source code.

https://github.com/RobotLocomotion/drake/issues/2174 is related in terms of explaining why the current approach to locating files is sub-optimal.

> This seems correct to me for general-use models (tables, chairs, grasping targets, etc.) and probably also correct for models with project-wide applicability (specific arms, hands, etc.) that are used for both complex module tests and for applications.

This would also make reusing models between drake and OpenHumanoids somewhat easier (for example, I recently wanted the robotiq hand model from https://github.com/openhumanoids/oh-distro/tree/master/software/models/common_components and we might want their schunk model if it's close enough to the model we're using...)

/cc @mitiguy: possibly related to the upcoming test/example suite.

I am leaning towards trying Git LFS again just to see if the QoS issues still exist. Where should we store the model files?

@jwnimmer-tri has rejected placing them in drake-distro/models/.

One option would be to place them in drake-distro/externals/models/. My only concern is that it's a bit strange for everything in that directory to be a git submodule _except_ for the models directory. Thoughts?

FTR, here is the billing plan for git LFS: https://help.github.com/articles/billing-plans-for-git-large-file-storage/

drake-distro/models?

@sammy-tri: @jwnimmer-tri was opposed to placing the models directory at the super-build level. @jwnimmer-tri can you clarify?

Either the models are part of drake, or they are not. If they are part of drake, they go under drake-distro/drake; if they are not, they go elsewhere. From my 30-second read about git LFS, it seems like the concept is as if they were in tree, which means they go somewhere within drake-distro/drake.

In any case, isn't a git lfs evaluation the actual first task? If it doesn't work, do we even have to decide where to put git lfs data within the tree?

I misunderstood something when I posted my previous comment. I don't think I have anything to add at this time.

I see. The models I'm referring to should _not_ be part of Drake since they are of _general use_. Thus, git LFS will not be of use since it would classify the models as in tree and thus part of Drake. Looks like we're back to the original proposal of pulling in models via git sub-modules. Thoughts?

A submodule can use Git LFS.

Good idea. Here's my latest proposal:

Create two new repositories:

  1. drake_models - a public repository
  2. drake_proprietary_models - a private repository

Add both repositories into Drake's super-build as submodules. Add a CMake option WITH_PROPRIETARY_MODELS that only includes the drake_proprietary_models submodule when set to TRUE (it will be FALSE by default). To store more than 1GB, the two model repositories can employ Git LFS and contain other Git submodules.
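To make the intended behavior of the flag concrete, here is a small Python sketch of the selection logic, written as a function that builds the `git submodule update --init` commands a super-build might run. The paths under `externals/` and the function name are assumptions for illustration; the actual proposal would express this in CMake.

```python
from typing import List

# Hypothetical submodule locations inside the super-build checkout.
PUBLIC_MODELS = "externals/drake_models"
PROPRIETARY_MODELS = "externals/drake_proprietary_models"


def submodule_commands(with_proprietary_models: bool = False) -> List[List[str]]:
    """Build the git commands that initialize the selected model submodules."""
    # The public repository is always included.
    paths = [PUBLIC_MODELS]
    # The private repository is opt-in, mirroring WITH_PROPRIETARY_MODELS=TRUE.
    if with_proprietary_models:
        paths.append(PROPRIETARY_MODELS)
    return [["git", "submodule", "update", "--init", "--", p] for p in paths]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` from the super-build root. Leaving the private repository out by default means users without access never trigger a failed clone.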

Gazebo experienced this problem, and moved all our models to a bitbucket repository (https://bitbucket.org/osrf/gazebo_models). We clone that repository on a webserver (http://models.gazebosim.org).

There are a few issues with this approach:

  1. Model visibility is limited. It is difficult to see and understand what models exist, their versions, and features.
  2. Adding and modifying models is suitable only for experienced developers. It would be great to crowdsource the development of models without requiring knowledge of a version control system.
  3. There is a dependence on storage limits of github or bitbucket.
  4. Can be difficult to integrate with other applications (especially web applications).

We have been working on a solution that could be useful to Drake. The solution involves a custom web server that hosts models and all their resources. A public REST API would support access to these models. A browser front-end will address model visibility and address item 2 on the list above.

Models and their resources are versioned, and it would be possible to prefetch all (or some of) the models. We are also supporting private and public models (i.e., TRI could have a dedicated set of models that only TRI employees could access).

This solution is in the works, but it won't be ready for a number of months. A near-term solution such as hosting models in git repositories is probably a good idea.

If this solution sounds interesting and of use to Drake, then I'd like to get feature requests, ideas, and comments. We could do this on slack or via email.

I guess you could always use CMake external data functionality with Amazon S3 as a backend. The main repo then just contains placeholders with CMake pulling down the actual model from S3 when it is needed. That could be up and running very quickly.

https://cmake.org/cmake/help/v3.5/module/ExternalData.html

(S3 buckets can be versioned and have fine grained permissions)
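The placeholder idea can be sketched in a few lines of Python. CMake's ExternalData documentation describes "content links" (small files such as `mesh.obj.md5` that hold only a hash) and URL templates containing `%(algo)` and `%(hash)`; the code below imitates that scheme for illustration and is not CMake's implementation. The function names and any bucket URL passed in are hypothetical.

```python
import hashlib
from pathlib import Path


def write_content_link(data_file: Path) -> str:
    """Write `<data_file>.md5` holding the file's MD5 hash; return the hash."""
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()
    link = data_file.with_name(data_file.name + ".md5")
    link.write_text(digest + "\n")
    return digest


def resolve_url(template: str, content_link: Path) -> str:
    """Fill a template's %(algo) and %(hash) placeholders from a content link."""
    # The link's extension names the hash algorithm, e.g. ".md5" -> "MD5".
    algo = content_link.suffix.lstrip(".").upper()
    digest = content_link.read_text().strip()
    return template.replace("%(algo)", algo).replace("%(hash)", digest)
```

Only the small `.md5` file would be committed to the main repository. The large mesh or texture lives in the object store under a name derived from its hash, so changing a model changes the placeholder and keeps the dependency versioned.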

Note that in #4634, @RussTedrake created a new directory called drake-distro/drake/multibody/models/ for storing commonly used models.

Note that while this addresses the file-duplication issue, it does not resolve the issue of this git repository becoming too big due to the prevalence of files like meshes, graphics, etc. To solve that, we're going to need something like a cloud-based model server, git submodules, etc.

I'm going to close this issue because it's describing an anticipated future problem, not a current pressing problem. We can reopen it when it becomes a current problem.
