Improve the development workflow so that developers can easily test new pages that require content from the database.
The main problem to address is that when developing new CMS features that require generated content, it's difficult to get set up and keep track of where everything lives.
@ccostino and I figured out that a straightforward solution would be to have a database dump live in the repo (or a subset if it gets large enough). Any time there is database work, the dump would then be updated to reflect those changes.
Additionally, the dump would also be periodically updated as content gets added in.
We'll also write up a guide to document a proposed workflow for folks to follow and put it in the README or the wiki of the repo.
To add a bit of clarification to this, our README does have some instructions for retrieving existing data. However, this still requires getting oneself set up with the app, running migrations, then getting the data and loading it: a lot of hoops to jump through.
I worked through the cloud.gov database export steps and was able to pull a full backup pretty easily. The problem with this is that the backup needs to be scrubbed of data in a few tables, namely django_session, anything starting with auth_*, django_content_type, and probably django_admin_log. Running the TRUNCATE command on these tables should be enough to do the trick. With that done, we should have a usable backup to build off of that contains all current migrations applied and any content produced.
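To make the scrubbing step concrete, here is a rough sketch that builds a single TRUNCATE statement from a list of table names. The function name, the default table list, and the example table names are all hypothetical; the defaults simply mirror the tables mentioned above and should be adjusted once the real schema has been checked.

```python
import re

# Assumption: these defaults mirror the tables named in this thread.
# Verify against the actual schema before running anything.
EXACT_TABLES = ("django_session", "django_content_type", "django_admin_log")
AUTH_PREFIX = "auth_"

# Only accept plain lowercase identifiers so nothing odd ends up in the SQL.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def build_truncate_sql(all_tables, exact=EXACT_TABLES, prefix=AUTH_PREFIX):
    """Return one TRUNCATE statement for the tables to scrub, or "" if none match."""
    targets = sorted(
        name for name in all_tables if name in exact or name.startswith(prefix)
    )
    for name in targets:
        if not _IDENTIFIER.match(name):
            raise ValueError("unexpected table name: %r" % name)
    if not targets:
        return ""
    quoted = ", ".join('"%s"' % name for name in targets)
    # CASCADE is deliberately left out: PostgreSQL then refuses to truncate a
    # table that is referenced by a foreign key from a table outside this
    # list, instead of silently emptying that other table as well.
    return "TRUNCATE TABLE %s RESTART IDENTITY;" % quoted
```

Leaving out CASCADE is a design choice here: if a content table references one of these tables, the statement fails loudly rather than wiping content.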
One other thing to consider if we go down this route is that the backup is for PostgreSQL, while for local development we're using SQLite. Personally I don't see any issue with switching, but if we're going to do so then I propose we build a couple of tools for ourselves and others here, much like we have in the API. This means the following:
- invoke tasks to account for managing the backup (similar to the tasks we have in the API).

This adds a bit of work to this story, but shouldn't take much more time (and we already have work to draw from in the API). If we do these things, we should be able to cut the number of commands down to just a couple that any given developer would have to run on a regular basis. The need for a full refresh of the backup shouldn't be as great or as frequent, either.
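As a rough illustration of what "just a couple of commands" might boil down to, the sketch below assembles the command lines for a local refresh. It assumes the scrubbed backup is a plain SQL dump that `psql` can load (a custom-format dump would need `pg_restore` instead); the function names, database name, and dump path are hypothetical. The resulting commands could be run from an invoke task or a small shell script.

```python
import shlex


def refresh_commands(db_name, dump_path):
    """Return the commands that rebuild a local database from a SQL dump."""
    return [
        # Drop any existing local copy; --if-exists avoids an error on first run.
        ["dropdb", "--if-exists", db_name],
        ["createdb", db_name],
        # Assumption: the backup is plain SQL, so psql can load it directly.
        ["psql", "--dbname", db_name, "--file", dump_path],
    ]


def as_shell(commands):
    """Join the commands into one shell line that stops at the first failure."""
    return " && ".join(
        " ".join(shlex.quote(part) for part in command) for command in commands
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(as_shell(refresh_commands("cms_local", "data/backup.sql")))
```

Keeping the commands as data makes them easy to print for review before anything is actually executed.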
@xtine, what do you think?
This sounds like a great plan. I'm interested in @LindsayYoung's suggestion about using a private S3 bucket (if it doesn't add complexity to the setup).
I agree with @ccostino and @LindsayYoung's additional points. I really like the idea of changing local to postgres to get the development environments synced up. If the commands are cut down to just a couple, then it could truly be a one-liner update by putting them into a script.
Oh good call, @LindsayYoung! I'm going to tag @jcscottiii here too because he's working on additional support for this very thing in cloud.gov. :-)
Small update on what needs to be cleared out of the database: as far as the auth_* tables are concerned, it looks like only the auth_user and auth_user_groups tables contain info that would need to be removed. Thinking about it a bit more, perhaps django_content_type doesn't need to be cleared out either. That only becomes an issue when you're using the manage.py dumpdata and loaddata commands. Since we're reusing the database itself, though, these should be okay as is.
And of course, anything that references a user via user_id and/or the auth_user table is going to pose an issue...
We've been talking about splitting this work up into discrete tasks. Here is how I would break it apart:
- invoke tasks for pulling and managing the backups (work with @jcscottiii on integrating a cloud.gov utility he is building for this very thing)

👍 @ccostino: let me know if you need any help on these items!
Closing this issue now that we have smaller tasks. Thanks!