Postgraphile: Organizing functions—how?

Created on 15 Nov 2016 · 33 comments · Source: graphile/postgraphile

As my feature set expands, I'm certain to have many more functions, triggers, etc. I'm using pg-migrator right now, but I can see this quickly getting out of hand: partial functions scattered across several migrations, for example.

How can a larger project keep the functions organized and share work with multiple developers?

❔ question 📄 add-to-docs

All 33 comments

I agree it's similar, though I don't necessarily want to version the database as that's taken care of via GraphQL to a large extent.

Though, if putting procedures in different schemas is the best way forward I guess that's good.

Alternatively, it would be nice to have a way of loading in procedures from specific folders, allowing us to organize our "code" in SQL files organized in a directory structure.

I personally like Sqitch. It’s a powerful tool that allows you to do version control like things with your SQL migrations. Very opinionated, but that’s what I’m looking for in a SQL migration framework :wink:. It also has some nice “blame” style functionality where the creator of the migration will be logged. Sqitch also lets you do some variant of TDD with your functions through “verify” scripts. If you want to go crazy with TDD, pgTAP is good.

However, Sqitch is a little complicated and somewhat tough to set up and maintain across different users. In the past I’ve tried creating my own migration frameworks (of which there is a Node module and a Rust crate; can’t promise stability though), but I’ve recently come back to Sqitch as a stable, feature-rich platform for organizing my database migrations.

I think this space is ripe for innovation though, if someone wants to make the next big open source project (IMO :wink:). This is a problem shared by virtually every large app developer, and I’ve encountered a variant in almost all of my programming (Node.js, Go, Rust, Ruby, Elixir…). It would be nice to have a definitive solution. Sqitch is definitely that for me right now.

I'm not recommending this (because I've not used it) but also worth checking out is:

https://github.com/oelmekki/pgrebase

This is all well and good for version control, though I had hoped for a hierarchy of folders and SQL files.

SQL/users/functions.sql
SQL/users/schema.sql
SQL/auth/functions.sql
Etc.

Would be more maintainable IMHO, since we're putting _everything_ in the database.

Guess there's always room for PRs and new projects ;)

I finally had time to look into the mentioned tools, as well as https://github.com/flyway/flyway.
Most of the available tools (like Flyway) are migration-based (ugh, migrations are so tedious). With Sqitch you also have to write migrations of a sort; it's easier because of the git support, but it comes with the overhead of verify statements (ouch). And as @chadfurman vaguely mentioned, pgrebase requires that you put every database construct (type, domain, procedure) into a separate file. Maybe it will be possible to introduce git support into pgrebase and then filter out the parts which changed with a git diff (I will leave an issue on pgrebase).

In the "version control" vein of this conversation, another option might be: http://www.liquibase.org/

However, I'm not sure that solves the issue of not being able to group procedures the way one might group "controllers" in an MVC framework. In the long term, I see functions being spread out across files, or dumped in a "procedures" tab of pgAdmin, as something that would be difficult to maintain.

Maybe the answer is that at that scale, we should really be using a Graphene-powered API...

@chadfurman Could you elaborate on what you mean by a Graphene-powered API? I could find nearly nothing about Graphene.

@valorize Oh, Graphene is just the Python implementation of GraphQL http://graphene-python.org/

I only mean that, at the point where the functions etc. are out of hand simply by existing within the database, it's probably time to consider building an API using GraphQL libs.

I think it is worth adding a link to this issue at the end of the tutorial (or a doc around this part of the workflow specifically). The first thing I thought was "well, what do I do with this mass of queries now?", and I started searching for existing issues before writing one just like this question.

I've been evaluating this library, and it seems promising. But I hate the idea of not using great tools that are already available for schema design.

What my plan is, is to not write SQL at all for low level things like tables. Instead, i've settled on a simple solution that can scale into the future: use a GUI ( in my case, DataGrip ), to design tables, etc, and then on every schema deploy I pg_dump to get development.sql and production.sql and then run apgdiff --ignore-start-with production.sql development.sql > upgrade.sql. For stored procedures and functions i will just execute them in console windows. They will all be included in the dump.

I can version each dump and then roll back schema if necessary very easily by applying apgdiff in reverse, e.g. apgdiff --ignore-start 1-development.sql 1-production.sql and finally, to set the new schema in production:psql "sslmode=require host=*********** port=5432 dbname=************** user=*************" -f upgrade.sql. What do others think of this idea?

I've not used apgdiff and I don't understand the "in reverse" bit.

How would this work for changes where you need to move or restructure data?
Can you run custom sql if needed or does datagrip handle that nicely?

Versioning whole db dumps will also result in a very large repo.

Finally, this doesn't necessarily solve the trouble that comes from having
hundreds (thousands?) of stored procedures without a nice way of organizing
them.


@chadfurman

I've not used apgdiff and I don't understand the "in reverse" bit.

apgdiff simply outputs the necessary changes to get from the first schema to the second. If you have the person table in production.sql but not in development.sql, and you run apgdiff --ignore-start-with production.sql development.sql > upgrade.sql, then upgrade.sql will simply contain:

SET search_path = public, pg_catalog;

DROP TABLE person;

DROP SEQUENCE person_id_seq;
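The "in reverse" trick is easier to see with a toy model (this is not apgdiff itself, just a sketch of the idea, with made-up table names): diffing snapshot A against B emits the statements that turn A into B, so swapping the arguments yields the rollback script.

```javascript
// Toy schema diff: each "schema" is just a list of table names. Diffing
// from -> to emits CREATEs for new tables and DROPs for removed ones,
// so swapping the arguments produces the rollback script.
function diffSchemas(from, to) {
  const script = [];
  for (const table of to) {
    if (!from.includes(table)) script.push('CREATE TABLE ' + table + ' (...);');
  }
  for (const table of from) {
    if (!to.includes(table)) script.push('DROP TABLE ' + table + ';');
  }
  return script;
}

const production = ['account', 'person'];
const development = ['account']; // person was removed in development

// upgrade: production -> development
const upgrade = diffSchemas(production, development);   // [ 'DROP TABLE person;' ]
// rollback: same diff with the arguments swapped
const downgrade = diffSchemas(development, production); // [ 'CREATE TABLE person (...);' ]
console.log(upgrade, downgrade);
```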

How would this work for changes where you need to move or restructure data?
Can you run custom sql if needed or does datagrip handle that nicely?

I don't think this is the place for SQL, but rather for application deployment tasks, which should be one-off tasks (you could easily have SQL files for just this and leave them out of your db schema). There's a lot of advice on Stack Overflow (though, unfortunately, no consensus yet) that it's a bad idea to mutate data as part of a migration's up/down script. Usually a migration involves only schema changes, not data changes. I think data changes should go in one-off scripts or tasks (e.g. rake, make, gulp) and should not be part of a schema definition. That leaves simple changes to the schema, which are pretty easy to reason about.

Versioning whole db dumps will also result in a very large repo.

Yes, if that became a problem I would only version the diffs between dumps, and then write a script to replay the diffs similar to how rails migration scripts work.

Finally, this doesn't necessarily solve the trouble that comes from having
hundreds (thousands?) of stored procedures without a nice way of organizing
them.

My plan for organizing stored procedures is to use naming schemes and schemas to group them. You can see in the screenshot below how, for example, the information_schema does this. One can imagine numerous schemas, each containing many different functions.

[screenshot: functions grouped under schemas, as in information_schema]

@cellis thanks for mentioning apgdiff. It sounds great. @chadfurman Why do you think "versioning whole db dumps will also result in a very large repo"? With git it will be minimal, and not bigger than your source code would be anyway. Do you want to version your db content as well?

Basically we would put our functions which belong together into one file. Then we write a script which combines all the diffs into one big diff, version it, and apply it to the database. Or am I missing something? For this to work we need a git repo with at least two branches (production/development, or master/next): one branch which we work on and has the current state, and one which is on the latest production state (or the state we want to revert to). Then we could use git show master:./path-to-functions/authentication.sql > /tmp/production.sql && apgdiff --ignore-start-with /tmp/production.sql ./path-to-functions/authentication.sql >> update.sql

I did it now without git. I have three files: testing.sql -> development.sql -> production.sql.
First you split an SQL dump into files as in the following picture. I prefixed the files so that they concatenate in sorted order, using ! as the smallest prefix and zZ as the biggest ($ did not work; macOS does not seem to sort strictly by ASCII code).

[screenshot: an SQL dump split into prefix-ordered .sql files]

Then I used the following script to combine all the files, run apgdiff on the result, and push the changes to the db server:

/**
 * Created by velten on 24.02.17.
 */

const glob = require('glob');
const concat = require('concat-files');
const exec = require('child_process').exec;
const path = require('path');
const fs = require('fs');
const pg = require('pg');
const readline = require('readline');

// example credentials; substitute your own connection string
const connectionString = 'postgres://admin:test1234@localhost:5432/velten';
const basePath = path.dirname(__dirname);
process.chdir(basePath); // run from the project root

let debug = false;
process.argv.forEach(function (val) {
    if (val === '-d' || val === '--debug') debug = true;
});

glob('src/**/*.sql', function (err, files) {
    if (err) throw err;
    debug && console.log('processing files: ', files);

    // combine all source files into testing.sql, in glob (sorted) order
    concat(files, 'testing.sql', function (err) {
        if (err) throw err;
        debug && console.log('concatenated files');
        const apgDiffPath = path.join('vendor', 'apgdiff-2.4', 'apgdiff-2.4.jar');

        // diff the current dev schema against the freshly combined one
        const parameter = '--ignore-start-with development.sql testing.sql | tee upgrade-dev.sql';
        exec('java -jar ' + apgDiffPath + ' ' + parameter,
            function (err, stdout, stderr) {
                if (err) throw err;
                if (stderr) console.error(stderr);
                debug && console.log('Output -> ' + stdout);

                const rl = readline.createInterface({
                    input: process.stdin,
                    output: process.stdout
                });

                rl.question('Do you want to upgrade? [y/n] ', (answer) => {
                    if (answer === 'y') {
                        const client = new pg.Client(connectionString);
                        client.connect(function (err) {
                            if (err) throw err;

                            // apply the generated diff to the database
                            client.query(stdout, function (err, result) {
                                if (err) console.error('Upgrade error: ' + err);
                                else {
                                    console.log('Postgres result: ', result);
                                    // upgrade succeeded: testing.sql becomes the new development.sql
                                    fs.createReadStream('testing.sql').pipe(fs.createWriteStream('development.sql'));
                                }

                                client.end(function (err) {
                                    if (err) throw err;
                                });
                            });
                        });
                    }

                    rl.close();
                });
            });
    });
});

and then this script to build the production.sql diff file:

/**
 * Created by velten on 24.02.17.
 */

const fs = require('fs');
const path = require('path');
const exec = require('child_process').exec;
const simpleGit = require('simple-git');

process.chdir('..'); // run from the project root
console.log('Working directory: ' + process.cwd());

let amend = false;
let message = 'sql changes saved';
process.argv.forEach(function (val) {
    if (val === '-a' || val === '--amend') amend = true;
    if (val.startsWith('-m=') || val.startsWith('--message=')) message = val.substr(val.indexOf('=') + 1);
});

const gitRepo = simpleGit('.');
gitRepo.outputHandler(function (command, stdout, stderr) {
    stdout.pipe(process.stdout);
    stderr.pipe(process.stderr);
});

const apgDiffPath = path.join('vendor', 'apgdiff-2.4', 'apgdiff-2.4.jar');
const upParameter = '--ignore-start-with production.sql development.sql > upgrade.sql';
const downParameter = '--ignore-start-with development.sql production.sql > downgrade.sql';
exec('java -jar '+apgDiffPath+' '+upParameter,
    function (err, stdout, stderr){
        if (err) throw err;
        if(stderr) console.error(stderr);
        console.log('Output -> ' + stdout);

        exec('java -jar '+apgDiffPath+' '+downParameter,
            function (err, stdout, stderr){
                if (err) throw err;
                if(stderr) console.error(stderr);
                console.log('Output -> ' + stdout);

                // promote development.sql to production.sql, then commit everything
                const copyStream = fs.createReadStream('development.sql').pipe(fs.createWriteStream('production.sql'));
                copyStream.on('finish', function () {
                    // note: the --amend flag is parsed above but not forwarded to git here
                    gitRepo.add('.').commit(message);
                });
            });
    });

@valorize this is great. I would love it if you packaged it as an open source tool of its own!

Yeah, absolutely a great idea here :D

Splitting the SQL files in the directory and only combining them when writing to the DB etc is smart. No need to even version the big chunk, right? Just build it from the dev files, unless you want to version a dist folder for whatever reason?

I'm guessing apgdiff is handling all the migrations in a way that lets us avoid losing data? Pretty cute :)

@chadfurman, commenting here for posterity. So remember how I said I prefer to separate _data deployments_ from _schema deployments_? I stand by that, however I would like to add that this does make it a bit harder to do rollbacks of data deployments. For instance if you change a column from decimal to integer, you may want to write a data migration to update all rows for that column before performing the schema deployment; that way you have full control of what data you lose.

@cellis would you recommend to use following with apgdiff?

--add-transaction:
  adds START TRANSACTION and COMMIT TRANSACTION to the generated diff file

@valorize I'm not sure how transactions work in postgres w/r/t schema changes. So I don't know what the above would do.

Postgres supports transactions for DDL commands; I would definitely wrap migrations in a transaction. If it's possible to run more than one migration in this system then I would wrap them all in a single transaction, personally, rather than one per migration.
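As a sketch of that suggestion: wrapping all generated statements in a single transaction is just string assembly around the migration script (the ALTER statement below is a hypothetical example, not from this thread's schemas):

```javascript
// Sketch: wrap a batch of generated DDL statements in one transaction,
// so a multi-statement migration either fully applies or fully rolls back.
function wrapInTransaction(statements) {
  return ['BEGIN;'].concat(statements, ['COMMIT;']).join('\n');
}

const migration = wrapInTransaction([
  'ALTER TABLE person ADD COLUMN nickname text;', // hypothetical generated DDL
]);
console.log(migration);
```

This relies on Postgres supporting transactional DDL, which is exactly why the approach works there but not on every database.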

As wished I put some more effort into the scripts to support options etc. and put them into a repo (https://github.com/valorize/postgres-schema-updater). The npm package can be found at: https://www.npmjs.com/package/postgres-schema-updater

Maybe we could also create a script which uses https://github.com/zhm/node-pg-query-native to split a dump automatically into files (keeping the table and function definitions in one place) and combine it back into one file in the appropriate order.

I've settled on using sqitch rework and prefixing all files by object type, e.g. table_tableName_rls:
function_
role_
schema_
type_
...

table_tableName
table_tableName_rls
table_tableName_rbac
table_tableName_data
...

and then just reworking. If you rework something, put the _full new_ create statement in a comment.

e.g. altering a table to add a new column_id field:

/*
create table tableName (
  id uuid default gen_random_uuid() primary key,
  column_id ...
);
*/
alter table tableName add column column_id ...

and functions are all create or replace

Just make sure your revert scripts do in fact restore the table to the previous state. And make sure you tag every time you deploy the database.

@chadfurman admittedly, I've also been trying sqitch out, because even though the method I suggested upthread (and @valorize implemented) is what I would ultimately love, there are some shortcomings with apgdiff, which you're by now surely aware of, that aren't easy to rectify.

That said, I'm curious why you've decided to prefix objects with their types. You're further along this path than I am, so I would love to hear about any problems you've faced.

Also, while we're on this topic, I would be remiss if I didn't mention that I also looked at Standalone Migrations, but for now I will give sqitch a shot and see whether there are ways I can contribute to apgdiff in the meantime, as it seems to support postgraphql-esque schemas in the source, but not in the latest release (2.4).

@cellis prefixing objects with types (and not migration numbers) works amazingly well with sqitch rework. Your deploy directory is much easier to navigate, and new migrations pop right into their spot. Want to see where that column function for user preferences is? table_profile_column_userPreferences.sql, etc.

What's important is that any rework that alters a type or table etc. contains, in comments, the full original create statement plus whatever adjustments are necessary to make it equivalent to the new state after the alteration. This makes it super simple to see the state of the db just by looking at the files in the deploy folder.

And yes, I picked up apgdiff as a collaborator and I can't keep up with the number of bugs and PRs that are coming through (please feel free to pull down the PRs, compile them, test them, and send feedback...)

(I advise that you factor schema name into the paths too if you have more than one public schema)

I have multiple schemas -- ideally, I'd be able to group these files by schema name in a sub-folder. This is ultra-low on my priority list, however.

It makes sense in retrospect to be like... schemaName_function_functionName.sql

@chadfurman I tried sqitch, and ultimately decided against using it. It seemed really heavy-handed for me right now and overall very difficult to integrate with my current Node.js config setup. Instead I've settled on using db-migrate, which is a Node.js-based project a la standalone-migrations. Both of these are much lighter weight than sqitch and easier to get started with, and you don't have to worry about setting up targets, verify steps, or adding -n for every "commit". I also think sqitch tries too hard to be like git instead of just executing the SQL (like rails migrations) and then getting out of the way.

Oh, and did I mention that config in db-migrate is far easier?

Compare:

// my migrate.js script, using db-migrate's API
// for now, run like `yarn migrate <task> -- --env=<env>`
const argv = require('yargs').argv;
const env = argv.env || 'development';
const config = require(`./config/${env}`);

const migrateConfig = {
  config: {
    [env]: {
      host: config.PG_HOST,
      port: config.PG_PORT,
      user: config.PG_USER,
      password: config.PG_PASSWORD,
      database: config.PG_DATABASE,
      driver: 'pg',
    },
    'sql-file': true,
  },
  env,
};

const DBMigrate = require('db-migrate');
DBMigrate.getInstance(false, migrateConfig).run();

with Sqitch's config and how its environments are managed. It's an absolute nightmare to set up.
I'm curious how to simply configure a typical dev/staging/prod environment in sqitch; I looked and looked through the sqitch docs and my head was swirling.

That said, I do like that it has a verify step and could see that being useful for mission critical apps. But for what i'm doing, I don't need that or sqitch rebase/sqitch rework.

I also use db-migrate, normally in SQL mode. Sometimes I want to be a bit more interesting, so I write my own up/down code (e.g. to run multiple SQL files in one migration to make rolling back easier) and because I prefer modern JS, I wrote:

https://github.com/benjie/db-migrate-plugin-babel

[semi-automated message] We try and keep the open issues to actual issues (bugs, etc); this seems like more of a discussion right now, so I'm closing it but please feel free to keep discussing it below 👍
