Vault: Postgres Revocation Problems

Created on 6 Sep 2016 · 60Comments · Source: hashicorp/vault

If you are granting access to sequences and functions you have to revoke those too.

https://github.com/hashicorp/vault/issues/699 only fixes schemas.

It seems to me that allowing for a revocation SQL query might be necessary?

Thoughts?

Source

ekristen

Most helpful comment

@jefferai — Didn't mean to add pressure, just trying to contribute information for whoever is able to take the lead on implementation. We originally developed with Vault given a superuser and then had to backpedal (and relearn some things) because of RDS.

For additional reference, how we're currently set up (Vault 0.6.1, PostgreSQL 9.5), which is working for us:

We create a vault role with LOGIN INHERIT CREATEROLE privileges. This is the role we mount onto Vault.
We create long-lived per-service roles, like foo-service and bar-service. Those roles have NOLOGIN and are only used for ownership and privileges.
Schemas, sequences, functions, and tables are owned by the long-lived service roles.
We do GRANT "foo-service" TO vault WITH ADMIN OPTION for each of the long-lived service roles, so that vault is sure to have permissions to modify privileges on the schemas owned by those services.
Ephemeral roles are created with INHERIT and are granted the the appropriate long-lived roles. We are careful not to grant other privileges to the ephemeral roles, otherwise a non-superuser vault role would need to do SET ROLE first to drop those privileges.
vault is granted the ephemeral roles WITH ADMIN OPTION so that it can revoke etc. that role.
When we use ephemeral roles to create tables, e.g. during migrations, we make sure to either begin the migration with SET ROLE "foo-service" or end with REASSIGN OWNED BY current_user TO "foo-service". Otherwise objects will be owned by the ephemeral role, which prevents it from being dropped.
We also add some locking to ensure we only create roles for a given schema one at a time, because at least in PostgreSQL 9.5 we would run into an error from PostgreSQL, tuples concurrently updated when modifying privileges on a schema concurrently. It's a postgres bug, but has to be worked around.

rwe on 2 Oct 2016

👍5 🚀1

All 60 comments

ISTM?

jefferai on 6 Sep 2016

Apologies for the acronym, updated.

ekristen on 6 Sep 2016

Oh. Huh. Never seen that before.

Anyways, revocation SQL is on our roadmap. Someone enterprising might be able to get it in before us, but it's coming.

jefferai on 6 Sep 2016

I would have never though of it either but we are very careful about what users get what permissions and when I started working on implementing them in vault I started running into errors.

These are really good references for future, I really had no idea the depth to which you could assign or revoke permissions until recently.

Thanks @jefferai

ekristen on 6 Sep 2016

Oh. Huh. Never seen that before.

That was to ISTM :-D

jefferai on 6 Sep 2016

tl;dr: please implement this :+1: I'm currently working on integrating Vault and this is very much needed.

Some background and perhaps pointers at usable work-arounds

There is an idiosyncrasy of PostgreSQL that makes this change especially important to make the PostgreSQL secret backend actually usable for me:

Roles with the CREATEROLE privilege can _grant_ membership in other roles. So theoretically a Vault instance can do everything it needs to with that privilege. However, the CREATEROLE privilege does not allow _revoking_ privileges without first switching to that role.

In other words:

    # --- create an admin role for Vault
    # no create database
    # encrypt password
    # do not inherit rights
    # can create roles
    # not a superuser
    createuser -D -E -I -l -r -S vaultadmin

    # --- create an owner role for your database
    # no create database
    # encrypt password
    # do not inherit rights
    # can't create roles
    # not a superuser
    createuser -D -E -I -L -R -S mydatabaseowner
    createdb -E utf8 -O mydatabaseowner mydatabase

Now you can configure Vault like this:

    $ vault mount -path=mydatabase-auth postgresql
    $ vault write mydatabase-auth/config/connection connection_url=-
    postgresql://vaultadmin:(password)@127.0.0.1:5432/mydatabase
    $ vault write mydatabase-auth/roles/fullaccess sql=-
    CREATE ROLE "{{name}}" WITH LOGIN PASSWORD '{{password}}' VALID
    UNTIL '{{expiration}}' IN ROLE "mydatabaseowner" INHERIT NOCREATEROLE
    NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;

Now, there are two drawbacks to that configuration:

It's important for all credentials created by Vault to log into PostgreSQL and then immediately execute SET ROLE "mydatabaseowner";. This ensures that tables and other objects created are owned by mydatabaseowner and not the Vault-created user (which will not be valid for long). Because PostgreSQL's boneheaded privilege system makes it so that _only_ owners can drop or modify objects created by them. (I created django-postgres-setrole for precisely that use-case.)
Vault can't revoke these credentials as mentioned above without itself executing SET ROLE "mydatabaseowner". Which will only be possible when this ticket is done. Executing SET ROLE will _also_ solve all of the sequence/view permission problems mentioned here.

Right now, the work-arounds are to:

give Vault superuser privileges, which is bad for many reasons or
allow mydatabaseowner to log in and configure Vault to use that role to then in turn create user leases. That is _also bad_, as now Vault has full read/write access to the table contents (but not as bad as having superuser access to the database).

jdelic on 20 Sep 2016

👍4

Extra point of info, Vault's postgres functionality _should not_ be primarily tested with a superuser role. A lot of behaviour is implicitly available as a superuser which Vault shouldn't rely on. Not only to be finicky about principle-of-least-privilege: using Postgres on RDS won't even give you a superuser role, only a not-quite-equivalent rds_superuser group.

rwe on 2 Oct 2016

Nobody is arguing with any of these points, and we're aware of all of the benefits, but it requires someone to step up and do it if you want to see this done in a timeline faster than our own, as is being done with MySQL (https://github.com/hashicorp/vault/pull/1914). If someone wants to code similar support in for Postgres and can do it by the end of Monday I will consider it for 0.6.2. You'd do well to note the comments I made in my review of #1914 though.

jefferai on 2 Oct 2016

For additional reference, how we're currently set up (Vault 0.6.1, PostgreSQL 9.5), which is working for us:

We create a vault role with LOGIN INHERIT CREATEROLE privileges. This is the role we mount onto Vault.
We create long-lived per-service roles, like foo-service and bar-service. Those roles have NOLOGIN and are only used for ownership and privileges.
Schemas, sequences, functions, and tables are owned by the long-lived service roles.
We do GRANT "foo-service" TO vault WITH ADMIN OPTION for each of the long-lived service roles, so that vault is sure to have permissions to modify privileges on the schemas owned by those services.
Ephemeral roles are created with INHERIT and are granted the the appropriate long-lived roles. We are careful not to grant other privileges to the ephemeral roles, otherwise a non-superuser vault role would need to do SET ROLE first to drop those privileges.
vault is granted the ephemeral roles WITH ADMIN OPTION so that it can revoke etc. that role.
When we use ephemeral roles to create tables, e.g. during migrations, we make sure to either begin the migration with SET ROLE "foo-service" or end with REASSIGN OWNED BY current_user TO "foo-service". Otherwise objects will be owned by the ephemeral role, which prevents it from being dropped.
We also add some locking to ensure we only create roles for a given schema one at a time, because at least in PostgreSQL 9.5 we would run into an error from PostgreSQL, tuples concurrently updated when modifying privileges on a schema concurrently. It's a postgres bug, but has to be worked around.

rwe on 2 Oct 2016

👍5 🚀1

@erydo

vault is granted the ephemeral roles WITH ADMIN OPTION so that it can revoke etc. that role.

Just an aside: This is unnecessary/a noop. If vault has CREATEROLE it can grant membership in _all_ non-superuser roles already.

jdelic on 2 Oct 2016

That's good to know, thanks @jdelic!

rwe on 3 Oct 2016

As of today, the revocation logic iterates through all the table schemas and revokes the privileges and usage on each. After that, the privileges on all the tables and sequences, and usage on public schema is revoked.

Reading the statements one by one from, by say a role parameter revoke_sql, and executing them one by one might not be a straight forward solution to this issue.

Or, the statements in revoke_sql can be generic and can contain items represented in {{param_name}} format, which Vault can parse, substitute the input values, and run. But then, few statements can require the need to be executed for each schema. This can quickly get flaky and inflexible.

If Vault, without substituting any input, simply executes what is supplied via revoke_sql, and expect the logic to revoke to be taken care by the input itself, it sounds like a potential solution, but the complexity of input will be high and the feasibility of this is still unclear.

Therefore, we have decided to defer this issue for now, and seek inputs from the interested parties before we attempt a fix.

vishalnayak on 5 Oct 2016

@vishalnayak

As of today, the revocation logic iterates through all the table schemas and revokes the privileges and usage on each. After that, the privileges on all the tables and sequences, and usage on public schema is revoked.

As far as I can tell, that assumption with the way PostgreSQL credentials work will _always be invalid_ unless Vault has superuser permissions or the issued credentials are read-only (which is how the current example in the Vault documentation works). Right now, that code is broken, at least on my setup. Roles never get dropped.

Reading the statements one by one from, by say a role parameter revoke_sql, and executing them one by one might not be a straight forward solution to this issue.

The way I see it, the only thing missing for the current code to work, is that Vault _can't_ use its login credentials specified in postgresql/config/connection to revoke the roles it created _directly_, _unless_ those login credentials either confer superuser privileges, as mentioned above (which is counter-productive for security purposes) or Vault executes a SET ROLE command after connecting to the server first.

As PostgreSQL credentials work (see also my previous comment),

if Vault's login credentials have CREATEROLE and
creates a role newrole in a database owner role/group mydatabaseowner, then to
REVOKE or DROP newrole, Vault _must_ log in with its credentials and execute SET ROLE mydatabaseowner.

As I understand the proposal in this ticket, this would be easy if Vault had a revoke_sql parameter. Let's see how this would work:

$ vault mount postgresql
# user vaultadmin has CREATEROLE, but no superuser privileges
$ vault write postgresql/config/connection connection_url=-
postgresql://vaultadmin:(password)@127.0.0.1:5432/mydatabase
$ vault write postgresql/roles/fullaccess sql=-
CREATE ROLE "{{name}}" WITH LOGIN PASSWORD '{{password}}' VALID
UNTIL '{{expiration}}' IN ROLE "mydatabaseowner" INHERIT NOCREATEROLE
NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;
$ vault write postgresql/roles/fullaccess revoke_sql=-
SET ROLE mydatabaseowner;
DROP ROLE "{{name}}";

A more complete example for some applications would be:

# let's catch the edge-case where a client forgot to run SET ROLE on his Vault-issued credentials
$ vault write postgresql/roles/fullaccess revoke_sql=-
SET ROLE mydatabaseowner;
REASSIGN OWNED BY "{{name}}" TO "mydatabaseowner";
DROP ROLE "{{name}}";

Iterating the above SQL for {{name}} would work as intended.

Caveat: Where I see {{name}} being insufficient is when roles share object access across databases. However, the way Vault uses postgresql/config/connection, cross-database setups can't easily be managed by Vault anyway, so revoke_sql would be a great first step (and also very much complete my currently deployed setup 😉).

jdelic on 5 Oct 2016

@jdelic It doesn't seem that simple in all cases, looking at the current code. There are sequences, schemas, functions, etc. Users can be granted permissions on different schemas. I'd also be concerned, not knowing enough about setting a connection's role, whether there are times when that would fail based on the initial permissions.

Maybe to put it another way, in your second example you say "a more complete example for some applications...". I'm concerned about whether a blank canvas for revoke_sql is enough for all applications.

jefferai on 5 Oct 2016

It seems to me that if the user knows the SQL statement to create the role, they should also be able to craft the SQL statement to revoke the role.

I've found that in my setup (using AWS RDS) that none of my roles get revoked. My database schemas are fairly complex. Named schemas, not using the public schema at all, sequences, functions, triggers, tables, views etc are all being used. While vault has no problem creating a role, it cannot revoke it. :(

ekristen on 5 Oct 2016

@ekristen Generally speaking I agree with you, and that was my initial thought. What I wasn't sure about -- not knowing Postgres very well -- was why the existing logic works the way it does and whether simply being able to execute a set of statements would be enough.

jefferai on 5 Oct 2016

@ekristen @jdelic How fast would you be able to test a build from master?

jefferai on 5 Oct 2016

@jefferai I'm with you there. I'm relatively new to postgres permissions and its a PITA.

For example if you grant usage and all privileges on all tables in a schema to a user and then a new table is added the user doesn't get access to that new table unless you specifically alter the default permissions for that user against that schema.

I know that I have crafted a sql statement to create and revoke a role, this might be out of ignorance of the postgres permission system, there might be some really simple thing I'm missing.

I think revoke_sql would follow the 80/20 rule at least, and allow for users to attempt to get their revocations working properly.

ekristen on 5 Oct 2016

@jefferai lol, after I sent that last comment, your question popped up ... I could probably test it today.

ekristen on 5 Oct 2016

@ekristen I'm basically shoving in a last minute capability to do this in "beta" -- for now, it won't be on the web site docs, and won't be announced as a feature. But I do want to help you out if I can. Essentially, I'll be running our current acceptance tests to ensure I didn't break anything but probably will not be able to write new ones before release. If I can poke you and you can compile and test the code quickly, you can at least test it somewhere real before the tagging :-)

jefferai on 5 Oct 2016

I can compile, spin it up locally and throw my production configs into it and give it a test today.

ekristen on 5 Oct 2016

@ekristen see the branch in https://github.com/hashicorp/vault/pull/1972

jefferai on 5 Oct 2016

@jdelic This is in master now (and will be in 0.6.2) in a semi-hidden role parameter called revocation_sql. It works for @ekristen , let me know how it goes for you. We can make it publicly documented on the web site at any time if it's working for everyone.

jefferai on 5 Oct 2016

🎉1

@jefferai I can't get the new code to work, unfortunately. I compiled post-merge master (b2d2bb55) with golang 1.7.1 from Debian Stretch and tried to set it up as we discussed in this ticket. However, even on a fresh vault, I can't set revocation_sql. Every time I try, it seems that the sql parameter gets deleted.

[vagrant@saltmaster ~]$ vault mount -path=db-authserver postgresql
Successfully mounted 'postgresql' at 'db-authserver'!

[vagrant@saltmaster ~]$ vault write db-authserver/config/connection connection_url=-
postgresql://vaultadmin:(redacted)@postgresql.local/authserver

The following warnings were returned from the Vault server:
* Read access to this endpoint should be controlled via ACLs as it will return the connection string or URL as it is, including passwords, if any.

[vagrant@saltmaster ~]$ vault write db-authserver/roles/fullaccess sql=-
CREATE ROLE "{{name}}" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID
    UNTIL '{{expiration}}' IN ROLE "authserver" INHERIT NOCREATEROLE
    NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;
Success! Data written to: db-authserver/roles/fullaccess

[vagrant@saltmaster ~]$ vault read db-authserver/roles/fullaccess
Key     Value
---     -----
sql     CREATE ROLE "{{name}}" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID
    UNTIL '{{expiration}}' IN ROLE "authserver" INHERIT NOCREATEROLE
    NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;

[vagrant@saltmaster ~]$ vault write db-authserver/roles/fullaccess revocation_sql=-
SET ROLE authserver;
DROP ROLE IF EXISTS "{{name}}";
Success! Data written to: db-authserver/roles/fullaccess

[vagrant@saltmaster ~]$ vault read db-authserver/roles/fullaccess
Key     Value
---     -----
sql

[vagrant@saltmaster ~]$ vault status
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Version: 0.6.2
Cluster Name: vault-cluster-c7830818
Cluster ID: 4214a1fa-c734-8aff-0a3a-7bd059bfb4d1

High-Availability Enabled: false

Any ideas?

jdelic on 5 Oct 2016

You have to set them both at the same time.

ekristen on 6 Oct 2016

@ekristen is correct. Try adding the revocation_sql option when you initially write to the role with the sql option.

LLBennett on 6 Oct 2016

@jdelic Vault endpoints used to follow the "pull all, update fields, push all" semantics. This is changing incrementally. In the documentation, you will notice some endpoints stating that it supports both create and update capabilities. In such cases, it would work as you expected. It is just that this endpoint does not support it, yet!

vishalnayak on 6 Oct 2016

@vishalnayak @jefferai

I have now tried updating my configuration to set both at the same time:

[vagrant@saltmaster ~]$ vault write db-authserver/roles/fullaccess sql="CREATE ROLE \"{{name}}\" 
WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE 
\"authserver\" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;" 
revocation_sql="SET ROLE authserver; DROP ROLE IF EXISTS \"{{name}}\";"

Success! Data written to: db-authserver/roles/fullaccess

However, the roles still don't get deleted. My PostgreSQL logs show this, though:

2016-10-05 22:37:18 UTC [6441-2] vaultadmin@authserver STATEMENT:  REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM "root-1e6e6d66-1ee9-2c28-9c51-c29b2b023bab";
2016-10-05 22:37:18 UTC [6441-3] vaultadmin@authserver ERROR:  permission denied for sequence django_migrations_id_seq

That's not my revocation_sql value getting executed there. Any other obvious mistakes? :(

jdelic on 6 Oct 2016

@jdelic so pq.QuoteIdentifier is being added to the revocation_sql where it is not being added to the sql statement, so you need to remove the quote escape around the name, because it is done automatically on the revocation_sql.

{
  "sql": "CREATE ROLE \"{{name}}\" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE \"authserver\" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;",
  "revocation_sql": "SET ROLE authserver; DROP ROLE IF EXISTS {{name}};"
}

vault write db-authserver/roles/fullaccess @path/to/json/file/given/above.json

ekristen on 6 Oct 2016

@ekristen ah thanks... that's a little inconsistent with sql and I would never have found that 👍

However, it still doesn't execute my SQL:

[vagrant@saltmaster ~]$ cat test.json
{
  "sql": "CREATE ROLE \"{{name}}\" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE \"authserver\" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;",
  "revocation_sql": "SET ROLE authserver; DROP ROLE IF EXISTS {{name}};"
}

[vagrant@saltmaster ~]$ vault write db-authserver/roles/fullaccess @test.json
Success! Data written to: db-authserver/roles/fullaccess

[vagrant@saltmaster ~]$ vault read db-authserver/creds/fullaccess
Key             Value
---             -----
lease_id        db-authserver/creds/fullaccess/95f0d271-945a-0b61-9f77-22c90a0f2f21
lease_duration  1m0s
lease_renewable true
password        94dc9433-008c-96e6-47ce-ff59de6fc54d
username        root-cda76fc5-f428-05ae-9c56-401794087014

and in my PostgreSQL log after one minute, I find:

2016-10-05 23:02:10 UTC [29172-16] vaultadmin@authserver STATEMENT:  REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public FROM "root-cda76fc5-f428-05ae-9c56-401794087014";
2016-10-05 23:02:10 UTC [29172-17] vaultadmin@authserver WARNING:  no privileges could be revoked for "public"

So the quotes weren't the final problem, it seems.

jdelic on 6 Oct 2016

@jdelic What is the output of vault read db-authserver/roles/fullaccess?

vishalnayak on 6 Oct 2016

@jdelic That's not correct re: quotes; the pg.QuoteIdentifier was removed before release. You _must_ have quotes around {{name}} (at least, if the SQL statements require the username to be quoted).

jefferai on 6 Oct 2016

@jefferai my bad! could it be he didn't compile the right version? I was accidentally using revoke_sql originally and obviously wasn't working, once I switched it over to revocation_sql it was all good.

ekristen on 6 Oct 2016

I'm not sure. But 0.6.2 released, and master currently, do not use pg.QuoteIdentifier and thus require quotes around the identifier :-)

jefferai on 6 Oct 2016

@vishalnayak

[vagrant@saltmaster ~]$ vault read -format=json db-authserver/roles/fullaccess
{
        "request_id": "554aaba6-65dc-c13b-4342-862d1b434659",
        "lease_id": "",
        "lease_duration": 0,
        "renewable": false,
        "data": {
                "sql": "CREATE ROLE \"{{name}}\" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE \"authserver\" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;"
        },
        "warnings": null
}

@jefferai @ekristen

[vagrant@saltmaster (master) vault]$ git rev-parse HEAD
b2d2bb55d25f8b6ab70f4044c4ddb6bf1050eab6

compiled via:

[vagrant@saltmaster ~]$ git clone https://github.com/hashicorp/vault
[vagrant@saltmaster ~]$ GOPATH=/home/vagrant/go go get github.com/mitchellh/gox
[vagrant@saltmaster ~]$ GOPATH=/home/vagrant/go go get github.com/hashicorp/vault/cli
[vagrant@saltmaster ~]$ cd vault
[vagrant@saltmaster (master) vault]$ 
GOPATH=/home/vagrant/vault:/home/vagrant/go PATH=~/go/bin:$PATH make dev

[vagrant@saltmaster (master) vault]$ vault status
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Version: 0.6.2
Cluster Name: vault-cluster-c7830818
Cluster ID: 4214a1fa-c734-8aff-0a3a-7bd059bfb4d1

High-Availability Enabled: false
[vagrant@saltmaster (master) vault]$ vault version
Vault v0.6.2

jdelic on 6 Oct 2016

@jdelic The error you pasted had old default revocation SQL statement in it. That was the reason I suspected that the role might not have revocation_sql registered properly. Output of read on the role confirms that.

vishalnayak on 6 Oct 2016

@vishalnayak

The error you pasted had old default revocation SQL statement in it. That was the reason I suspected that the role might not have revocation_sql registered properly. Output of read on the role confirms that.

This test instance uses PostgreSQL as the back-end storage. So to make sure that I had a fresh start, I dropped and recreated the database then reinitialized the Vault using vault init and set the database connection up... so I don't know exactly what the reason for that could be? There is currently only one Vault binary on that machine and it's the one I compiled myself :-/.

Any idea how this can happen?

jdelic on 6 Oct 2016

@jdelic the only thing I can think of is that the wrong binary is being used.

It's in the 0.6.2 release want to try and download that and try it instead?

ekristen on 6 Oct 2016

I can't replicate; with a new dev server built against master:

$ vault write postgresql/config/connection connection_url="postgresql://postgres:[email protected]:32769/postgres?sslmode=disable"


The following warnings were returned from the Vault server:
* Read access to this endpoint should be controlled via ACLs as it will return the connection string or URL as it is, including passwords, if any.

$ vault write postgresql/roles/fullaccess @test.json && vault read postgresql/roles/fullaccess
Success! Data written to: postgresql/roles/fullaccess
Key             Value
---             -----
revocation_sql  SET ROLE authserver; DROP ROLE IF EXISTS "{{name}}";
sql             CREATE ROLE "{{name}}" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE "authserver" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;

Contents of test.json:

{
          "sql": "CREATE ROLE \"{{name}}\" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE \"authserver\" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;",
          "revocation_sql": "SET ROLE authserver; DROP ROLE IF EXISTS \"{{name}}\";"
}

jefferai on 6 Oct 2016

@ekristen Since I compiled this myself on a VM, I doubt it. But sure... I'll give it a try.

@jefferai
So, I have downloaded the official amd64 0.6.2 binary:

[vagrant@saltmaster ~]$ md5sum /usr/local/bin/vault
3b2b2e1d05d9a1ca57ef4b7195080549  /usr/local/bin/vault

To make sure that I don't have any old Vault things, I have switched to a -dev server on a local screen:

[root@saltmaster sources.list.d]$ vault server -dev -dev-listen-address=192.168.56.88:8200 -log-level=debug
==> Vault server configuration:

                 Backend: inmem
              Listener 1: tcp (addr: "192.168.56.88:8200", cluster address: "",
tls: "disabled")
               Log Level: debug
                   Mlock: supported: true, enabled: false
                 Version: Vault v0.6.2

==> WARNING: Dev mode is enabled!

In this mode, Vault is completely in-memory and unsealed.
Vault is configured to only have a single unseal key. The root
token has already been authenticated with the CLI, so you can
immediately begin using the Vault CLI.

The only step you need to take is to set the following
environment variables:

    export VAULT_ADDR='http://192.168.56.88:8200'

The unseal key and root token are reproduced below in case you
want to seal/unseal the Vault or play with authentication.

Unseal Key: fOfPpZhNPWNmvpcAOL/XFLWP91NmhrUpOTzJm8Bw0CU=
Root Token: 00565fc8-c050-37b2-378e-8651752d1dc0

==> Vault server started! Log data will stream in below:
...

I have mounted the backend like you:

[vagrant@saltmaster ~]$ vault auth
Token (will be hidden):
Successfully authenticated! You are now logged in.
token: 00565fc8-c050-37b2-378e-8651752d1dc0
token_duration: 0
token_policies: [root]
[vagrant@saltmaster ~]$ export VAULT_ADDR='http://192.168.56.88:8200'
[vagrant@saltmaster ~]$ vault mount postgresql
Successfully mounted 'postgresql' at 'postgresql'!

[vagrant@saltmaster ~]$ vault write postgresql/config/connection connection_url="postgresql://vaultadmin:(redacted)@postgresql.local/authserver"
The following warnings were returned from the Vault server:
* Read access to this endpoint should be controlled via ACLs as it will return the connection string or URL as it is, including passwords, if any.

But I get an entirely different result :(

[vagrant@saltmaster ~]$ cat test.json
{
  "sql": "CREATE ROLE \"{{name}}\" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE \"authserver\" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;",
  "revocation_sql": "SET ROLE authserver; DROP ROLE {{name}};"
}

[vagrant@saltmaster ~]$ vault write postgresql/roles/fullaccess @test.json
Success! Data written to: postgresql/roles/fullaccess

[vagrant@saltmaster ~]$ vault read postgresql/roles/fullaccess
Key     Value
---     -----
sql     CREATE ROLE "{{name}}" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE "authserver" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;

jdelic on 6 Oct 2016

@jdelic I was wrong about the quoting when I originally tested the quotes needed to be removed but Jeff fixed it so that it works like the sql statement.

I'm not sure if read is going to list the revocation-sql key out.

ekristen on 6 Oct 2016

@ekristen

I was wrong about the quoting when I originally tested the quotes needed to be removed but Jeff fixed it so that it works like the sql statement.

Yeah, good catch though. I guess that was commit d7615b0 that fixed that. But I built my copy from a later commit and the quoting made no difference in my tests :(, even with the official 0.6.2 binary.

I'm not sure if read is going to list the revocation-sql key out.

It does in @jefferai's comment, which if built from currently published master is built from the same commit b2d2bb55 that I built from and @vishalnayak also said as much above.

Right now, since I tested this against master and the official binaries with a dev server and a postgresql backend, my leading theory is that there are unpushed commits somewhere 😕

jdelic on 6 Oct 2016

@jdelic I built from source, are you sure there isn't another vault binary in your path?

To be sure you could modify https://github.com/hashicorp/vault/blob/master/version/version_base.go#L7 in your local source that you build and change that value so that when you query for version you know you are using your built version?

ekristen on 6 Oct 2016

I _think_ what's happening is a CDN cache problem. Trying to isolate.

jefferai on 6 Oct 2016

@ekristen

To be sure you could modify https://github.com/hashicorp/vault/blob/master/version/version_base.go#L7

that's a good idea, thanks for the pointer. I am however 99.9% sure that I consistently used the correct binary, since I compared MD5s on them to rule that precise problem out. If @jefferai doesn't make any progress, I'll redo all my tests with a fresh compile with modified version number. Can you reproduce the above output? I think at this point, if a third person can replicate either results, that would be good data to have :)

jdelic on 6 Oct 2016

I honestly don't know how/what happened but when I built the new binaries they appear to have failed actually uploading to the CDN and I didn't see. Ugh.

The binaries corresponding to the _actual_ 0.6.2 tag should now be up...please try them. I'm not going to send out an announcement since the only difference is the postgres feature, which is unannounced and in beta :-)

jefferai on 6 Oct 2016

@jefferai

So I had to go to bed last night :), but I now continued testing.

The good news: With the new binaries I can now replicate the behavior you've shown above and get the correct output.

[vagrant@saltmaster ~]$ vault read postgresql/roles/fullaccess
Key             Value
---             -----
revocation_sql  SET ROLE authserver; DROP ROLE "{{name}}";
sql             CREATE ROLE "{{name}}" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE "authserver" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION NOBYPASSRLS;

The bad news: It still doesn't work correctly. My Vault debug logs now show

2016/10/06 11:00:58.331409 [ERROR] expire: failed to revoke lease: lease_id=postgresql/creds/fullaccess/796e9479-6ab6-53f2-e535-8005c66f7b61 error=failed to revoke entry: resp:(*logical.Response)(nil) err:pq: permission denied to drop role

PostgreSQL logs:

2016-10-06 10:58:18 UTC [31730-1] vaultadmin@authserver ERROR:  permission denied to drop role
2016-10-06 10:58:18 UTC [31730-2] vaultadmin@authserver STATEMENT:  DROP ROLE "root-69c3ea26-1f97-5836-8e27-e9b5c195fb7a"

Question: Does Vault execute the revocation_sql statements using the same (i.e. one) database connection? Elevating privileges using SET ROLE will obviously not work otherwise...

jdelic on 6 Oct 2016

@jdelic I've verified this to work. I would hazard to guess you've got a SQL problem or a permissions problem with your connection string user.

What happens if you run your revocation_sql manually after using vault to create the creds?

SET ROLE authserver; DROP ROLE "root-69c3ea26-1f97-5836-8e27-e9b5c195fb7a";

ekristen on 6 Oct 2016

Question: Does Vault execute the revocation_sql statements using the same (i.e. one) database connection? Elevating privileges using SET ROLE will obviously not work otherwise...

Yes.

jefferai on 6 Oct 2016

Actually, let me follow up: Vault uses the same connection URL. But there is a connection pool under the hood, so Vault may not be using the actual literal same connection.

jefferai on 6 Oct 2016

Okay... multiple things to address in this comment :)

@ekristen

What happens if you run your revocation_sql manually after using vault to create the creds?

I spent the last 30 minutes debugging this and you were right, the problem ultimately was on my end. Reprovisioning the vagrant box and resetting the database fixed the permissions. I probably broke something during testing last night and changed the permissions around. SET ROLE authserver is ultimately unnecessary if the role vaultadmin is a member of authserver, because then only vaultadmin needs the CREATEROLE privilege. So my revocation_sql now is simply DROP ROLE "{{name}}";.

@jefferai
I can now confirm that the official binaries of 0.6.2(new) work as described with revocation_sql, not using SET ROLE xxx, but consistently expiring leases and removing roles as required.

Actually, let me follow up: Vault uses the same connection URL. But there is a connection pool under the hood, so Vault may not be using the actual literal same connection.

If that specifically means that Vault might execute SET ROLE authserver; on connection _A_ and then execute DROP ROLE "{{name}}"; on connection _B_ that behavior should be changed. It will lead to all kinds of trouble. SET ROLE is effectively sudo and affects the connection from that point forward. But that should probably go into its own ticket if you confirm that that is how Vault behaves.

jdelic on 6 Oct 2016

If that specifically means that Vault might execute SET ROLE authserver; on connection A and then execute DROP ROLE "{{name}}"; on connection B that behavior should be changed.

I actually don't have much insight as to the underlying semantics (because I haven't looked into lib/pq that deeply) but the Go SQL interface leaves the isolation factor up to the driver. My working assumption is that, because the statements are executed in a transaction, the same connection will be used to execute the entire transaction.

jefferai on 6 Oct 2016

@jefferai
to tie up loose ends, I provisioned a completely clean VM, installed golang 1.7.1 and compiled Vault master. I report that this binary works as described now. I have no idea what went wrong last night, but it's all good now.

I guess you can close this behemoth of a ticket. Thank you for being so incredibly responsive (you too @ekristen!). I will push this into production now! 🍰

jdelic on 6 Oct 2016

🎉2

@jefferai

If that specifically means that Vault might execute SET ROLE authserver; on connection A and then execute DROP ROLE "{{name}}"; on connection B that behavior should be changed.

I actually don't have much insight as to the underlying semantics (because I haven't looked into lib/pq that deeply) but the Go SQL interface leaves the isolation factor up to the driver. My working assumption is that, because the statements are executed in a transaction, the same connection will be used to execute the entire transaction.

Cool. If that's not the case, I will find out when this goes into staging :).

jdelic on 6 Oct 2016

As long as the whole SQL (sql or revocation) string is being executed as a single query it should use the same connection.

ekristen on 6 Oct 2016

@ekristen It's not -- it's executed statement by statement (splitting on semicolons), but within a single transaction. I suppose that it _could_ be executed as a single statement; it's just not how things are currently done.

I took a look; internally to the pq library, the Begin function to start a transaction is a member function of a particular conn so I think this should be fine!

Closing as requested. If someone is interested in running full sql/revocation_sql statements as a single query feel free to open up a PR :-)

jefferai on 6 Oct 2016

I agree, as long as it is the same transaction it should be good to go.

ekristen on 6 Oct 2016

We also add some locking to ensure we only create roles for a given schema one at a time, because at least in PostgreSQL 9.5 we would run into an error from PostgreSQL, tuples concurrently updated when modifying privileges on a schema concurrently. It's a postgres bug, but has to be worked around.

@erydo Do you have any details on what/how you did to workaround this? I'm running into this bug a lot in our instance do to system architecture having multiple processes requesting logins at the same time.

thesoftwarejedi on 4 Apr 2019

@thesoftwarejedi Yes, we actually did a couple things.

We separated separated ephemeral users from "static" base roles.
- For each Vault role, we'd maintain a single persistent NOLOGIN Postgres role. (e.g. foo-service). This Postgres role received the GRANTs to schema/DB/table/function and would own objects like tables. This allowed us to avoid re-issuing GRANTs when creating ephemeral users. We'd update that base role's permissions as part of deploying new Vault policies.
- Ephemeral users (e.g. foo-service-1871937491) were created with INHERIT, and were granted to that parent role. (e.g.GRANT foo-service TO foo-service-1871937491).
When creating these base roles and the ephemeral roles, or updating the GRANTs, we'd wrap those in a role-specific pg_advisory_xact_lock, whose ID was based on the hashed role name.

The above was accomplished by factoring those into PLPGSQL procedures that could be called directly from Vault's provisioning statement, which didn't allow PLPSQL itself.

This meant that the actual Vault provisioning statement was something like:

SELECT provision_ephemeral_role(
  'foo-service', '{{name}}', '{{password}}', '{{expiration}}');

The provision_ephemeral_role would do, roughly:

CREATE FUNCTION provision_ephemeral_role(base_role_name VARCHAR, name VARCHAR, password VARCHAR, expiration VARCHAR)
  RETURNS VOID
AS $$
  BEGIN
    PERFORM pg_advisory_xact_lock('x' || lpad(encode(digest(base_role_name, 'sha256'), 'hex'), 16, '0'))::BIT(64)::BIGINT);
    EXECUTE FORMAT('CREATE ROLE %I WITH INHERIT LOGIN PASSWORD %L VALID UNTIL %L', name, password, expiration);
    EXECUTE FORMAT('GRANT %I TO vault WITH ADMIN OPTION', name);
    EXECUTE FORMAT('GRANT %I TO %I', base_role_name, name);
  END
$$
LANGUAGE PLPGSQL;

rwe on 4 Apr 2019

👍2

In addition to https://github.com/hashicorp/vault/issues/1857#issuecomment-250992526:

Regarding the use of large objects in this scenario, I got "permission denied for large object".
The reason and solutions are explained here: https://dba.stackexchange.com/questions/147607/postgres-large-objects-multiple-users

For me, the following solution worked well so far:

ALTER ROLE \"{{name}}\" SET role = $persistent_role_name;

By having this in the role config for vault, it is made sure that the correct owner (the persistent Postgres role) is not only set for migration scripts, but also when creating LOBs.