Pg-promise: support for nested named parameters

Created on 26 Sep 2017 · 32Comments · Source: vitaly-t/pg-promise

Hey @vitaly-t, thanks for the great library!

While using named parameters like:

query('INSERT into users (name) VALUES ($[name])', { name: 'Fred' })

There are more complex cases I run across where I'd really love to be able to reference nested values, like many similar syntax in other libraries allow for, like:

'$[user.firstName]'

Or even:

'$[snapshots.0.data]'

I think this would feel very intuitive, and make dealing with some complex cases a lot easier. Right now I'm having to write "flattening" helpers to achieve the same affect while keeping things in the variables object namespaced from each other.

If we used lodash/get for accessing, I think this might be really easy to add support for, and be done in a fairly standardized way.

Thanks for reading!

improvement

Source

ianstormtaylor

👍1

Most helpful comment

Released the feature as v6.10.0

vitaly-t on 27 Sep 2017

👍3

All 32 comments

It was discussed here: using named parameters with nested objects.

vitaly-t on 26 Sep 2017

Thanks for the quick reply!

I'm not sure I understand why it's not possible, since it seems like instead of doing:

if (v.name in obj) {
  return formatValue(obj[v.name], v.fm, obj);
}

You could do:

if (has(obj, v.name)) {
  return formatValue(get(obj, v.name), v.fm, obj);
}

Am I missing something?

ianstormtaylor on 26 Sep 2017

Yes, you missing the part where it says:

The library only formats Named Parameters, it doesn't evaluate them.

Your approach requires evaluation, which in itself brings huge performance hit.

And another reason - variables here support rich syntax for value transformations, like ${varName:json}, etc, so the full variable syntax is impossible, or would be problematic.

vitaly-t on 26 Sep 2017

What do you mean by requires evaluation? Are you talking about the need to recurse into the object to read the nested paths? Or something for the Postgres pre-evaluated queries?

As far as I can tell, obj[v.name] vs get(obj, v.name) is a similar amount of "evaluation" if you consider that nested parameters are just constructs, and not the values themselves.

As for performance, do you mean the performance of obj[v.name] vs. get(obj, v.name)? That seems like not a huge problem to me because people that are needing to use nested values are already having to resort to hacks around that (like flattening) that probably incur similar performance loss. Having the library handle it natively would greatly reduce the code complexity around some of these queries.

For the value transforms, you could still do the same thing user.name:raw or whatever you wanted, and I think the code as architected would already support this.

I guess I'm missing something deeper...

ianstormtaylor on 26 Sep 2017

As for performance, do you mean the performance of obj[v.name] vs. get(obj, v.name)? That seems like not a huge problem to me because people that are needing to use nested values are already having to resort to hacks around that (like flattening) that probably incur similar performance loss.

Not quite. For once, it would be replacing direct JavaScript with two third-party calls - one for has, one for get, and those would have to be called for everything, thus dropping the performance a lot for the sake of 1% of code out there.

Most developers do not need nested access. In practice, and especially when implementing large database services, it is best to have all your SQL in external files, which is when nobody really needs nested variable access anymore.

vitaly-t on 26 Sep 2017

Anyhow, your use case is too unique to be making such a change in the library at this point. And it would affect much more in this library than you think, like the entire helpers namespace for one thing.

I will take it under a consideration, but I don't know if I will ever get around to it, or even if it is worth it.

vitaly-t on 26 Sep 2017

Fair enough, in that case I'll close this.

For what it's worth, I use pg-promise for everything I do with Postgres, and I really like it. But the helpers/formatting part of it I have to look up in the docs _every time_ I use them, because they're really unintuitive. Trying to remember ^ or ~ or how this is special-cased, and trying to remember the syntax for insert or ColumnSet or what the import name is. And some of the stuff is in the "official" documentation, and some is in the readmes, and some is in the wiki. It's the thing that ends up frustrating me the most when using this lib.

ianstormtaylor on 26 Sep 2017

^ or ~ were long time ego made interchangeable with :raw and :name, which surely is simple to remember ;)

this case is fore very unique cases, one usually doesn't need it.

Anyhow, it is all very well documented within the helpers :wink:

And there are many examples on StackOverflow also.

vitaly-t on 26 Sep 2017

@ianstormtaylor the question did intrigue me, so I had a look at how methods has and get work in lodash, and start to believe there is a way to implement those without affecting the performance.

vitaly-t on 27 Sep 2017

I did a research for performance of split versus indexOf, finding that the latter is significantly faster. So I came up with this implementation for the two methods:

// doesn't assume validity of the property path
function has(obj, prop) {
    if (prop.indexOf('.') === -1) {
        return prop in obj;
    }
    const names = prop.split('.');
    for (let i = 0; i < names.length; i++) {
        const n = names[i];
        if (obj && typeof obj === 'object' && n in obj) {
            obj = obj[n];
        } else {
            return false;
        }
    }
    return true;
}

// assumes validity of the property path
function get(obj, prop) {
    if (prop.indexOf('.') === -1) {
        return obj[prop];
    }
    const names = prop.split('.');
    for (let i = 0; i < names.length; i++) {
        obj = obj[names[i]];
    }
    return obj;
}

Since get would only be called after a successful has, it can assume validity of the property path, and never verify it, to improve performance.

The slow-down with this approach would only occur when you actually do use nested names, because when you don't, it only executes the additional indexOf, and that's it.

I believe it is possible to optimize this code even further, by making method get even more dependent on method has, so it wouldn't need to do a second indexOf / split at all, and reuse their results from has.

I'm leaving it for now, in case I do decide to add it all later on.

vitaly-t on 27 Sep 2017

Here, I have come up with something that would offer the maximum performance:

function getIfHas(obj, prop) {
    const result = {};
    if (prop.indexOf('.') === -1) {
        result.has = prop in obj;
        result.value = obj[prop];
    } else {
        const names = prop.split('.');
        for (let i = 0; i < names.length; i++) {
            const n = names[i];
            if (obj && typeof obj === 'object' && n in obj) {
                obj = obj[n];
            } else {
                result.has = false;
                return result;
            }
        }
        result.has = true;
        result.value = obj;
    }
    return result;
}

The method will always return an object like {has: true, value: 123}, which should be the fastest, because:

it executes indexOf only once, and split when needed, also only once
it gets the value at the same time, so the get verifications are no longer needed

vitaly-t on 27 Sep 2017

I have done a comprehensive performance test of getIfHas versus calling _.has and _.get from lodash, and I am getting a consistent 300% performance advantage with getIfHas when using nested property names, and 500% advantage when using regular (non-nested) property names.

Tests were done on Windows 10, under Node.js 8.5.0

vitaly-t on 27 Sep 2017

I just did a post here for lodash: https://github.com/lodash/lodash/issues/3401, to share some thoughts on my findings.

The author (@jdalton) immediately marked it invalid, with comment Oh boy, closed it and locked access, so I can't do anything there anymore.

This is by far the most rude/obnoxious response I have seen from any public library support.

Indeed,... Oh boy... 😠

vitaly-t on 27 Sep 2017

@vitaly-t Naw, you gotta give me time to reply before jumping to conclusions 😸

jdalton on 27 Sep 2017

@jdalton well, I work on GitHub projects full time, and fast. I saw your invalid label attached, no comment, so I politely asked why, and you just locked the issue, so I can no longer follow up.

vitaly-t on 27 Sep 2017

well, I work on GitHub projects full time, and fast.

I take it a bit more easy. In this case you caught me as I was trying to type a nice detailed reply, which I junked to deal with the false start. Anyway, no need to follow up in the issue. As mentioned, your implementation doesn't align with Lodash's expected behavior.

jdalton on 27 Sep 2017

@jdalton Issues are mediated through edits and comments. GitHub is all about contribution and code sharing. Locking people out because you think no need to follow up in the issue is the wrong sentiment, imo, and a discouraging one at that.

And the guidelines for use of locks on GitHub is to deal with conversations that stretch too far (outside of the subject) or got too many comments.

vitaly-t on 27 Sep 2017

@ianstormtaylor I bet you didn't expect this much thrust in this issue :smile:

Any thoughts are welcome! :wink:

vitaly-t on 27 Sep 2017

Created brunch nested-names to prepare a PR for the feature.

vitaly-t on 27 Sep 2017

Now I'm trying to figure out how to amend the RegEx to be able correctly identify valid variables: https://stackoverflow.com/questions/46442884/regex-for-javascript-variables-with-nested-names

It seems the RegEx update would be more complex than I initially thought.

vitaly-t on 27 Sep 2017

The feature has been finished in brunch nested-names, pending test coverage to be added + documentation updates.

In the end I went for a much simpler and elegant solution that didn't require any RegEx updates, also one that performs best. The RegEx blindly accepts . in the name, but the post-match parser can detect and report any anomaly in the syntax, which is so much faster, simpler, and better - because you get an error thrown when the syntax is wrong.

UPDATE: Actually, there will be a number of updates for the helpers namespace before it is really ready.

vitaly-t on 27 Sep 2017

After much trying I am about to give up on the helpers namespace, as not only it is a huge overhead, but it creates logical anomalies that do not have a clear resolution.

Here's an example of such an anomaly...

const cs = new pgp.helpers.ColumnSet(['one.two']);

const data = {
    one: 123
};

const insert = pgp.helpers.insert(data, cs);

The problem is that property one in this example exists, but since its is not an object, name one.two cannot be defaulted to anything without overwriting the value of one.

So, executing either init or def logic will result in lost values because of the type contradiction. And that's a whole new ball game. I have no idea what to do with it.

I am tempted to do the limited feature within the formatting engine only, leaving out the helpers namespace entirely.

vitaly-t on 27 Sep 2017

I feel like this is attempting to remove a developers responsibility from verifying their own objects. I may be wrong there, but think about enforcing a developer to actually verify and then transform their object to a proper form.

By requiring the developer to do their transformation it means less maintenance since where they verify and update to de-nest objects is where they need to change things when the form changes, NOT the query or query formatting, where it is much more prone to problems in the first place and can only support a SINGLE version.

// inside your repo or route handler
function versionOne(obj){...} // format input object to be the same format always used by the query
function versionTwo(obj){...} // these will always return either an Object will all required fields or a false-y
function versionThree(obj){...} // by returning a false a test can be done that will basically say input invalid

function formatQueryObject(obj){
  switch obj.version
    case 'v1' :
      return versionOne(obj);
    case 'v2':
      return versionTwo(obj);
    case 'v3':
      return versionThree(obj);
    default:
      return false;
}

// inside your route handler
const queryObj = formatQueryObj(req.body);
if (!queryObj) return res.sendStatus(400);
req.app.locals.db.repo.method(queryObj)
  .then(res.sendStatus.bind(res,200))
  .catch(logErrorAndRespon(res))

By relying on the library to verify your object you're locking yourself to a single version. Proper design means handling ALL version separately and sending a single format that makes sense to your DB interaction layer. After all the DB interaction layer needs to be as fast as possible and as stable as possible. By not verifying your input and relying on the DB interaction layer to do such you remove yourself from a good design pattern and will run into more error prone design in the future.

skatcat31 on 27 Sep 2017

👍1

@skatcat31 Yes, it is true. By allowing implicit name path resolutions, we are exposing ourselves to a situation when a property exists only partially, and that would provide poor diagnostics.

For example, we have property one.two.three, and only three doesn't exist, but the error from the parser will always be that property one.two.three doesn't exist.

Although I could push it further, and report partial-path issues, I'm not sure it's all a good idea.

Anyhow, I will be releasing the feature for the formatting engine only, but not for the helpers namespace, because of the next-level logical problems it creates, as described earlier.

vitaly-t on 27 Sep 2017

😕1

As a follow up when looking at nested object versus single layer objects, you're still locked into a single query which means for maintenance reasons you'd then be making a transformer into a then nested object. So either way there is going to be a transformer for your object before it goes into your query if it's a complex object, so why complicate the library more? It's all dependent on the developer in either case.

skatcat31 on 27 Sep 2017

@vitaly-t I feel like this is going to lead to a bad maintenance paradigm though for some people who just keep changing the query over and over again instead of the logic before the query, but TBH that's a problem that can exist with named values anyways.

Name spacing is a helpful enoguh feature for remember where it came from and that may be enough to warrant the inclusion in the formatting engine. After all if your variable name is long enough that an indexof check on the string causes a performance hit...

skatcat31 on 27 Sep 2017

@skatcat31 As I wrote in the chat earlier, I do not want to be the judge of this anymore, let developers take the responsibility.

My objective was to make sure that performance is not affected, which I did achieve nicely. Regular variable references are not affected. It is only if you are using nested properties, the performance will be slightly worse, but only by a tiny margin, negligible, really.

vitaly-t on 27 Sep 2017

👍1

Released the feature as v6.10.0

vitaly-t on 27 Sep 2017

👍3

@ianstormtaylor Not a single comment? You asked for this feature.

vitaly-t on 28 Sep 2017

Thanks for your work.

I've decided to use a different solution that involves a separate formatting library, and to migrate to node-postgres instead since I realized it supports promises now and I'm not gaining that much from pg-promise at this point.

ianstormtaylor on 28 Sep 2017

@ianstormtaylor well, good luck!

Just so, promises aren't gonna help you manage connections or do proper transactions or high-performance query generation, etc... If you think this library just adds promises to the driver, you haven't really used it then. Not to mention the promise-related issues in the driver.

vitaly-t on 28 Sep 2017

This is very helpful. Thanks a lot @vitaly-t .