Sails: [Feature Request] populate indexes for has-Many associations

Created on 17 Jul 2014  Â·  40Comments  Â·  Source: balderdashy/sails

I want to propose a method that would populate hasMany associations with an array of primary keys instead of fully fetched objects.

_Reason why?_

  1. Performance: Using relational databases we only have to look up the keys in the join table without performing an expensive full join
  2. Standards: A lot of popular JSON API implementations accept or even expect relationships to be expressed as an array of primary keys, see JSON API specs or Ember Data's RESTAdapter

_Current situation:_
When using associations in Sails / Waterline and we query for records, we currently have two choices:
with populate: associated records are retrieved and embedded
without populate: belongsTo (model) associations are represented by their primary key and hasMany (collection) associations are ignored (in fact the JSON representation is even missing the properties)

_Use case:_
In order to build a performant solution for sideloading records, I want to query my Models with associations included as primary keys. Then I want to filter duplicated associations before fetching the full records for the associations.

Example:
I would like Group.find().populate("members") to return something like { topic: "...", members: [1,2,3,4] } instead of the fully fetched objects.

_I am not sure if Waterline is already optimizing for duplicated records, but if it is, I couldn't find any exposure of this functionality to tap into._

Most helpful comment

@jcowley Nice to hear, that the blueprints are still appreciated and honestly I feel kinda sad, that I don't have time to work on them, but I've moved away from Sails and Node as a backend for the time being. One of the main reasons for that were issues like the one we're talking about here, which I feel are important for a strong API story.

All 40 comments

+1 I second this one, I'm on the same situation

:+1: * :100:

+1

Any ideas where to begin? I assume this would have to be implemented in Waterline and all of the adapters?

@particlebanana Could you help out with a broad description where stuff would have to go? I'm thinking of starting a PR for this...
Thanks!

This would touch a lot of pieces. I would suggest looking into https://github.com/balderdashy/waterline/blob/master/lib/waterline/query/deferred.js#L90 as a start. Might be best to use something other than populate.

Hello,
For Ember Data users: since 1.0 beta9, Ember Data supports embedded records, which makes it much more Waterline-friendly.
http://emberjs.com/blog/2014/08/18/ember-data-1-0-beta-9-released.html

This would be a useful feature; Backbone Relational also expects this.

Moving to the feature-request milestone.

+1 because this would make caching in my current angular project A LOT easier.

This is far from ideal, but a hack would be to override toJSON in a model where you want to always return an array of the association ids:

toJSON: function() {
  var obj = this.toObject();

  // get association-attributes
  var associations = [];
  var attributes = Object.keys(obj)
  for (var i = 0; i < attributes.length; i++) {
    var attribute = obj[attributes[i]];
    if ((attribute !== null && typeof attribute === 'object')
        && attribute.constructor === Array)
      associations.push(attributes[i]);
  }

  // for each association, make array of ids
  for (var j = 0; j < associations.length; j++) {
    obj[associations[j]] = obj[associations[j]].map(function(rel) {
      return rel.id;
    });
  }

  return obj;
}

Using this would require you to populate any to-many associations in your queries.

@mphasize I think handling the association should go exactly after that _.each here which copies the attributes (but does nothing to associations).

Anyone knows if some work had already been done on this issue or not?

cc @particlebanana

I am guessing this would simplify the deep populate, since foreign keys of 2nd level associations would be available.. it will become a recursion of populate much simpler than the current setup.

isn't this related to (and would be closed by) the toObject issue https://github.com/balderdashy/waterline/issues/317#issuecomment-87274671

@mphasize, @anasyb, I like this. :+1:

Have you thought about how to actually request find() to only provide the fkeys? Something like:

Group.find().populate("members", { fKeys: true } );

// or

Group.find().populateKeys("members");

Comes to mind//. I'm not a 100% sure if this won't need changes or additions in the adapter API to make it more performant.

I definitely vote for this. This will solve multiple use cases.
Traversing relationships is a pain at the moment.. populating the whole record just to get the PK of a relationship is too much. Not to mention the freaky performance issue https://github.com/balderdashy/waterline/issues/944 that I never got the chance to look into. I had to access the join collections directly at multiple occasions (yuk!).
I was thinking more of the second notation Group.find().populateKeys("members");. For the first notation I think that it should be skipped in favor of https://github.com/balderdashy/waterline/pull/952 ... which on the other hand should only access the join table in case the only picked attribute is the id.

@anasyb, great point regarding skipping the first notation in favor of select/pick which I agree with. Lets stick with the below if no one disagrees:

Group.find().populateKeys("members");

I like this. I'd just like to add, it'd be nice to see adapters dealing with this in an efficient way. For example, sequel-interfaced adapters could get away with one join in many-to-many relationships rather than two. Any suggestions how to introduce a shim for this feature within waterline and simultaneously allow adapters to provide their own performant solutions?

Totally agree! I think this should be done in the same manner as .populate() / adapter.join() currently works:

  • If the adapter doesn't provide a custom .joinKeys() (or whatever) waterline core will provide the results, probably using some subset logic of the existing .populate;
  • If the adapter implements .joinKeys() waterline will call it.

Okeedokes, sounds good!

Couldn't agree more and I'm also voting for the .populateKeys() style api. :+1:

Hello guys, any news about the .populateKeys feature?

Best,

i have the same question, any news about the .populateKeys feature ?

Thanks for posting, @mphasize. I'm a repo bot-- nice to meet you!

It has been 30 days since there have been any updates or new comments on this page. If this issue has been resolved, feel free to disregard the rest of this message. On the other hand, if you are still waiting on a patch, please:

  • review our contribution guide to make sure this submission meets our criteria (only _verified bugs_ with documented features, please; no questions, commentary, or bug reports about undocumented features or unofficial plugins)
  • create a new issue with the latest information, including updated version details with error messages, failing tests, and a link back to the original issue. This allows GitHub to automatically create a back-reference for future visitors arriving from search engines.

Thanks so much for your help!

This is clearly desired, so I'm reopening until someone PRs this into the roadmap file. Anybody feel free!

this is interresting, not only for primaryKey ... it would be faster than a populate, i vote for :

model.find(1).populate('child', {pluck: 'id'})

Thanks for posting, @mphasize. I'm a repo bot-- nice to meet you!

It has been 30 days since there have been any updates or new comments on this page. If this issue has been resolved, feel free to disregard the rest of this message. On the other hand, if you are still waiting on a patch, please:

  • review our contribution guide to make sure this submission meets our criteria (only _verified bugs_ with documented features, please; no questions, commentary, or bug reports about undocumented features or unofficial plugins)
  • create a new issue with the latest information, including updated version details with error messages, failing tests, and a link back to the original issue. This allows GitHub to automatically create a back-reference for future visitors arriving from search engines.

Thanks so much for your help!

This should be re-opened (and that damned bot should be shut down).

I love sails, but this is my number one complaint. If you're exposing a REST API, you really need the one-to-one and many-to-one ids that are populated directly on the record. You don't need to populate the associated data because clients can then make a second API call to fetch the data for the association if and when they need it.

My current workaround is to populate the relationships twice on every model: once as a 'model' association (for use in custom controller logic) and again as an 'integer' so that the id gets populated without the enormous overhead of populating the associations.

Before I go and fork Waterline and try to hack together a fix, has anyone else solved this?

@jcowley you could use populate with select ['id'], but the way waterline and adapters are wrote doesn't let you populate childs ids without the "enormous overhead of populating the associations" ...
the problem is that sequel use union for populate and the cursor is too generic to be fast...
we forked waterline and wrote an adapter who use left join instead of union and a sql optimized cursor to optimize performance see https://github.com/Atlantis-Software/sails-hook-orm-offshore/issues/5

@atiertant: That's helpful, thanks. I'll check it out.

The thing that bothers me is that if you're only interested in the foreign keys on the record itself, you don't need to do a union or a join. There should be an optimization (like the default?) that allows you to forgo any population of relationships but still get the values of any foreign keys that are populated as columns on the records you fetch.

@jcowley in a hasMany relation, the foreign keys are in the other side, you can't select parent record and child pk on the same table, so union, join, or a second request...

@atiertant: Ah, I see now I posted to a topic that's specific to @hasMany relationships. As I mentioned, I'm just looking for the one-to-one or one-to-many ids, e.g.:

{ products: [
  {id: 1, category_id: 2, type_id, 5},
  {id: 1, category_id: 2, type_id, 5}
  ...
  ]
}

In the example, I'm talking about returning the category and type ids (one-to-many) without the overhead of joining those tables. I'm not even getting the ids back unless I have populate all on or manually populate them.

@jcowley one-to-one and many-to-one are already showing id without need to populate

@atiertant : I'm using @mphasize sails-generate-ember-blueprints for my blueprints (which is what led me to this issue). It's not populating the ids. Mistakenly thought it was a limitation of sails/waterline because of this issue, but it must be a bug in the blueprints I'm using.

@jcowley The sails-generate-ember-blueprints are pretty much out of date, I can't recommend to use them anymore. I added a custom populateIndexes function to the _actionUtils in conjunction with a hacky "many-to-many through" association described here https://gist.github.com/mphasize/aeaff696132357797e0b in order to make indexing work. Not sure if this is still compatible with the latest Sails.

@mphasize: Yeah, I keep seeing your comments on the blueprints being "out of date", but many of us are still happily using them! The reality is that there is no sails JSON API blueprint yet that's production ready. So I've stayed with the REST adapter in ember and your blueprints — and you can see from the download stats on npm that others have as well. (The benefits of upgrading to a JSON API don't really justify the additional work and risk at this point IMHO.)

Appreciate your creating those blueprints. With the exception of the foreign key issue and a another minor issue that I patched in my fork own of the blueprints, they've been great. And by the way, they do work fine with the latest version of sails (0.12.4).

Thanks for the tip on populateIndexes. I'll take a look!

@jcowley Nice to hear, that the blueprints are still appreciated and honestly I feel kinda sad, that I don't have time to work on them, but I've moved away from Sails and Node as a backend for the time being. One of the main reasons for that were issues like the one we're talking about here, which I feel are important for a strong API story.

In desperate need of having _only_ the ids in an array, we wrote a fairly efficient function to only get the ids via the junction table (a bit different to populateIndexes as we went directly for mongo native aggregate).

However, when trying to set these arrays of ids to the model's property, we still get the embedded records magically. @mphasize also ran into this back then:

// get rid of the record's prototype ( otherwise the .toJSON called in res.send would re-insert 
embedded records) record = create( {}, record.toJSON() );

_actionUtils

Now we clone all the parent records via parse/stringify (_.cloneDeep now clones functions as well!). This of course is a bit blocking once we get a number of bigger records to clone.

Is there a better way to set/modify association fields (one and many) without cloning them?
This would be an intermediate fix without that performance penalty.

You could query the association table/collection itself.

To get the list of association models that you can use through sails:
run sails console

Object.keys(sails.models)

You get the list of all the models, including association models. You can query them as any other model, and make the join yourself.

The query is not the problem. More setting the results to the model property.

Example:

  • Models Users and Pets, having a many-to-many relationship designed in their models, like in http://sailsjs.com/documentation/concepts/models-and-orm/associations/many-to-many.
  • We use our custom populateIds function to get all Pets for User "John". The result is something likevar populatedPetsIds = ["id1","id2","id3"];
  • When I try to set this result array to the association property of the User record of "John", it is not there when I log it to console or other functions where it is converted to JSON.

So, johnsUserRecord.pets = populatedPetsIds; does not work.

Instead, we do:

var clonedJohnsUserRecord = JSON.parse( JSON.stringify( johnsUserRecord )); // _.cloneDeep does now clone functions as well, so won't work
clonedJohnsUserRecord.pets =  populatedPetsIds;

That works of course, but doing this for 50 or more heavy records is blocking and more memory and GC-intense.

The issue why pets does not work is because it has associated getters and setters. You have to call toObject() on each object to remove that. And then you can assign anything to them. But, if you use model functions, then they will not work.

Was this page helpful?
0 / 5 - 0 ratings