Objection.js: How to order records based on eager results?

Created on 11 Oct 2019  路  24Comments  路  Source: Vincit/objection.js

I have these tables:

  1. Users
  2. Skills
  3. Has_skills (user_id, skill_id)

I'm passing an array of skill IDs to the function which should fetch users who have at least one of those skills. How can the query sort the records by the number of matched skills from most matched skills to least? I assume I should additionally tweak the modifyEager on has_skills and count it, but I am not entirely sure how to do that.

This is my function:

async function getUsersWithPassedSkillIds({ skillIds }) {
    const users = await User.query()
        .select('users.id', 'users.name')
        .joinEager('has_skills')
        .modifyEager('has_skills', builder => builder.select('id', 'name'))
        .whereIn('has_skills.id', skillIds)

    return users
}

Most helpful comment

Thank you once again for this huge help. When I finish my project, I'll make sure to let people know how awesome this community is.

Thank you :bowing_man: :beers:

All 24 comments

Easiest solution is to sort them in javascript. If you aren't paging, theres very little benefit in sorting them in sql.

Easiest solution is to sort them in javascript. If you aren't paging, theres very little benefit in sorting them in sql.

Actually yes, pagination is the main thing I'm trying to accomplish here, but I didn't want to make the topic too wide. I shall make a new issue explaining everything that I'm trying to achieve, if that's okay or I can edit this one?

Edit: Here's a better explanation of what I need to accomplish: #1513

In that case, joining is not going to work and subqueries are better. Something like this should work

    const users = await User.query()
        .select('users.id', 'users.name')
                .whereExists(User.relatedQuery('has_skills').whereIn('id', skillIds))
                .orderBy(User.relatedQuery('has_skills').whereIn('id', skillIds).count())

This query doesn't eager-load the has_skills objects to the javascript side. You can use the normal eager method to load them if you need them. Don't use joinEager with this one.

Even cleaner:

        // Note that the query here isn't `await`ed. We don't execute this query. It will
        // be compiled as a part of the parent query.
        const hasSkillsSubquery = () => User.relatedQuery('has_skills').whereIn('id', skillIds)

    const users = await User.query()
        .select('users.id', 'users.name')
                .whereExists(hasSkillsSubquery())
                .orderBy(hasSkillsSubquery().count())

Oh, it seems that knex doesn't fully support that orderBy subquery. you need to explicitly wrap it in parentheses:

        // Note that the query here isn't `await`ed. We don't execute this query. It will
        // be compiled as a part of the parent query.
        const hasSkillsSubquery = () => User.relatedQuery('has_skills').whereIn('id', skillIds)

    const users = await User.query()
        .select('users.id', 'users.name')
                .whereExists(hasSkillsSubquery())
                .orderByRaw('?', hasSkillsSubquery().count())

The subquery is a function, because objection queries are mutable. We don't want the count from the orderBy to end up in the whereExists subquery. Another option is to use clone()

        // Note that the query here isn't `await`ed. We don't execute this query. It will
        // be compiled as a part of the parent query.
        const hasSkillsSubquery = User.relatedQuery('has_skills').whereIn('id', skillIds)

    const users = await User.query()
        .select('users.id', 'users.name')
                .whereExists(hasSkillsSubquery.clone())
                .orderByRaw('?', hasSkillsSubquery.clone().count())

I don't know why this happens, but running that function gives me this error:
_error: table name "has_skills" specified more than once_

Which db are you using?

Which db are you using?

PostgreSQL.

Weird. Could you show the complete query you are running?

module.exports = async function getUsersWithRequiredSkillsSortedForProject({ skillIds }) {
    const hasSkillsSubquery = User.relatedQuery('has_skills').whereIn('id', skillIds)

    const usersWithRequiredSkills = await User.query()
        .select('users.id', 'users.name')
        .whereExists(hasSkillsSubquery.clone())
        .orderByRaw('(?)', hasSkillsSubquery.clone().count())


    return usersWithRequiredSkills
}

Complete error:

{ error: table name "has_skills" specified more than once
        at Connection.Object.<anonymous>.Connection.parseE (C:\Users\krcky\Desktop\repos\Collab_Slim\node_modules\pg\lib\connection.js:604:11)
        at Connection.Object.<anonymous>.Connection.parseMessage (C:\Users\krcky\Desktop\repos\Collab_Slim\node_modules\pg\lib\connection.js:401:19)
        at Socket.<anonymous> (C:\Users\krcky\Desktop\repos\Collab_Slim\node_modules\pg\lib\connection.js:121:22)
        at Socket.emit (events.js:189:13)
        at addChunk (_stream_readable.js:284:12)
        at readableAddChunk (_stream_readable.js:265:11)
        at Socket.Readable.push (_stream_readable.js:220:10)
        at TCP.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)
      name: 'error',
      length: 184,
      severity: 'ERROR',
      code: '42712',
      detail: undefined,
      hint: undefined,
      position: undefined,
      internalPosition: undefined,
      internalQuery: undefined,
      where: undefined,
      schema: undefined,
      table: undefined,
      column: undefined,
      dataType: undefined,
      constraint: undefined,
      file:
       'd:\\pginstaller.auto\\postgres.windows-x64\\src\\backend\\parser\\parse_relation.c',
      line: '420',
      routine: 'checkNameSpaceConflicts' }

Could you add .debug() to that query and share the log entry? I just wrote the same query in a test and it works.

Log:

{ method: 'select',
      options: {},
      timeout: false,
      cancelOnTimeout: false,
      bindings:
       [ 2451, 2452, 2453, 2454, 2455, 2451, 2452, 2453, 2454, 2455 ],
      __knexQueryUid: '155801e0-e1e5-4e65-b920-7176c0d6ff4e',
      sql:
       'select "users"."id", "users"."name" from "users" where exists (select "has_skills".* from "skills" as "has_skills" inner
join "has_skills" on "has_skills"."id" = "has_skills"."skill_id" where "has_skills"."user_id" = "users"."id" and "id" in (?, ?, ?, ?, ?)) order by ((select count(*) from "skills" as "has_skills" inner join "has_skills" on "has_skills"."id" = "has_skills"."skill_id" where "has_skills"."user_id" = "users"."id" and "id" in (?, ?, ?, ?, ?)))' }

Could you show the relation mapping of the has_skills relation? The query seems really weird

You have selected the same name for the relation and the join table. That's the problem here.

A better name for that relation would be skills

A tiny tweak, since I was getting this error:

_> error: column reference "id" is ambiguous_

Edited query:

const hasSkillsSubquery = User.relatedQuery('skills').whereIn('skills.id', skillIds) // 'skills.id' added instead of just 'id'

const usersWithRequiredSkills = await User.query()
    .select('users.id', 'users.name')
    .whereExists(hasSkillsSubquery.clone())
    .orderByRaw('(?)', hasSkillsSubquery.clone().count())

return usersWithRequiredSkills

Pardon my lack of knowledge, I hope I am not being annoying, but I definitely am grateful for your help and this amazing library that you made.

Where in the .orderByRaw part should I add the 'DESC' attribute so it reverses the order?

After that, I will try to add pagination to see how that works. Will this query, when given a range or offset (I need to take a deeper look at pagination), start counting that range from the users with most matched skills or some other factor? In other words, I just want the pagination to apply starting from the users with the most matched skills from the query.

Thank you once again for this huge help. When I finish my project, I'll make sure to let people know how awesome this community is.

This should work:

const hasSkillsSubquery = User.relatedQuery('skills').whereIn('skills.id', skillIds)

const usersWithRequiredSkills = await User.query()
    .select('users.id', 'users.name')
    .whereExists(hasSkillsSubquery.clone())
    .orderByRaw('? DESC', hasSkillsSubquery.clone().count())

return usersWithRequiredSkills

You could also open an issue in knex's github about orderBy not supporting a subquery. It should definitely work without orderByRaw,

And about paging, yes it should work. I'm not 100% sure. Try it out?

Thank you once again for this huge help. When I finish my project, I'll make sure to let people know how awesome this community is.

Thank you :bowing_man: :beers:

const hasSkillsSubquery = User.relatedQuery('skills').whereIn('has_skills.skill_id', skillIds)
const blockedUserIdSubquery = User.relatedQuery('blockedMembers').where('blocked_members.target_id', userId)

const users = await User.query()
    .select('users.id', 'users.name')
    .joinEager('skills')
    .modifyEager('skills', builder => builder.select('id', 'name').whereIn('id', skillIds)) // Populating the matched skills
    .whereExists(hasSkillsSubquery.clone()) // Only taking into account users who have passed skills
    .whereNotExists(blockedUserIdSubquery.clone()) // Skipping users who blocked userId
    .whereNot('users.id', userId) // Skipping userId (the user hitting this API)
    .whereNotIn('users.id', blockedUsersIdsArr) // Skipping users which are blocked from userId
    .orderByRaw('(?) DESC', hasSkillsSubquery.clone().count())
    .page(0, 5)

Pagination works to an extent, but either I am not getting it in the first place or it's simply the way it should work. Depending on the values on pagination methods, I get different results for how many skills for each user are being counted as much.

I tried with both .range() and .page(). I would say that both methods are also counting the eager queries (skills). I don't know what's happening in the background, so I'll try to illustrate it down below.

First, I get correct results if my .page() or .range() is big enough, for example: .page(0, 100)

Results:

[
  {
    id: 2528,
    name: "usertwo",
    skills: [[Object], [Object], [Object], [Object], [Object]],
  },
  {
    id: 2522,
    name: "userthree",
    skills: [[Object], [Object], [Object], [Object]],
  },
  {
    id: 2523,
    name: "userfour",
    skills: [[Object], [Object], [Object]],
  },
  {
    id: 2524,
    name: "userfive",
    skills: [[Object], [Object]],
  },
  { id: 2525, name: "usersix", skills: [[Object]], }
];

If I try something like this .page(0,1):

[{ id: 2536, name: "usertwo", skills: [[Object]], }]; // should be 5 skills

As you can see, it got the user correct, but number of matched skills for the user is not correct. User two is indeed the one with the most matched skills, but they have more than 1 matched skill, they have 5 matched skills.

.page(0,2):

[{ id: 2544, name: "usertwo", skills: [[Object], [Object]], }]; // should be 5 skills, not 2

.page(0,8):

[
  {
    id: 2552,
    name: "usertwo",
    skills: [[Object], [Object], [Object], [Object], [Object]],
  },
  {
    id: 2546,
    name: "userthree",
    skills: [[Object], [Object]], // should be 4 skills
  }
];

As the page range gets bigger, the more accurate are the number of matched skills. I will certainly run into problems if I leave the query like this. I will not know in advance which is the correct range for getting accurate results.

What is the way for doing the pagination for this query accurately?

I was expecting that .range(0,2) or .page(0, 3) would return 3 users with most matched skills:

[
  {
    id: 2560,
    name: "usertwo",
    skills: [[Object], [Object], [Object], [Object], [Object]],
  },
  {
    id: 2554,
    name: "userthree",
    skills: [[Object], [Object], [Object], [Object]],
  },
  {
    id: 2555,
    name: "userfour",
    skills: [[Object], [Object], [Object]],
  }
];

Paging doesn't work with joinEager because it uses joins. You need to use the eager method instead

Ah, stupid me. I read the documentation, but I usually use joinEager so it's stuck in the muscle memory. It seems to work perfectly fine now! Thank you so much, I thought I'd be stuck with this problem for a long time!

Was this page helpful?
0 / 5 - 0 ratings