Graphql-engine: Performance gets affected when using 'limit' as it applies limit after the table join

Created on 8 Sep 2020  路  3Comments  路  Source: hasura/graphql-engine

Summary

Since 'limit' is getting applied after the table joins, the performance is greatly affected. If the limit for the first table is applied and then the table join happens, it would greatly improve the performance.

Steps to reproduce

Sample database schema


Click to expand!

workmanage.sql.zip

GraphQL


Click to expand!

query MyQuery {
  users(limit: 10, offset: 0) {
    works {
      id
      name
      updated_at
      created_at
    }
    name
    id
    is_active
    updated_at
    username
    created_at
  }
}

Generated SQL


Click to expand!

SELECT
  coalesce(json_agg("root"), '[]') AS "root"
FROM
  (
    SELECT
      row_to_json(
        (
          SELECT
            "_5_e"
          FROM
            (
              SELECT
                "_4_root.ar.root.works"."works" AS "works",
                "_0_root.base"."name" AS "name",
                "_0_root.base"."id" AS "id",
                "_0_root.base"."is_active" AS "is_active",
                "_0_root.base"."updated_at" AS "updated_at",
                "_0_root.base"."username" AS "username",
                "_0_root.base"."created_at" AS "created_at"
            ) AS "_5_e"
        )
      ) AS "root"
    FROM
      (
        SELECT
          *
        FROM
          "public"."users"
        WHERE
          ('true')
      ) AS "_0_root.base"
      LEFT OUTER JOIN LATERAL (
        SELECT
          coalesce(json_agg("works"), '[]') AS "works"
        FROM
          (
            SELECT
              row_to_json(
                (
                  SELECT
                    "_2_e"
                  FROM
                    (
                      SELECT
                        "_1_root.ar.root.works.base"."id" AS "id",
                        "_1_root.ar.root.works.base"."name" AS "name",
                        "_1_root.ar.root.works.base"."updated_at" AS "updated_at",
                        "_1_root.ar.root.works.base"."created_at" AS "created_at"
                    ) AS "_2_e"
                )
              ) AS "works"
            FROM
              (
                SELECT
                  *
                FROM
                  "public"."works"
                WHERE
                  (("_0_root.base"."id") = ("user_id"))
              ) AS "_1_root.ar.root.works.base"
          ) AS "_3_root.ar.root.works"
      ) AS "_4_root.ar.root.works" ON ('true')
    LIMIT
      10 OFFSET ('0') :: integer
  ) AS "_6_root"

server enhancement high

Most helpful comment

I'm also experiencing this problem. I copied the query from the "analyze" page and moved limit to the "main" select statement, and the query time went from about ~20 seconds to less than a second.

All 3 comments

Hi @reidharry, the limit is currently added at the outer level because the ordering could use columns from relationships. Postgres's query planner should be able to push the limit into the inner table automatically? What does the explain output say? Either way, we'll definitely improve this - limit will be added at the right level based on the order_by clause.

I'm also affected by this - queries that are quite fast with the limit placed on the inner query are extremely slow when the limit is on the outer, fully joined product - even when no part of the join is used in the sorting or filtering.

I'm also experiencing this problem. I copied the query from the "analyze" page and moved limit to the "main" select statement, and the query time went from about ~20 seconds to less than a second.

Was this page helpful?
0 / 5 - 0 ratings