Ecto: `count(<field>)` can sometimes be much slower than `count(*)`

Created on 18 Jun 2018  Â·  10Comments  Â·  Source: elixir-ecto/ecto

Environment

  • Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.): PostgreSQL (possibly other SQL DBs, unsure about NoSQLs)
  • Ecto version (mix deps): 2.0+

Current behavior

Currently there is no way (other than using fragment) to run query like count(*). This can be problematic as count(<field>) on <field> that isn't indexed can be terribly slow (found by @mkaszubowski). The reason why this is happening is that by SQL standard COUNT(<field>) should return count of non-null values in DB so that mean it need to check wether any value in given field is empty or not, while COUNT(*) do not have this limitation and always return count of all matched rows.

Currently there is available workaround in form of count(true) in Ecto, but this isn't obvious solution and for sure isn't idiomatic.

Expected behavior

There should be count/0 function that would expand to counting all entries matching the query which I believe that most DBs have optimised enough.

Enhancement Intermediate

Most helpful comment

Just to clarify: the problem exists even if the fields used in count is indexed properly. The difference between two approaches is that when using count(<field>), PostgreSQL uses an index scan and still has to fetch the rows. With count(*) it can use index only scan without fetching the rows, which is way faster.

All 10 comments

Sounds good to me! PR please?

Just to clarify: the problem exists even if the fields used in count is indexed properly. The difference between two approaches is that when using count(<field>), PostgreSQL uses an index scan and still has to fetch the rows. With count(*) it can use index only scan without fetching the rows, which is way faster.

I have a closed pr about count function #2575 if you want i can modify for that situation.

@soteras a PR is definitely welcome. Although note that we are not adding new callbacks to the repo module. The new count function will belong to the Ecto.Query.API instead. Please let me know if you have any questions!

@josevalim any sources where I could learn how are aggregate functions in Ecto.Query.API handled? Because it seems a little convoluted (as most macro systems).

There are two steps:

  • First is in Ecto.Query.Builder where expressions are expanded/compiled to query structs. This step verifies all expressions are valid and supported
  • Second is conversion to SQL and that happens in each adapter separately. For SQL adapters, the module that does it is usually named <adapter>.Connection.

There is a step in the middle, which is the Ecto.Query.Planner. My suggestion is to search the codebase for a less commonly used aggregator, such as avg, and see where in the codebase we touch it.

@hauleth are you doing that or wants to?

I want to, but if you are willing then you can try as I do not have time right now to dig into it properly.

Wiadomość napisana przez Carlos Henrique notifications@github.com w dniu 28.06.2018, o godz. 23:52:

@hauleth https://github.com/hauleth are you doing that or wants to?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/elixir-ecto/ecto/issues/2595#issuecomment-401184727, or mute the thread https://github.com/notifications/unsubscribe-auth/AARzNyw55jCtOSgk1HJRE3qYGkZ1fhKwks5uBVA4gaJpZM4UrhdQ.

It seem it was merged. @soteras thanks for your work.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nathanjohnson320 picture nathanjohnson320  Â·  4Comments

jbence picture jbence  Â·  3Comments

atsheehan picture atsheehan  Â·  4Comments

kelostrada picture kelostrada  Â·  3Comments

yordis picture yordis  Â·  4Comments