In #234 there were proposal for adding Repo.exists?/1 function that will check if given query will result with any data.
I think that now it is good idea to reopen that question as Ecto stabilised a little, we have adapters for some NoSQLs and there is Ecto 3.0 around the corner. However mine "SQL" solution is a little bit different from the original
def exists?(query) do
import Ecto.Query
result =
query
|> exclude(:select)
|> limit(1)
|> select(1)
|> one()
not is_nil(result)
end
Can you provide examples of where exists?/1 would be useful? In my mind, most cases I think about using exists? I would still need the data later... or even worse, I would check if something associated exists, leading to N+1 queries.
Here on Xerpa, we have a couple of places where we need to check if something exists (but do not work with the data); right now we are using Repo.aggregate(query, :count, :id) and if id == 0
For example, we wanted to disallow closing/archiving the treatment of the employees work period if there was an older work period open to be treated and closed. For this, we didn't want to retrieve any particular data, just to know if there existed anything matching the query.
We have stumbled upon this several times already. A Repo.exists? or something like that would prove to be of use to us (naturally we could also implement this as a helper over Repo.aggregate on our repo module)
@josevalim in my case it is that I need to run slightly different query depending on given user position (geoquery). And I cannot select data and detect by it as there is also pagination. So only solution is to always run check if there is something in given area and then build proper request basing on wether there is something to query at all.
@mememori would the implementation above also work at Xerpa?
Blog post about even more optimised query using EXISTS instead of LIMIT or COUNT, with cons of using COUNT approach.
@hauleth my concern is how would we make it work. We don't support queries without a from and we don't support exists expressions. :(
@josevalim I have tested performance in PosgreSQL using different syntaxes.
Script:
CREATE TEMPORARY TABLE
tmp AS SELECT
generate_series(1,1000000) AS id,
round(random() * 100)::integer AS value;
\echo '========== WITHOUT INDEX ==========\n\n'
\echo 'COUNT'
EXPLAIN ANALYZE SELECT COUNT(*) FROM tmp WHERE value BETWEEN 40 AND 60;
\echo '\nLIMIT 1'
EXPLAIN ANALYZE SELECT 1 FROM tmp WHERE value BETWEEN 40 AND 60 LIMIT 1;
\echo '\nEXISTS'
EXPLAIN ANALYZE SELECT EXISTS (SELECT 1 FROM tmp WHERE value BETWEEN 40 AND 60);
\echo '\n\n'
\echo '========== WITH INDEX ==========\n\n'
CREATE INDEX ON tmp (value);
\echo 'COUNT'
EXPLAIN ANALYZE SELECT COUNT(*) FROM tmp WHERE value BETWEEN 40 AND 60;
\echo '\nLIMIT 1'
EXPLAIN ANALYZE SELECT 1 FROM tmp WHERE value BETWEEN 40 AND 60 LIMIT 1;
\echo '\nEXISTS'
EXPLAIN ANALYZE SELECT EXISTS (SELECT 1 FROM tmp WHERE value BETWEEN 40 AND 60);
Results:
SELECT 1000000
Time: 908,841 ms
========== WITHOUT INDEX ==========
COUNT
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=19438.25..19438.26 rows=1 width=8) (actual time=383.055..383.055 rows=1 loops=1)
-> Seq Scan on tmp (cost=0.00..19425.75 rows=5000 width=0) (actual time=0.117..339.119 rows=209917 loops=1)
Filter: ((value >= 40) AND (value <= 60))
Rows Removed by Filter: 790083
Planning time: 0.656 ms
Execution time: 383.197 ms
(6 rows)
Time: 623,812 ms
LIMIT 1
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.89 rows=1 width=4) (actual time=0.050..0.051 rows=1 loops=1)
-> Seq Scan on tmp (cost=0.00..19425.75 rows=5000 width=4) (actual time=0.049..0.049 rows=1 loops=1)
Filter: ((value >= 40) AND (value <= 60))
Planning time: 0.090 ms
Execution time: 0.087 ms
(5 rows)
Time: 0,566 ms
EXISTS
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Result (cost=3.89..3.90 rows=1 width=1) (actual time=0.018..0.018 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on tmp (cost=0.00..19425.75 rows=5000 width=0) (actual time=0.016..0.016 rows=1 loops=1)
Filter: ((value >= 40) AND (value <= 60))
Planning time: 0.089 ms
Execution time: 0.059 ms
(6 rows)
Time: 0,400 ms
========== WITH INDEX ==========
CREATE INDEX
Time: 1721,912 ms (00:01,722)
COUNT
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3321.93..3321.94 rows=1 width=8) (actual time=905.761..905.761 rows=1 loops=1)
-> Index Only Scan using tmp_value_idx on tmp (cost=0.42..3309.43 rows=5000 width=0) (actual time=0.110..841.317 rows=209917 loops=1)
Index Cond: ((value >= 40) AND (value <= 60))
Heap Fetches: 209917
Planning time: 0.435 ms
Execution time: 905.842 ms
(6 rows)
Time: 906,816 ms
LIMIT 1
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..1.09 rows=1 width=4) (actual time=0.082..0.082 rows=1 loops=1)
-> Index Only Scan using tmp_value_idx on tmp (cost=0.42..3309.43 rows=5000 width=4) (actual time=0.081..0.081 rows=1 loops=1)
Index Cond: ((value >= 40) AND (value <= 60))
Heap Fetches: 1
Planning time: 0.092 ms
Execution time: 0.121 ms
(6 rows)
Time: 0,491 ms
EXISTS
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Result (cost=1.09..1.10 rows=1 width=1) (actual time=0.049..0.049 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Index Only Scan using tmp_value_idx on tmp (cost=0.42..3309.43 rows=5000 width=0) (actual time=0.047..0.047 rows=1 loops=1)
Index Cond: ((value >= 40) AND (value <= 60))
Heap Fetches: 1
Planning time: 0.150 ms
Execution time: 0.076 ms
(7 rows)
Time: 0,663 ms
So it seems that for now LIMIT 1 version is Good Enoughâ„¢ and in future it could be fixed when we gain support for EXISTS and/or FROM-less queries.
EDIT: @mememori if you are using PostgreSQL then I would encourage you to check this out and change your Repo.aggregate(Foo, :count, :id) to other solution as COUNT() in PSQL can be terribly slow.
Thank you @hauleth!
To clarify, my concern is that the query one is more generic. For example, we decided to add last to Ecto.Query rather than Ecto.Repo because it composes better with other things, such as subqueries and associations. So I think having some sort of exists in Ecto.Query would be nice but it is unclear how to support it.
I would like to move ahead with this but I am not sure how to best expose the exists interface. Adding a function to the adapter seems like the best choice for now: Ecto.Repo.exists? will add LIMIT 1 and SELECT 1 to the query and return a boolean.
/cc @jeregrine
@josevalim when would this function be published on hex.pm?
This will ship as part of Ecto 3. There's no release date set, the progress can be tracked with GitHub milestone.
Just ran into a use case where this was required. Looking forward to seeing it implemented.
Most helpful comment
@josevalim I have tested performance in PosgreSQL using different syntaxes.
Script:
Results:
So it seems that for now
LIMIT 1version is Good Enoughâ„¢ and in future it could be fixed when we gain support forEXISTSand/orFROM-less queries.EDIT: @mememori if you are using PostgreSQL then I would encourage you to check this out and change your
Repo.aggregate(Foo, :count, :id)to other solution asCOUNT()in PSQL can be terribly slow.