On the analysis quota warning, consumption estimation shows a huge mismatch between rows and credits.
Related to https://github.com/CartoDB/camshaft/issues/238
Please break down here below all the needed steps to reproduce the issue
Quota notification shows a 510 credit consumption estimation

2 credit consumption estimation (or at least something close)
Please add any information of interest here below
I can't reproduce it in production, under my team account, could you give me more details?
try with this .carto:
quota test map (on 2017-03-13 at 18.14.41).zip
This seems to happen only in @noguerol's account. The explain query returns:
{"rows":[{"QUERY PLAN":"Seq Scan on untitled_table_4 (cost=0.00..11.53 rows=510 width=132)"}],"time":0.002,"fields":{"QUERY PLAN":{"type":"string"}},"total_rows":1}
with an isoline with only 1 tract.
any clue @rafatower?
@noguerol what's the username of the account you used?
edit never mind, I managed w/o it.
There's a difference between importing a table and populating an empty table with a couple of rows:
ANALYZE is run as part of the process.ANALYZE is not run after inserting the rows. That's why you cannot reproduce the issue by importing a .carto file.E.g: if you run this:
CREATE TABLE test_row_estimate_stats (my_attr text);
ANALYZE test_row_estimate_stats;
INSERT INTO test_row_estimate_stats (my_attr) VALUES ('foo'), ('bar');
EXPLAIN SELECT * FROM test_row_estimate_stats;
most likely you're going to get an unexpected number of rows:
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on test_row_estimate_stats (cost=0.00..11.83 rows=610 width=104)
(1 row)
From the top of my head, I see a few alternatives:
autovacuum_analyze_threshold. But this can have a negative effect on DB performance.ANALYZE "manually" upon insertion/deletion of features on a layer.Solutions two (to run the query after the geometry edition) and three (to make a direct count on an always simple layer) sound good to me.
I'm wondering how bad we could perform running the basic SELECT count(*) FROM whatever.
I ran some benchmarks here
There's something we have to keep in mind. The goal of PG stats is to help the query planner to make decisions.
There are a couple of corner cases in which EXPLAIN can return wrong estimates:
ANALYZE not run yetANALYZE is run or not, it will not keep any actual stats in pg_stats)This happens because PG tries to make some reasonable assumptions when true stats do not exist and the query planner needs them. Here is a thread about this matter. For the planner there is little difference if a particular table has 0 or 500 rows and many times it cannot accept 0 rows as an input.
A way to workaround several corner cases:
ANALYZE on the table.estimated_rows < N return an actual count.estimated_rows.N can be set to 1000 or 10000 based on the benchmarks. It's a judgment call between accuracy and speed.
@nobuti in our case, are we always requesting the row count estimate of a table or a query?
@rafatower yes, we always do request the estimation
I was not expecting a yes or no :sweat_smile:
Let me explain myself better: do we need to estimate a) tables or b) queries?
Sorry, always queries.
That's bad, cause it makes the problem much harder to solve :confused:
Just to confirm: is this query ultimately taken from the analysis model?
Yes, it comes from here.
For the record, that has some implications:
SELECT * FROM fresh_empty_table special case. To be honest I don't know how to do that at all :disappointed: Regarding the "fresh empty table" stats issue, it can be reproduced like this:
tests=# CREATE TABLE fresh_empty_table (foo text);
CREATE TABLE
tests=# EXPLAIN SELECT * FROM fresh_empty_table;
QUERY PLAN
----------------------------------------------------------------------
Seq Scan on fresh_empty_table (cost=0.00..14.08 rows=1360 width=32)
The root cause of it is in the function estimate_rel_size of file plancat.c (I reproduced it in 9.6 but is also present in 9.5 and in master):
curpages are set to 10 even though the relation occupies zero blocks: https://github.com/postgres/postgres/blob/18dc2aee5f303447bef48dee596a664d90f6939a/src/backend/optimizer/util/plancat.c#L915-L951*tuples = 0: https://github.com/postgres/postgres/blob/18dc2aee5f303447bef48dee596a664d90f6939a/src/backend/optimizer/util/plancat.c#L955-L961curpages) here: https://github.com/postgres/postgres/blob/18dc2aee5f303447bef48dee596a664d90f6939a/src/backend/optimizer/util/plancat.c#L1008I created a couple of issues:
EXPLAIN and parsing the results from frontend but it's pretty important to agree on the interfaces).@rafatower LGTM 馃挴
Blocking until this is ready: https://github.com/CartoDB/cartodb-postgresql/issues/295
https://github.com/CartoDB/cartodb-postgresql/issues/295 is done so I guess we can unblock this ticket by starting using CDB_EstimateRowCount(query text) from the FE code for estimations.
Works like a charm! Thanks @rafatower!
All the merit should go to @jgoizueta :)
You both @rafatower and @jgoizueta rulez!
Tested in RUIs and the issue seems to be fixed with last changes. \o/, as @nobuti mentioned.
Deployed!
OOOOH!
Most helpful comment
All the merit should go to @jgoizueta :)