We have to make sure the New Dashboard performs ๐ in accounts with loads of maps and datasets. @andy-esch recently explained how the performance of the actual Dashboard is pretty bad in his account (~1000 datasets).
The goal of this issue is to assess what the performance of the New Dashboard is in accounts like @andy-esch 's and do a little investigation of what's causing the UI to be slow, so that we can decide how to fix some or all of them in the near future.
A more performant Dashboard will be add a lot of value to heavy CARTO users ๐ช
cc: @javitonino @ethervoid
Randomly came across this which should help: https://github.com/CartoDB/cartodb/pull/14621
Some more info that might be relevant: https://github.com/CartoDB/cartodb/issues/13953
Another place where with_presenter_cache could help:
https://github.com/CartoDB/cartodb/blob/32d30ba66b8006dca14be926fdc8ccfaa092e40b/app/controllers/carto/api/visualization_presenter.rb#L61
@jesusbotella sent me this issue about the (poor) performance of the datasets view:
https://groups.google.com/forum/#!msg/cartodb/K7dG4p0Tpyw/NP5ncVshCQAJ
Similar to what @andy-esch was expressing. I see @javitonino already replied. If we can fix this in the New Dashboard, it will be a super-win for our users ๐ช
@ethervoid do you have any insights regarding this so far? Thanks!
@alonsogarciapablo I'm reading all the issues, understanding the measures taken an so on. I finished yesterday the Likes cleaning so I'm on it :)
Perfect! Thanks for the update! ๐ช
I've been uploading and trying to reproduce the problem in my local environment but is not the same and I don't have the same latency problems that in production. Why?
Then I tested it in staging cloud with similar results. It could be due to similar factors than the local environment.
Next step is to provision a new RUI, redirect @andy-esch user to that rui and start doing profiling and so on.
For example doing this request: https://team.carto.com/u/eschbacher/api/v1/viz?tag_name=&q=&page=1&type=derived&exclude_shared=false&per_page=1&locked=true&shared=no&only_liked=false&api_key=apikey the avg response time is 4 seconds so is a good point to start measuring
Thanks for the update @ethervoid! Keep ๐ต๏ธ and you will find something! ๐ช
I've been doing benchmarking and here are my insights by now.
TL; DR; Looks like either the connection to the database or the size of the database is damaging the performance. We have to take into account the the dbd-team pg data disk is 2.7TB
I've added benchmarking in every visualization object to check where the time goes:
"timeviz": [
{
"label": "to_poro",
"real": 0.0069547109997074585,
"cstime": 0,
"cutime": 0,
"stime": 0,
"utime": 0,
"total": 0
},
{
"label": "load_related",
"real": 0.000002701999619603157,
"cstime": 0,
"cutime": 0,
"stime": 0,
"utime": 0,
"total": 0
},
{
"label": "load_dependent",
"real": 0.0000027089990908280015,
"cstime": 0,
"cutime": 0,
"stime": 0,
"utime": 0,
"total": 0
},
{
"label": "liked",
"real": 0.001709437001409242,
"cstime": 0,
"cutime": 0,
"stime": 0,
"utime": 0.009999999999999787,
"total": 0.009999999999999787
},
{
"label": "permission",
"real": 3.5094704250004725,
"cstime": 0,
"cutime": 0,
"stime": 0.15999999999999992,
"utime": 2.4799999999999995,
"total": 2.6399999999999997
},
{
"label": "stats",
"real": 0.003290009999545873,
"cstime": 0,
"cutime": 0,
"stime": 0,
"utime": 0,
"total": 0
},
{
"label": "user_table",
"real": 0.00007167700096033514,
"cstime": 0,
"cutime": 0,
"stime": 0,
"utime": 0,
"total": 0
}
]
As you can see a huge amount, 3.5s, goes to the permission part. Why? because permissions is the first time the user presenter is loaded, without cache, aaaaaand there we have the db_size part
So it looks like we still have the same problem. Measuring the time spent for @andy-esch 's user to retrieve the db size you can see the amount of time spent here (2,96s):
irb(main):002:0> Benchmark.measure { Carto::User.find_by_username('eschbacher').db_size_in_bytes }
[pool] Creating a new connection for 10.60.255.215:cartodb_user_6b304b59-2c86-4d7b-a8a2-dc98a5a35d48_db:6432:postgres:ar (1)
[pool] Found a connection for 10.60.255.215:cartodb_user_6b304b59-2c86-4d7b-a8a2-dc98a5a35d48_db:6432:postgres:ar (1)
=> #<Benchmark::Tms:0x0000000c8f7378 @label="", @real=3.636054564267397, @cstime=0.0, @cutime=0.0, @stime=0.15000000000000002, @utime=2.8100000000000005, @total=2.9600000000000004>
As an interesting part, @javitonino , was taking a look too and he notice that using the console the connection to the db takes ~3s:
Benchmark.measure { db = u.in_database(as: :superuser) } -> 3s
Benchmark.measure { u.db_size_in_bytes } -> 0.25s
Also @javitonino did other test: ActiveRecord::ConnectionAdapters::PostgreSQL::OID::TypeMapInitializer#run <4.620301> this command list all the different types in the DB and took 4.6s
Tomorrow I'll keep investigating :)
Well, we finally have a winner here and a patch for it
ActiveRecord in Rails 4.0 is loading all the types from the database when a connection is made and in our team database is taking ~2.5 seconds because it loads 145k types.
I did a test using two different user models: one from Sequel and the other from AR to do the same operation:
0.000000 0.000000 0.000000 ( 0.235332) --> Sequel
1.920000 0.150000 2.070000 ( 2.689540) --> AR
You can see the problematic method is this one:
ActiveRecord::ConnectionAdapters::PostgreSQL::OID::TypeMapInitializer#run <4.615232>
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#execute_and_clear <5.136224>
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#load_additional_types <5.136321>
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#initialize_type_map <5.136989>
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#initialize <5.142473>
Class#new <5.142497>
ActiveRecord::ConnectionHandling#postgresql_connection <5.142617>
ActiveRecord::ConnectionAdapters::ConnectionPool#new_connection <5.142634>
ActiveRecord::ConnectionAdapters::ConnectionPool#checkout_new_connection <5.142647>
ActiveRecord::ConnectionAdapters::ConnectionPool#acquire_connection <5.142675>
MonitorMixin#mon_synchronize <5.143279>
ActiveRecord::ConnectionAdapters::ConnectionPool#checkout <5.143303>
MonitorMixin#mon_synchronize <5.143356>
ActiveRecord::ConnectionAdapters::ConnectionPool#connection <5.143382>
Carto::Db::Connection.get_database <5.144334>
CartoDB::ConnectionPool#create_new_connection <5.144612>
CartoDB::ConnectionPool#fetch <5.144657>
Carto::Db::Connection.connect <5.144780>
Carto::UserService#in_database <5.145184>
Carto::UserService#cartodb_extension_version <5.146027>
Carto::UserService#cartodb_extension_version_pre_mu? <5.146076>
Carto::UserService#db_size_in_bytes <5.672939>
Kernel#eval <5.833719>
IRB::WorkSpace#evaluate <5.833742>
IRB::Context#evaluate <5.833836>
IRB::Irb#signal_status <5.833899>
As you can see, the AR model is taking 2.6 seconds to do the same operation than Sequel. Why? Basically, AR when open a connection with a database get all the types from pg_type which in our database is 145k different types.
We can divide the problem into data gathering and processing:
To execute the query and retrieve all the data takes 1.3seconds
1.240000 0.090000 1.330000 ( 1.331408)
Process the data takes 0.5s
0.530000 0.010000 0.540000 ( 0.537235)
We can improve the data processing a bit but we're not going to deal with the real problem: the huge amount of types.
But why are we having that huge amount of types? Taking a look at the table we can see most of the types are related to tables recordset and arrays of those recordset. So we have decided to filter those in this query and in case our app needs it there is no problem because AR would ask for it doing a query
But...are you saying that every time AR asks for a type that we don't have in our cache it's going to do a query? Yes, but those types are not a common thing to ask for. Basically our app is dealing with the metadata database which has a few custom types (207) so we're, probably, going to notice that overhead.
Is not a perfect solution but is going to save us about ~2 seconds when dealing with huge user databases operations
Applying this patch now we're doing it in 0.6s
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#execute_and_clear(sql=" SELECT t.oid, t.typname, t.typelem, t.typdelim, t.typinput, r.rngsubtype, t.typtype, t.typbasetype\n FROM pg_type as t\n LEFT JOIN pg_range as r ON oid = rngtypid\n LEFT JOIN pg_type as tt ON (tt.typtype = 'c' AND tt.typarray = t.oid AND tt.typinput = 'record_in(cstring,oid,integer)'::regprocedure)\n WHERE\n tt.oid is null\n AND (t.typname IN ('int2', 'int4', 'int8', 'oid', 'float4', 'float8', 'text', 'varchar', 'char', 'name', 'bpchar', 'bool', 'bit', 'varbit', 'timestamptz', 'date', 'time', 'money', 'bytea', 'point', 'hstore', 'json', 'jsonb', 'cidr', 'inet', 'uuid', 'xml', 'tsvector', 'macaddr', 'citext', 'ltree', 'interval', 'path', 'line', 'polygon', 'circle', 'lseg', 'box', 'timestamp', 'numeric')\n OR t.typtype IN ('r', 'e', 'd')\n OR t.typinput = 'array_in(cstring,oid,integer)'::regprocedure\n OR t.typelem != 0)\n", name="SCHEMA", binds=[]) <0.598322>
I've made some tests and when a new type is need it ask for it without problem.
We've re-routed team users to our test platform in production and here are the results using the patch:


As you can see we have improved the time for the dashboard:
In summary, we have passed from ~26s to load the dashboard to ~12.8.
I'd let the patch active and the team users routing through it in order to test for undesired side-effects. If we don't have any issue In a couple of days (2-3), I'll put that patch into production.
Hope this improvement is good enough :)
Now I'm going to start checking the order by size part. Which maybe could be improved in order to avoid making N queries for size data. Stay tunned
Great news @ethervoid! It's a great improvement!
@alonsogarciapablo asked about the performance with the new dashboard. I've changed @andy-esch user to the new dashboard and here you can see the results:


More or less the same. From ~12s to ~6s
About the size sort we have a bit of a problem there:

In accounts with a huge number of tables we get a 504 (timeout) after 180s
The ordering issue is going to continue here
Deployed this part. I'll continue here
Great @ethervoid ๐๐๐
Most helpful comment
Applying this patch now we're doing it in 0.6s
I've made some tests and when a new type is need it ask for it without problem.