Cartodb: [Dashboard] Performance Review

Created on 16 Jan 2019  ยท  20Comments  ยท  Source: CartoDB/cartodb

We have to make sure the New Dashboard performs ๐Ÿ‘Œ in accounts with loads of maps and datasets. @andy-esch recently explained how the performance of the actual Dashboard is pretty bad in his account (~1000 datasets).

The goal of this issue is to assess what the performance of the New Dashboard is in accounts like @andy-esch 's and do a little investigation of what's causing the UI to be slow, so that we can decide how to fix some or all of them in the near future.

A more performant Dashboard will be add a lot of value to heavy CARTO users ๐Ÿ’ช

cc: @javitonino @ethervoid

Backend

Most helpful comment

Applying this patch now we're doing it in 0.6s

ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#execute_and_clear(sql="              SELECT t.oid, t.typname, t.typelem, t.typdelim, t.typinput, r.rngsubtype, t.typtype, t.typbasetype\n              FROM pg_type as t\n              LEFT JOIN pg_range as r ON oid = rngtypid\n              LEFT JOIN pg_type as tt ON (tt.typtype = 'c' AND tt.typarray = t.oid AND tt.typinput = 'record_in(cstring,oid,integer)'::regprocedure)\n              WHERE\n                tt.oid is null\n                AND (t.typname IN ('int2', 'int4', 'int8', 'oid', 'float4', 'float8', 'text', 'varchar', 'char', 'name', 'bpchar', 'bool', 'bit', 'varbit', 'timestamptz', 'date', 'time', 'money', 'bytea', 'point', 'hstore', 'json', 'jsonb', 'cidr', 'inet', 'uuid', 'xml', 'tsvector', 'macaddr', 'citext', 'ltree', 'interval', 'path', 'line', 'polygon', 'circle', 'lseg', 'box', 'timestamp', 'numeric')\n                OR t.typtype IN ('r', 'e', 'd')\n                OR t.typinput = 'array_in(cstring,oid,integer)'::regprocedure\n                OR t.typelem != 0)\n", name="SCHEMA", binds=[]) <0.598322>

I've made some tests and when a new type is need it ask for it without problem.

All 20 comments

Randomly came across this which should help: https://github.com/CartoDB/cartodb/pull/14621

Some more info that might be relevant: https://github.com/CartoDB/cartodb/issues/13953

@jesusbotella sent me this issue about the (poor) performance of the datasets view:
https://groups.google.com/forum/#!msg/cartodb/K7dG4p0Tpyw/NP5ncVshCQAJ
Similar to what @andy-esch was expressing. I see @javitonino already replied. If we can fix this in the New Dashboard, it will be a super-win for our users ๐Ÿ’ช

@ethervoid do you have any insights regarding this so far? Thanks!

@alonsogarciapablo I'm reading all the issues, understanding the measures taken an so on. I finished yesterday the Likes cleaning so I'm on it :)

Perfect! Thanks for the update! ๐Ÿ’ช

Update

I've been uploading and trying to reproduce the problem in my local environment but is not the same and I don't have the same latency problems that in production. Why?

  • The metadata and the user database are in the same machine
  • We don't use pgbouncer
  • We don't have hundreds/thousands of users in the same database

Then I tested it in staging cloud with similar results. It could be due to similar factors than the local environment.

Next step is to provision a new RUI, redirect @andy-esch user to that rui and start doing profiling and so on.

For example doing this request: https://team.carto.com/u/eschbacher/api/v1/viz?tag_name=&q=&page=1&type=derived&exclude_shared=false&per_page=1&locked=true&shared=no&only_liked=false&api_key=apikey the avg response time is 4 seconds so is a good point to start measuring

Thanks for the update @ethervoid! Keep ๐Ÿ•ต๏ธ and you will find something! ๐Ÿ’ช

Update

I've been doing benchmarking and here are my insights by now.

TL; DR; Looks like either the connection to the database or the size of the database is damaging the performance. We have to take into account the the dbd-team pg data disk is 2.7TB

I've added benchmarking in every visualization object to check where the time goes:

"timeviz": [
                {
                    "label": "to_poro",
                    "real": 0.0069547109997074585,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                },
                {
                    "label": "load_related",
                    "real": 0.000002701999619603157,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                },
                {
                    "label": "load_dependent",
                    "real": 0.0000027089990908280015,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                },
                {
                    "label": "liked",
                    "real": 0.001709437001409242,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0.009999999999999787,
                    "total": 0.009999999999999787
                },
                {
                    "label": "permission",
                    "real": 3.5094704250004725,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0.15999999999999992,
                    "utime": 2.4799999999999995,
                    "total": 2.6399999999999997
                },
                {
                    "label": "stats",
                    "real": 0.003290009999545873,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                },
                {
                    "label": "user_table",
                    "real": 0.00007167700096033514,
                    "cstime": 0,
                    "cutime": 0,
                    "stime": 0,
                    "utime": 0,
                    "total": 0
                }
            ]

As you can see a huge amount, 3.5s, goes to the permission part. Why? because permissions is the first time the user presenter is loaded, without cache, aaaaaand there we have the db_size part

So it looks like we still have the same problem. Measuring the time spent for @andy-esch 's user to retrieve the db size you can see the amount of time spent here (2,96s):

irb(main):002:0> Benchmark.measure { Carto::User.find_by_username('eschbacher').db_size_in_bytes }
[pool] Creating a new connection for 10.60.255.215:cartodb_user_6b304b59-2c86-4d7b-a8a2-dc98a5a35d48_db:6432:postgres:ar (1)
[pool] Found a connection for 10.60.255.215:cartodb_user_6b304b59-2c86-4d7b-a8a2-dc98a5a35d48_db:6432:postgres:ar (1)
=> #<Benchmark::Tms:0x0000000c8f7378 @label="", @real=3.636054564267397, @cstime=0.0, @cutime=0.0, @stime=0.15000000000000002, @utime=2.8100000000000005, @total=2.9600000000000004>

As an interesting part, @javitonino , was taking a look too and he notice that using the console the connection to the db takes ~3s:

Benchmark.measure { db = u.in_database(as: :superuser) } -> 3s
Benchmark.measure { u.db_size_in_bytes } -> 0.25s

Also @javitonino did other test: ActiveRecord::ConnectionAdapters::PostgreSQL::OID::TypeMapInitializer#run <4.620301> this command list all the different types in the DB and took 4.6s

Tomorrow I'll keep investigating :)

Update

Well, we finally have a winner here and a patch for it

ActiveRecord in Rails 4.0 is loading all the types from the database when a connection is made and in our team database is taking ~2.5 seconds because it loads 145k types.

I did a test using two different user models: one from Sequel and the other from AR to do the same operation:

  0.000000   0.000000   0.000000 (  0.235332) --> Sequel
  1.920000   0.150000   2.070000 (  2.689540) --> AR

You can see the problematic method is this one:

ActiveRecord::ConnectionAdapters::PostgreSQL::OID::TypeMapInitializer#run <4.615232>
                                                ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#execute_and_clear <5.136224>
                                              ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#load_additional_types <5.136321>
                                            ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#initialize_type_map <5.136989>
                                          ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#initialize <5.142473>
                                        Class#new <5.142497>
                                      ActiveRecord::ConnectionHandling#postgresql_connection <5.142617>
                                    ActiveRecord::ConnectionAdapters::ConnectionPool#new_connection <5.142634>
                                  ActiveRecord::ConnectionAdapters::ConnectionPool#checkout_new_connection <5.142647>
                                ActiveRecord::ConnectionAdapters::ConnectionPool#acquire_connection <5.142675>
                              MonitorMixin#mon_synchronize <5.143279>
                            ActiveRecord::ConnectionAdapters::ConnectionPool#checkout <5.143303>
                          MonitorMixin#mon_synchronize <5.143356>
                        ActiveRecord::ConnectionAdapters::ConnectionPool#connection <5.143382>
                      Carto::Db::Connection.get_database <5.144334>
                    CartoDB::ConnectionPool#create_new_connection <5.144612>
                  CartoDB::ConnectionPool#fetch <5.144657>
                Carto::Db::Connection.connect <5.144780>
              Carto::UserService#in_database <5.145184>
            Carto::UserService#cartodb_extension_version <5.146027>
          Carto::UserService#cartodb_extension_version_pre_mu? <5.146076>
        Carto::UserService#db_size_in_bytes <5.672939>
      Kernel#eval <5.833719>
    IRB::WorkSpace#evaluate <5.833742>
  IRB::Context#evaluate <5.833836>
IRB::Irb#signal_status <5.833899>

As you can see, the AR model is taking 2.6 seconds to do the same operation than Sequel. Why? Basically, AR when open a connection with a database get all the types from pg_type which in our database is 145k different types.

We can divide the problem into data gathering and processing:

To execute the query and retrieve all the data takes 1.3seconds

  1.240000   0.090000   1.330000 (  1.331408)

Process the data takes 0.5s

  0.530000   0.010000   0.540000 (  0.537235)

We can improve the data processing a bit but we're not going to deal with the real problem: the huge amount of types.

But why are we having that huge amount of types? Taking a look at the table we can see most of the types are related to tables recordset and arrays of those recordset. So we have decided to filter those in this query and in case our app needs it there is no problem because AR would ask for it doing a query

But...are you saying that every time AR asks for a type that we don't have in our cache it's going to do a query? Yes, but those types are not a common thing to ask for. Basically our app is dealing with the metadata database which has a few custom types (207) so we're, probably, going to notice that overhead.

Is not a perfect solution but is going to save us about ~2 seconds when dealing with huge user databases operations

Applying this patch now we're doing it in 0.6s

ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#execute_and_clear(sql="              SELECT t.oid, t.typname, t.typelem, t.typdelim, t.typinput, r.rngsubtype, t.typtype, t.typbasetype\n              FROM pg_type as t\n              LEFT JOIN pg_range as r ON oid = rngtypid\n              LEFT JOIN pg_type as tt ON (tt.typtype = 'c' AND tt.typarray = t.oid AND tt.typinput = 'record_in(cstring,oid,integer)'::regprocedure)\n              WHERE\n                tt.oid is null\n                AND (t.typname IN ('int2', 'int4', 'int8', 'oid', 'float4', 'float8', 'text', 'varchar', 'char', 'name', 'bpchar', 'bool', 'bit', 'varbit', 'timestamptz', 'date', 'time', 'money', 'bytea', 'point', 'hstore', 'json', 'jsonb', 'cidr', 'inet', 'uuid', 'xml', 'tsvector', 'macaddr', 'citext', 'ltree', 'interval', 'path', 'line', 'polygon', 'circle', 'lseg', 'box', 'timestamp', 'numeric')\n                OR t.typtype IN ('r', 'e', 'd')\n                OR t.typinput = 'array_in(cstring,oid,integer)'::regprocedure\n                OR t.typelem != 0)\n", name="SCHEMA", binds=[]) <0.598322>

I've made some tests and when a new type is need it ask for it without problem.

Update

We've re-routed team users to our test platform in production and here are the results using the patch:

Without the patch activated

screenshot from 2019-02-04 11-28-52

With the patch activated

screenshot from 2019-02-04 11-39-58

As you can see we have improved the time for the dashboard:

  • /me endpoint have improved from 4.3s to 1.4s
  • /viz endpoints have improved from: about ~5s to ~3s

In summary, we have passed from ~26s to load the dashboard to ~12.8.

I'd let the patch active and the team users routing through it in order to test for undesired side-effects. If we don't have any issue In a couple of days (2-3), I'll put that patch into production.

Hope this improvement is good enough :)

Now I'm going to start checking the order by size part. Which maybe could be improved in order to avoid making N queries for size data. Stay tunned

Great news @ethervoid! It's a great improvement!

@alonsogarciapablo asked about the performance with the new dashboard. I've changed @andy-esch user to the new dashboard and here you can see the results:

Without the new patch

screenshot from 2019-02-04 14-17-30

With the new patch

screenshot from 2019-02-04 14-15-33

More or less the same. From ~12s to ~6s

Update

About the size sort we have a bit of a problem there:

screenshot from 2019-02-04 14-24-57

In accounts with a huge number of tables we get a 504 (timeout) after 180s

The ordering issue is going to continue here

Deployed this part. I'll continue here

Great @ethervoid ๐Ÿ‘๐Ÿ‘๐Ÿ‘

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ramiroaznar picture ramiroaznar  ยท  5Comments

javitonino picture javitonino  ยท  5Comments

arianaescobar picture arianaescobar  ยท  5Comments

arianaescobar picture arianaescobar  ยท  4Comments

xavijam picture xavijam  ยท  3Comments