Tidb: Performance comparison between TiDB and Vitess

Created on 25 Jul 2018 · 22Comments · Source: pingcap/tidb

Hi,

Recently we migrated our https://thulya.com production database from MSSQL to TIDb to experiment. TiDb is definitely the best amongst several SQL and NoSQL systems we have tried/used. The installation using Ansible is so easy and seamless. We have setup the cluster with total of 8 nodes and 4 in each DC (multi DCs) and 2 replicas in each DC. Its working brilliantly for the write queries and with less than 100ms for simple select queries. However for some 'optimised' join queries its taking between 800ms to 8secs (obviously not because of cross DC but within the DC itself). We are thinking to move the Join queries to Elasticsearch or CrateDb to improve the performance.

I have come across Vitess recently. It seems being used by Google and looked at some presentations which talked that even complex joins perform efficiently (with 2ms average and in most cases less than 100ms for complex joins?)

I am wondering the architectural differences between TiDb and Vitess. Have you done any performance comparison between TiDb and Vitess?

Thanks and regards,
Diwakar

typquestion

Source

thulyacloud

Most helpful comment

Hi, @diwakarpp I'm the architect of TiDB, and I think I'm quite familiar with Vitess too.
from the architecture we can see, Vitess is a MySQL sharding solution, like a proxy on top of MySQL, because of that, it's really hard for vitess to handle cross-shard join or subquery (that means you need to build a full-featured SQL optimizer to do that.), it's OK for some complex queries that happen in a single shard and they perform efficiently. Here's a unsupported cases list of Vitess, https://github.com/vitessio/vitess/blob/master/data/test/vtgate/unsupported_cases.txt

c4pt0r on 26 Jul 2018

👍2

All 22 comments

Because of the range-sharding, many joins/inserts/updates/deletes become cross-node, which is slow/inneficient. I'm assuming all the fast stuff that Vitess does are when querying/joining data that resides on the same node (so partitioned by same key). This works the same with CitusDB for example.

Assuming TiDB will introduce hash partitioning, it should be able to do the same. Currently, range-sharding offers better scalability (all rows from client_id can grow infinitely compared to a single node when hashing by client_id).

ddorian on 26 Jul 2018

@diwakarpp Thanks for your feedback!
We do not have any performance comparison with Vitess. You can find some benchmark result here including sysbench/tpc-h.

As you said, some join queries are slow. Would you like to show us your schemas and queries? Maybe we could give you some advice.

shenli on 26 Jul 2018

c4pt0r on 26 Jul 2018

👍2

Hi @ddorian , @shenli and @c4pt0r thanks a lot for the detailed inputs. I think Vitess has many pitfalls in their approach and architecture. We have done good decision by migrating our db to TiDb rather than Vitess. TiDb seems to be quite stable and resilient and we will fine tune it to handle the complex joins (e.g: conditional match and inner join between 8 tables or 10 tables in some instances. May we will split the queries into more than one to handle the complexity).

I am not sure if any of the below have been configured by us incorrectly. Could you pl share your thoughts to confirm?

(1) One of our DC is London based and second DC is Kansas US based. We are thinking to move US based DC to France so the DCs will remain close to each other to avoid network latency during replication. Currently each DC has two replicas. Would the join queries are impacted because of the distance between DCs? Although technically the copies of data is fully self contained within the DC, would the join queries span to multiple DCs during execution which could cause delay?

(2) In our cluster config, we have ignored the Rack field for the pd servers installation and used just the Zone and Host fields. Is this fine in general? Would it impact the performance in any way?

[pd_servers:vars]
location_labels = ["zone","host"]

(3) We will share some complex join queries shortly just to see if we made any errors or to reduce the complexity or add indexes as per your suggestion.

thulyacloud on 26 Jul 2018

(4) Currently we did not enable big logs.
4.1 If we enable bin logs, will it impact the performance of the cluster or queries?
4.2. Is there any query that we can run from the mysql client to fetch the binlogs from the cluster or should we read it from each node?
4.3. Also, how long the bin logs will be present for our data sync with Elasticsearch ?
4.4. During our data sync, how to maintain a pointer for the protobuf data or clearing the bin log data to avoid redundant syncing?

thulyacloud on 26 Jul 2018

Hi, I would appreciate your reply for the above aspects. Do you want me to create separate thread for the above aspects? Thanks and regards.

thulyacloud on 27 Jul 2018

I know you guys might be very busy or not available currently. Its a concern that there are no replies even after 2 business days since my above questions. This is a concern for us to use TiDb in production. Whereas Vitess team are responding to our queries quicker and in most cases instantly.

thulyacloud on 27 Jul 2018

Sorry for the late response, I was on a business trip yesterday. I'm preparing the answers now, will post soon.

c4pt0r on 28 Jul 2018

@diwakarpp

(1) One of our DC is London based and second DC is Kansas US based. We are thinking to move US based DC to France so the DCs will remain close to each other to avoid network latency during replication. Currently each DC has two replicas. Would the join queries are impacted because of the distance between DCs? Although technically the copies of data is fully self contained within the DC, would the join queries span to multiple DCs during execution which could cause delay?

TiDB splits data in multiple regions(96MB), every region has 3 replicas by default, PD will place the replcas in different machines/racks/DCs, but only leader replica will handle the traffic, we can specify all the region leaders in one datacenter, so JOINs will not fetch data from different DCs.

(2) In our cluster config, we have ignored the Rack field for the pd servers installation and used just the Zone and Host fields. Is this fine in general? Would it impact the performance in any way?

It matters, but it only affects the scheduling strategy and HA, if you have multiple DCs (at least 3), PD will place replicas across different DCs, but if you don't have enough DC, it means there's a DC will contain more than 1 replicas, so if you ignore the Rack field, there's a chance that PD will place 2 replicas in the same Rack, if all the physical machines in this Rack are out of service (for example, power off), that means some data will lost the majority of replicas.

(3) We will share some complex join queries shortly just to see if we made any errors or to reduce the complexity or add indexes as per your suggestion.

You can send the queries to me via email:[email protected], we can review it for you.

(4) Currently we did not enable big logs.
4.1 If we enable bin logs, will it impact the performance of the cluster or queries?

No, it will not impact the perforamance, it's asynchornos process.

4.2. Is there any query that we can run from the mysql client to fetch the binlogs from the cluster or should we read it from each node?

We have a offical tool called, tidb-binlog, it's a seperate component, it will automaticlly fetch binlog from each tidb node, and sort them with transaction order. But it's not an open-source yet, it's in our Enterprise Edition, but if you guys have insterest in tidb-binlog, we're happy to offer.

4.3. Also, how long the bin logs will be present for our data sync with Elasticsearch ?

Most cases, in seconds.

4.4. During our data sync, how to maintain a pointer for the protobuf data or clearing the bin log data to avoid redundant syncing?

Inside tidb-binlog, we have a kafka for caching the binlogs. every binlog has a transaction id, and the transaction id is monotone increasing (it's guarantee by our transaction model), as I mentioned, tidb-binlog will sort them in transaction id, and maintain the pointer.

c4pt0r on 28 Jul 2018

Hi @c4pt0r , thank you very much for your detailed reply. Its helpful and much appreciated.

Could you pl confirm the below so we will proceed to configure and test them?

(1) How to configure Ansible or TiDb yaml to specify all the region leaders within the same datacenter (for the joins to work within the same DC)? Could you pl give any write up/reference for this one.

We have two DCs. In our Ansible inventory.ini config, we have the following:

[pd_servers:vars]
location_labels = ["zone","host"]
max-replicas = 2

[pd_servers]
Here we have 4 server IPs specified as PD servers.

Can you pl give an example inventory.ini entries to specify the region leaders for each datacenter(DC) separately?

(4.2) I am interested to use tidb-binlog for syncing the data from TiDb to Elasticsearch/Crate. Could you pl send me the access/download link to try it out. Please enable the license for us to use it for 1 to 2 years. Once we start generating revenue, we will be happy to go for the paid/enterprise license too. By the way, please message me privately with the enterprise license cost for the tidb-binlog.

thulyacloud on 28 Jul 2018

@diwakarpp
For question 1:
You can use pd-ctl to add scheduling strategy to make all the leaders in the same IDC. Please refer to the document. In this case, I think you can use scheduler add evict-leader-scheduler storeID to evict leaders from the other IDC.
BTW: I recommend you to set max-replicas to 3 instead of 2.

For question 2:
Please refer to the tidb-binlog manual. Although tidb-binlog is not open-source, you can use it free.

shenli on 28 Jul 2018

@diwakarpp Happy to hear that about your interest in enterprise edition! Please feel free to contact me via email, [email protected] :)

c4pt0r on 29 Jul 2018

thank you @c4pt0r and @shenli . I do not know how to use pd-ctl, I will refer your docs and apply the scheduling strategy to restrict the queries and join queries to within the DC. At the time of asking the original question we were in a rush for the production deadline. Now we have rescheduled it for August end to do the stability checks. We will apply the pd-ctl and let you know if any questions or help needed. @c4pt0r I will ping you privately on the enterprise edition :+1: Cheers.

thulyacloud on 10 Aug 2018

@diwakarpp cool, looking forward to it, feel free to ping us if you have any question

c4pt0r on 10 Aug 2018

This issue is resolved now. I am closing this post. Thanks.

thulyacloud on 20 Aug 2018

@c4pt0r You guys have any roadmap with hash-sharding and putting data with same shard_id across different tables in the server ? This way you get the benefits of both range & hash sharding (unlimited scale on range sharding, better joins/transactions on hash sharding when using same hash_value).

ddorian on 23 Aug 2018

👍1

As @ddorian mentioned, combining range sharding and hash sharding both would be a smart combination to speed up the join queries. I'm not sure about the cons though. @c4pt0r @shenli any thoughts/plans for this?

thulyacloud on 23 Aug 2018

@diwakarpp @ddorian Thanks for your advice.
We don't have a plan for it. In TiDB, data is managed in the unit of region. There is no guarantee that data with the same hash value will be settled in the same region. I think the hierarchical schema used in Google F1 is a better way.

shenli on 23 Aug 2018

@shenli the problem with hierarchical is that:

the data is in-line so different tables can't be easily cached (since child tables are often 100x larger)
bigger primary-key on child tables (though can be cached)
only possible to query by parent_id on primary-key (while with hash-partition, you can hash by parent_id, and primary-key=(id))
even indexing child tables will still have bigger primary-key
unable to have local-indexes for child or parent tables

ddorian on 24 Aug 2018

@ddorian With Hierarchical schema we could choose streaming join operator. It is faster and more memory efficient.

shenli on 25 Aug 2018

thank you @shenli Just for discussion on the join query performance, looks like cratedb guys used Block Hash Joins and they explained it in a recent article here. I guess it could be a overkill for TiDB performance fine tuning as TiDB and CrateDB both are completely different systems in the core engine level but just thought to share it for your team's reference and analysis if required. Cheers.

thulyacloud on 28 Aug 2018

👍1

@shenli The final article Part 3 about performance improvement using Hash joins seems to have been posted by Crate guys. Interestingly (1) The locality approach (2) The modulo distribution approach both seem to have been discussed. This is just for a reference! Thanks and regards.

thulyacloud on 6 Sep 2018

🎉1

Was this page helpful?

0 / 5 - 0 ratings