Tidb: SELECT count(*) is too slow

Created on 22 Mar 2018  ·  17Comments  ·  Source: pingcap/tidb

我有一个370万条红包数据的mysql 表和 一个 单机(cpu i7 ,256GB固态硬盘,16G内存,Centos7.3)Docker最新版部署的 TiDb 同样是红包数据表,结构都一样。

测试使用
语句
SELECT count(*) FROM biz_user_red_packet;
表结构
int id,int user_id,int order_form_id,int red_packet_id,date create_time,date use_time,int profit,int status,int delete_flag,int planid

发现 TiDB(花费了33秒)非常缓慢,反而mysql(花费1.45秒) 一下就算出表总数了.能否问下是什么原因?

=======
I have a mysql table with 3.7 million red envelope data and a single machine (CPU i7,256GB solid-state drive, 16gb of ram, Centos7.3). The latest version of the TiDb is also a red envelope data table with the same structure.

Test using

statements

SELECT count (*) FROM biz_user_red_packet;

Table structure

Int id,int user_id,int order_form_id,int red_packet_id,date create_time,date use_time,int profit,int status,int delete_flag,int planid.

It is very slow to find TiDB (it takes 33 seconds), but mysql (1.45 seconds) is the total number of tables. Can you ask why?

typquestion

Most helpful comment

Is SELECT count(*) is slow? Well, it depends.

  1. If you just start a TiDB instance on the local machine, it's slower than MySQL.
    TiDB on the local storage is used for testing purpose, it's not optimized for performance.
  2. If you run a TiKV cluster and TiDB on just one physical machine, it's slower than MySQL.
    Communication costs in multiple processes is much higher than a single MySQL process.
  3. If you deploy a real TiKV cluster and run TiDB on it, but the table size is small, TiDB may still slower than MySQL.
  4. You have very large table, you have huge amount of data to store, nearly beyond the capacity of that MySQL can hold, then give TiDB a try. count(*) would be much faster than MySQL.

Is that clear? @zhuxiujia

All 17 comments

@zhuxiujia You can refer the docs at here or here

Doc文档里 建议使用 SELECT count(1) FROM test.biz_user_red_packet;
代替 SELECT count(*) FROM test.biz_user_red_packet;
但是结果是一样缓慢的。

A SELECT count(1) FROM test.biz_user_red_packet is recommended in the Doc document.

Instead of SELECT count(*) FROM test.biz_user_red_packet;

But the result is as slow as the last one.

@zhuxiujia Yes, the count(1) will be as slow as count(*) because they are essentially the same, so you can use methods recommended in the docs to speed up your query.

What's the configuration of your TiDB cluster? How many TiDB, TiKV, PD in it?

And please provide us your TiDB github version.

@XuHuaiyu Only one machine. @zhuxiujia I think your questions is why TiDB run count(*) slower than MySQL? And you sure you have no pd or tikv?

1个TiDb 没有 Tikv pd
i have One TiDb has no Tikv or Pd.

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2ea42358c2fb pingcap/tidb:latest "/tidb-server" 23 hours ago Up 23 hours 0.0.0.0:4000->4000/tcp, 0.0.0.0:10080->10080/tcp tidb-server
62a2fcb6242f tomcat8.5 "catalina.sh run" 8 weeks ago Up 3 weeks 0.0.0.0:9993->8080/tcp api-doc

One PC was deployed with Docker.

If you deploy TiDB upon tikv, TiDB retrieve data by GRPC, which needs network round trip. So this is why TiDB is slower than MySQL in a small table.

  1. Deploy TiDB in a real cluster, use at least 4 machines.
  2. Try to test with larger data.
  3. You can enlarge tidb_distsql_scan_concurrency like this doc said

I only use one TiDB without TiKV and TiPD, no network requests such as your GRPC. It's several times slower than mysql without a network request.
我只用了一个 TiDB 没有 TiKV 和 TiPD,没有网络请求例如你说的GRPC。在没有网络请求耗时的情况下,比mysql慢了好几倍

If you only start a tidb-server, there will be a mock-tikv to store data.
Mock-tikv is used for testing.
You may try to deploy a cluster with at lease 1TiDB, 1TiKV, 1*PD.

@XuHuaiyu I used Docker Compose to deploy the cluster on my machine, with three TiDB, three Tikv, three TiPd, but again, count() is 30 times slower than mysql.
我使用 Docker Compose 部署了集群在我那个机器上,包含3个TiDB,3个Tikv,3个TiPd,但是 还是一样,count(
)比mysql慢30倍

TiDB accesses TiKV in every query. When the amount of data is small, all MySQL is in memory, and TiDB needs to conduct a network access.
So you can try this Chinese doc or this English docas what @winkyao mentioned before.

Is SELECT count(*) is slow? Well, it depends.

  1. If you just start a TiDB instance on the local machine, it's slower than MySQL.
    TiDB on the local storage is used for testing purpose, it's not optimized for performance.
  2. If you run a TiKV cluster and TiDB on just one physical machine, it's slower than MySQL.
    Communication costs in multiple processes is much higher than a single MySQL process.
  3. If you deploy a real TiKV cluster and run TiDB on it, but the table size is small, TiDB may still slower than MySQL.
  4. You have very large table, you have huge amount of data to store, nearly beyond the capacity of that MySQL can hold, then give TiDB a try. count(*) would be much faster than MySQL.

Is that clear? @zhuxiujia

请问这个问题解决了么?我这边4000W到数据count(1)需要7,8秒。这个算是慢么?还是正常的?我个人认为比较慢。

@liujunhui Yes, it's slower than expected, can you provide the deployment of your cluster?

@liujunhui Yes, it's slower than expected, can you provide the deployment of your cluster?

三台PC电脑node1(8G内存)部署一个KV,一个PD,一个TIBD,node2(4G内存)部署一个KV,一个PD,node3(4G内存)部署一个KV,一个PD。node2和node3的内存在查询count(1)的时候会上升90%多,而且查询之后就不会降下来。查询慢的原因是内存不够么?还是其他原因?

@liujunhui I think the hardware is the bottleneck of select count(1) from t too slow. The memory is small and cannot make the TiKV cache data in memory.

Was this page helpful?
0 / 5 - 0 ratings