Proxysql CPU usage greatly increase in 24 hours with only 2 connections active.
Server is RHEL 6.8 running 2.6.32-642.4.2.el6.x86_64 kernel running proxysql-1.3.0h-1.x86_64
top - 15:42:22 up 68 days, 13 min, 1 user, load average: 1.44, 1.38, 1.36 Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie Cpu(s): 24.3%us, 75.6%sy, 0.0%ni, 0.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1914216k total, 1223260k used, 690956k free, 173528k buffers Swap: 2093052k total, 11216k used, 2081836k free, 711984k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3554 root 20 0 222m 31m 2632 S 199.5 1.7 391:54.63 proxysql 1520 root 20 0 930m 6424 2980 S 0.3 0.3 49:55.22 dad
root 3553 1 0 Dec06 ? 00:00:00 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql root 3554 3553 26 Dec06 ? 06:41:30 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
Hi Sinan,
Thank you for the report, seems like a bug: it is spinning somewhere.
Can you please:
/var/lib/proxysql/proxysql.logsudo pt-pmp --pid 3554 --iterations 30
sudo perf record -p 3554
Once perf record is completed, please run the follow and attach report.txt
sudo perf report > report.txt
Thanks
I haven't notice any errors in proxysql.log, only some users activity logged.
Report file is attached.
Thanks.
Could you please attach also the output of pt-pmp ?
From the output of perf it seems it is spinning on poll().
Is the two connections in use stop sending traffic, does the CPU usage go to zero?
Sorry, I have already restarted the application.
Now pt-pmp is returning
Unhandled dwarf expression opcode 0xf3
rapidly
I'll check to drop connections when this occurs agains
I'm seeing similar with the latest 1.3.1 release. Attached is the result from a run under low load. I'll rerun this later under our full query load where we see two things happening:
CPU usage seems to creep up over time, starting at around 200-300% reported by top, to 700% or more hours later under the same query load
Once CPU usage is up we also notice a large increase in memory use by the proxysql process, which stays constantly low for many hours after a fresh start, until something happens and it balloons
Some stats: in full query load we have 20-40k qps, generating ~700MBit/s+ of db traffic. We recently switched to compressing mysql traffic to keep things in check until we optimize our db usage to reduce the amount of data we push into mysql and read out of it is a good test bed for proxysql performance.
Attached is a report you can see in the mean time but we'll rerun it later when more load is introduced
I met same problem with 1.3.1 release. The proxysql running about 2 hours. I found CPU usage very high.

proxysql.log include some errors message:
2016-12-29 10:30:01 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot start a transaction within a transaction --- BEGIN
2016-12-29 10:30:08 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot commit - no transaction is active --- COMMIT
2016-12-29 10:30:08 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot start a transaction within a transaction --- BEGIN
2016-12-29 10:30:17 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot commit - no transaction is active --- COMMIT
2016-12-29 10:39:01 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot start a transaction within a transaction --- BEGIN
2016-12-29 10:39:02 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot commit - no transaction is active --- COMMIT
Attached perf command report file.
This CPU usage seems a spike caused by some spinloop.
@taylor840326 : are you graphing also ProxySQL's metrics?
Would be interesting to find a correlation between that spike and other metrics.
Thanks
I use telegraf+grafana monitoring CPU load
@renecannao version 1.2.6 is ok.
@taylor840326 : which version is not ok?
When is not ok, do you have any profiling from perf or pt-pmp ?
@renecannao I have tested v1.2.6/v1.3.1/v1.3.2 version, all the version have same problem.
I'm not sure when the problem happended. Maybe 2 hours or 6 hours or 12 hours.
attached the perf report file.
perf_1.2.6.txt
@taylor840326 : this perf report doesn't _seem_ related to what reported in #870 .
Can you please run pt-pmp against proxysql?
Also please share your configuration (from mysql_servers , global_variables and mysql_query_rules)?
@renecannao ok, I will use pt-pmp comamnd next time.
proxysql-1.3.3 rpm running on RHEL 6.8 has the same issues
My Problem occurs mainly when there is a lag between the proxsql and mysql serves. ( Our case Proxysql is in branch office, and mysql is at production site)
# Samples: 3 of event 'cpu-clock'
# Event count (approx.): 750000
#
# Overhead Command Shared Object Symbol
# ........ ........ ................. ...........
#
100.00% proxysql [kernel.kallsyms] [k] do_wait
@ayder , can you please run pt-pmp against proxysql?
When running ps you will notice two proxysql processes, one is the parent and the other is the child.
Assuming 12345 is the pid of the child process, you need to run something like:
pt-pmp --iterations 30 -p 12345
I really want to close this issue, but I need to understand where it is using a lot of CPU.
Thank you for your help.
The output of pt-pmp command against the child process of pt-pmp is
Unhandled dwarf expression opcode 0xf3
240 pthread_cond_wait,remove,ConsumerThread::run,start_thread(libpthread.so.0),clone(libc.so.6)
120 epoll_wait(libc.so.6),MySQL_Thread::run,mysql_worker_thread_func_idles,start_thread(libpthread.so.0),clone(libc.so.6)
118 poll(libc.so.6),MySQL_Thread::run,mysql_worker_thread_func,start_thread(libpthread.so.0),clone(libc.so.6)
30 poll(libc.so.6),admin_main_loop,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),Query_Cache::purgeHash_thread,mysql_shared_query_cache_funct,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::run,libstdc++::??(libstdc++.so.6),start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::monitor_replication_lag,monitor_replication_lag_pthread,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::monitor_read_only,monitor_read_only_pthread,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::monitor_ping,monitor_ping_pthread,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::monitor_connect,monitor_connect_pthread,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),main
2 ::??,::??,::??
@ayder , thank you very much!
From the output it seems that all the threads are waiting.
The threads in MySQL_Monitor are sleeping;
The Admin thread is waiting on poll();
The 4 main MySQL_Thread are waiting on poll();
The 4 auxiliary MySQL_Thread are waiting on epoll();
The 8 ConsumerThread() are waiting in the queue.
I am not sure what it is causing the CPU spike: from the above everything seems quiet, very idle.
Maybe it is possible that your system isn't able to manage so many threads.
Can you please reconfigure proxysql to use just one thread? Thanks
set mysql-threads=1;
save mysql variables to disk;
proxysql restart;
After setting the
mysql-threads =1
Variable
Cpu usage is now %100 of 1 core of 2 core vm. (That was %200 before utilizing all the cores)
Without restarting proxysql, can you please try again to run pt-pmp , maybe with an increased number of iteration:
pt-pmp --iterations 60 -p 12345
Thank you.
Oh the output this time is:
/usr/bin/proxysql (deleted): No such file or directory.
Cannot access memory at address 0x687fa290
60 ::??
I have not deleted the binary nor changed it.
[root@XXDBAPPSL01 ~]# ls -l /usr/bin/proxysql
-rwxr-xr-x 1 root root 13087538 Jan 21 00:10 /usr/bin/proxysql
@ayder : this probably mean that proxysql was upgraded.
You can verify this comparing the out of these two commands:
/usr/bin/proxysql --version
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT @@version" -NB
Well versions consistent, no upgrade has done.
I'll restart the aplicattion and wait again
[salyuruk@XXDBAPPSL01 ~]$ mysql --defaults-file=.proxysql.cfg -p -e "SELECT @@version" -NB
Enter password:
1.3.3-0-gf2b677a
[salyuruk@OSLDBAPPSL01 ~]$ proxysql --version
ProxySQL version 1.3.3-0-gf2b677a, codename Truls
This is getting really odd. Another hypothesis is that there are two proxies running.
Can you please run this:
ps aux | grep proxysql
[root@XXDBAPPSL01 ~]# ps aux | grep proxysql
root 19223 0.0 0.2 51260 5124 ? S Feb10 0:00 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
root 19224 16.9 1.6 172348 31804 ? Sl Feb10 1217:10 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
root 60320 0.0 0.0 103324 848 pts/1 S+ 15:18 0:00 grep proxysql
Nope
Honestly, I am out of ideas for now.
The previous output of pt-pmp doesn't show anything odd, and now /usr/bin/proxysql (deleted): No such file or directory is very odd too.
Is there anything unusual reported in dmesg or syslog or proxysql log file?
What about lsof -n -p 19224 ?
[root@XXDBAPPSL01 ~]# lsof -n -p 19224
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
proxysql 19224 root cwd DIR 253,0 4096 139114 /var/lib/proxysql
proxysql 19224 root rtd DIR 253,0 4096 2 /
proxysql 19224 root txt REG 253,0 12982906 390056 /usr/bin/proxysql (deleted)
proxysql 19224 root mem REG 253,0 1969952 395943 /usr/lib64/libcrypto.so.1.0.1e
proxysql 19224 root mem REG 253,0 448344 392447 /usr/lib64/libssl.so.1.0.1e
proxysql 19224 root mem REG 253,0 161704 396570 /lib64/ld-2.12.so
proxysql 19224 root mem REG 253,0 1930416 398992 /lib64/libc-2.12.so
proxysql 19224 root mem REG 253,0 146592 399382 /lib64/libpthread-2.12.so
proxysql 19224 root mem REG 253,0 23088 399347 /lib64/libdl-2.12.so
proxysql 19224 root mem REG 253,0 47760 399391 /lib64/librt-2.12.so
proxysql 19224 root mem REG 253,0 600048 399989 /lib64/libm-2.12.so
proxysql 19224 root mem REG 253,0 91096 392580 /lib64/libz.so.1.2.3
proxysql 19224 root mem REG 253,0 124640 392013 /lib64/libselinux.so.1
proxysql 19224 root mem REG 253,0 114496 399177 /lib64/libresolv-2.12.so
proxysql 19224 root mem REG 253,0 93320 391716 /lib64/libgcc_s-4.4.7-20120601.so.1
proxysql 19224 root mem REG 253,0 989840 400216 /usr/lib64/libstdc++.so.6.0.13
proxysql 19224 root mem REG 253,0 17256 392093 /lib64/libcom_err.so.2.1
proxysql 19224 root mem REG 253,0 12592 406828 /lib64/libkeyutils.so.1.3
proxysql 19224 root mem REG 253,0 177520 392097 /lib64/libk5crypto.so.3.1
proxysql 19224 root mem REG 253,0 46368 392717 /lib64/libkrb5support.so.0.1
proxysql 19224 root mem REG 253,0 280520 392095 /lib64/libgssapi_krb5.so.2.2
proxysql 19224 root mem REG 253,0 948840 392087 /lib64/libkrb5.so.3.3
proxysql 19224 root mem REG 253,0 27896 391297 /lib64/libnss_dns-2.12.so
proxysql 19224 root mem REG 253,0 66432 391299 /lib64/libnss_files-2.12.so
proxysql 19224 root 0r CHR 1,3 0t0 4217 /dev/null
proxysql 19224 root 1w REG 253,0 584020 129989 /var/lib/proxysql/proxysql.log
proxysql 19224 root 2w REG 253,0 584020 129989 /var/lib/proxysql/proxysql.log
proxysql 19224 root 3r FIFO 0,8 0t0 20510499 pipe
proxysql 19224 root 4w FIFO 0,8 0t0 20510499 pipe
proxysql 19224 root 5u REG 253,0 33792 139105 /var/lib/proxysql/proxysql.db
proxysql 19224 root 6u REG 253,0 33792 139105 /var/lib/proxysql/proxysql.db
proxysql 19224 root 7r FIFO 0,8 0t0 20510502 pipe
proxysql 19224 root 8w FIFO 0,8 0t0 20510502 pipe
proxysql 19224 root 9r FIFO 0,8 0t0 20510503 pipe
proxysql 19224 root 10w FIFO 0,8 0t0 20510503 pipe
proxysql 19224 root 11u IPv4 20510505 0t0 TCP *:6033 (LISTEN)
proxysql 19224 root 12u REG 0,9 0 4213 [eventpoll]
proxysql 19224 root 13u unix 0xffff88007d1c4540 0t0 20510506 /tmp/proxysql.sock
proxysql 19224 root 14u IPv4 20510518 0t0 TCP 127.0.0.1:6032 (LISTEN)
proxysql 19224 root 15u unix 0xffff88007d006800 0t0 20510519 /tmp/proxysql_admin.sock
proxysql 19224 root 16u IPv4 21257635 0t0 TCP 10.35.100.197:36666->10.35.78.78:mysql (ESTABLISHED)
proxysql 19224 root 17u IPv4 21274282 0t0 TCP 10.35.100.197:59106->10.35.71.23:mysql (ESTABLISHED)
proxysql 19224 root 18u IPv4 21235705 0t0 TCP 10.35.100.197:6033->10.238.236.15:49559 (ESTABLISHED)
proxysql 19224 root 19u sock 0,6 0t0 21235708 can't identify protocol
proxysql 19224 root 20u IPv4 21269066 0t0 TCP 10.35.100.197:6033->10.238.236.15:56218 (ESTABLISHED)
proxysql 19224 root 21u IPv4 21269069 0t0 TCP 10.35.100.197:37218->10.35.78.78:mysql (ESTABLISHED)
Hi,
Today we put proxysql on our production server, and we are facing the same issue. We haven't loaded any query rules yet
> select @@version
1.3.4-0-gd158707

select * from global_variables
global.txt
@alejg: can you please run pt-pmp as described above?
Also, would be possible to generate a core dump of the running process using gcore ?
Thanks
Yes,
proxysql-pt-pmp.txt
I'll send the core dump to an email (at sysown.com) ok?
Thanks!
Thank you @alejg .
My email is rene.[email protected] .
One more question: which package are you using? Which OS?
But before sending the core...
Looking the output of pt-pmp I have the feeling that proxysql is not spinning in a loop without reason, but actually serving a lot of traffic:
From the screenshot I can also deduce that mysqld running on the same box is quite busy too.
Therefore it is possible that the high CPU utilization you are seeing in ProxySQL is not because of a bug, but because it is really doing a lot of work.
Can you please also run these commands and let me know the output?
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT * FROM stats_mysql_connection_pool; SHOW MYSQL STATUS;"
sleep 30
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT * FROM stats_mysql_connection_pool; SHOW MYSQL STATUS;"
Thanks
Thank you @renecannao!!
CentOS Linux release 7.2.1511
proxysql-1.3.4-1-centos7.x86_64.rpm
You are right, is a very busy server which spends most of his time doing IO. Worked fine for several hours, now the CPU raised up the load average above 18.
Thanks!
Continuing this thread via email.
I will update the issue with further finding.
I am also having this problem intermittently. My Server is idle with
Tasks: 126 total, 1 running, 125 sleeping, 0 stopped, 0 zombie
Cpu(s): 38.9%us, 58.8%sy, 0.0%ni, 1.0%id, 0.0%wa, 0.0%hi, 0.1%si, 1.2%st
Mem: 8059196k total, 7313936k used, 745260k free, 140376k buffers
Swap: 0k total, 0k used, 0k free, 974752k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30931 proxysql 20 0 318m 86m 2624 S 363.7 1.1 120:20.48 proxysql
19023 root 20 0 325m 57m 4832 S 14.3 0.7 89:14.53 mysqld_exporter
28096 mysql 20 0 14.1g 1.2g 33m S 9.6 15.2 5:55.45 mysqld
1372 root 20 0 343m 24m 3288 S 8.0 0.3 797:42.80 node_exporter
# sudo strace -c -p 30931
Process 30931 attached
^CProcess 30931 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
98.72 4.380782 273799 16 nanosleep
1.28 0.056584 56584 1 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00 4.437366 17 total
restarting proxysql solves the issue temporarily
@doddys , if you run strace without -f flag, strace will only check the main thread and nothing more: this is exactly what you got, a strace of the main thread that is doing nothing more but nanosleep() .
How frequently does it happen?
Would it be possible to generate a core dump of the running process using gcore ? From the output above it seems it uses really little memory, so a dump should take really a small amount of time (possibly a fraction of second).
Thanks
let me try to get a core dump and get back to you
On Mar 14, 2017, at 8:14 PM, René Cannaò notifications@github.com wrote:
@doddys https://github.com/doddys , if you run strace without -f flag, strace will only check the main thread and nothing more: this is exactly what you got, a strace of the main thread that is doing nothing more but nanosleep() .
How frequently does it happen?
Would it be possible to generate a core dump of the running process using gcore https://sourceware.org/gdb/onlinedocs/gdb/gcore-man.html ? From the output above it seems it uses really little memory, so a dump should take really a small amount of time (possibly a fraction of second).
Thanks—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/sysown/proxysql/issues/826#issuecomment-286416977, or mute the thread https://github.com/notifications/unsubscribe-auth/AQmB1O57zQ8Kl_xBoS3TxTGV022njG44ks5rlpLNgaJpZM4LGjqA.
@renecannao this is the dump that I get from
gcore -o proxysql.dump 30931
please rename to tgz and extract
@doddys : thanks a lot.
To confirm, this dump is generated while CPU usage was high, or idle? It needs to be while CPU usage is high.
Also, can you please let me know exactly which ProxySQL package you have installed?
Thanks
@renecannao Yes, it is generated during the cpu ~300% like the top above
# rpm -qa | grep proxy
proxysql-1.3.2-1.1.x86_64
# uname -a
Linux centos-8gb-sgp1-01 2.6.32-642.6.2.el6.x86_64 #1 SMP Wed Oct 26 06:52:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
It happens a few times already but I never pay attention when and how.
Let me know know if you need further info.
Thanks
Perfect!
I started looking into it, but it took me some time to figure out that the version installed is from Percona repository and from GitHub.
What version of CentOS are you using exactly? I tried on CentOS 6.7 but seems not the right one.
Thank you
@renecannao yes it is from percona repo.
I am using
# cat /etc/redhat-release
CentOS release 6.8 (Final)
i just notice that scheduler is invoking duplicate process like below
proxysql 7864 30931 1 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 3 2 1 1 /var/lib/proxysql/galera_hg3.log
proxysql 7866 30931 1 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 4 2 1 1 /var/lib/proxysql/galera_hg4.log
proxysql 7949 7866 0 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 4 2 1 1 /var/lib/proxysql/galera_hg4.log
proxysql 7953 7864 0 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 3 2 1 1 /var/lib/proxysql/galera_hg3.log
proxysql 7955 7949 0 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 4 2 1 1 /var/lib/proxysql/galera_hg4.log
proxysql 7956 7955 0 13:52 ? 00:00:00 timeout 10 mysql -nNE -uproxysql -h 10.130.34.56 -P 3306 -e SHOW STATUS LIKE 'wsrep_local_state'
proxysql 7957 7955 0 13:52 ? 00:00:00 tail -1
proxysql 7958 7956 5 13:52 ? 00:00:00 mysql -nNE -uproxysql -h 10.130.34.56 -P 3306 -e SHOW STATUS LIKE 'wsrep_local_state'
proxysql 7960 7953 0 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 3 2 1 1 /var/lib/proxysql/galera_hg3.log
proxysql 7961 7960 0 13:52 ? 00:00:00 timeout 10 mysql -nNE -uproxysql -h 10.130.34.56 -P 3306 -e SHOW STATUS LIKE 'wsrep_local_state'
proxysql 7962 7960 0 13:52 ? 00:00:00 tail -1
proxysql 7963 7961 4 13:52 ? 00:00:00 mysql -nNE -uproxysql -h 10.130.34.56 -P 3306 -e SHOW STATUS LIKE 'wsrep_local_state'
root 7965 16029 0 13:52 pts/0 00:00:00 grep proxysql
proxysql 30930 1 0 11:23 ? 00:00:00 /usr/bin/proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
proxysql 30931 30930 99 11:23 ? 03:56:50 /usr/bin/proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
Is this normal?
btw for whatever reason the cpu drop back to normal now. it lasted for about 1 hour
Ok, I will start looking into the dump using 6.8 .
With regards to the duplicate processes, there is no mechanism to limit the number of processes started by scheduler.
What's the output of SELECT * FROM scheduler ?
below are the ouput
mysql> select * from scheduler;
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
| id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment |
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
| 1 | 0 | 10000 | /usr/bin/proxysql_galera_checker | 0 | 0 | 3 | 1 | /var/lib/proxysql/proxysql_galera_checker.log | |
| 2 | 0 | 10000 | /usr/bin/proxysql_galera_checker | 2 | 2 | 3 | 1 | /var/lib/proxysql/galera_hg2.log | |
| 3 | 1 | 10000 | /var/lib/proxysql/proxysql_galera_checker_v2.sh | 3 | 2 | 1 | 1 | /var/lib/proxysql/galera_hg3.log | |
| 4 | 1 | 10000 | /var/lib/proxysql/proxysql_galera_checker_v2.sh | 4 | 2 | 1 | 1 | /var/lib/proxysql/galera_hg4.log | |
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
4 rows in set (0.00 sec)
mysql> select * from runtime_scheduler;
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
| id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment |
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
| 1 | 0 | 10000 | /usr/bin/proxysql_galera_checker | 0 | 0 | 3 | 1 | /var/lib/proxysql/proxysql_galera_checker.log | |
| 2 | 0 | 10000 | /usr/bin/proxysql_galera_checker | 2 | 2 | 3 | 1 | /var/lib/proxysql/galera_hg2.log | |
| 3 | 1 | 10000 | /var/lib/proxysql/proxysql_galera_checker_v2.sh | 3 | 2 | 1 | 1 | /var/lib/proxysql/galera_hg3.log | |
| 4 | 1 | 10000 | /var/lib/proxysql/proxysql_galera_checker_v2.sh | 4 | 2 | 1 | 1 | /var/lib/proxysql/galera_hg4.log | |
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
4 rows in set (0.00 sec)
as you can see I have 2 active schedulers, but it spawn more than 1 process per scheduler at the same time.
_maybe_ it is not spawning more than 1 process per scheduler at the same time.
It should spawn a process ever 10 seconds, and check the backends.
The fact that from ps you can see two checks against 10.130.34.56 is curious: a check should be very very fast, so if you manage to catch two checks against the same hosts maybe the checks are being slow, and a scheduled process takes longer than 10 seconds to complete, and therefore there is an overlap.
The check is fast. The script finished within 1 second. the weird thing eventhough two process running the same script, I can only see a single set of log entry
Tue Mar 14 14:22:04 UTC 2017 ###### proxysql_galera_checker.sh SUMMARY ######
Tue Mar 14 14:22:04 UTC 2017 Hostgroup writers 4
Tue Mar 14 14:22:04 UTC 2017 Hostgroup readers 2
Tue Mar 14 14:22:04 UTC 2017 Number of writers 1
Tue Mar 14 14:22:04 UTC 2017 Writers are readers 1
Tue Mar 14 14:22:04 UTC 2017 log file /var/lib/proxysql/galera_hg4.log
Tue Mar 14 14:22:05 UTC 2017 ###### HANDLE WRITER NODES ######
Tue Mar 14 14:22:05 UTC 2017 --> Checking WRITE server 4:10.130.34.56:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 4:10.130.34.56:3306 is already ONLINE: 1 of 1 write nodes
Tue Mar 14 14:22:05 UTC 2017 --> Checking WRITE server 4:10.130.33.225:3306, current status OFFLINE_SOFT, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 4:10.130.33.225:3306 is already OFFLINE_SOFT, max write nodes reached (1)
Tue Mar 14 14:22:05 UTC 2017 --> Checking WRITE server 4:10.130.33.238:3306, current status OFFLINE_SOFT, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 4:10.130.33.238:3306 is already OFFLINE_SOFT, max write nodes reached (1)
Tue Mar 14 14:22:05 UTC 2017 ###### HANDLE READER NODES ######
Tue Mar 14 14:22:05 UTC 2017 --> Checking READ server 2:10.130.34.56:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 2:10.130.34.56:3306 is already ONLINE
Tue Mar 14 14:22:05 UTC 2017 --> Checking READ server 2:10.130.33.225:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 2:10.130.33.225:3306 is already ONLINE
Tue Mar 14 14:22:05 UTC 2017 --> Checking READ server 2:10.130.33.238:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 2:10.130.33.238:3306 is already ONLINE
Tue Mar 14 14:22:05 UTC 2017 ###### SUMMARY ######
Tue Mar 14 14:22:05 UTC 2017 --> Number of writers that are 'ONLINE': 1 : hostgroup: 4
Tue Mar 14 14:22:05 UTC 2017 --> Number of readers that are 'ONLINE': 3 : hostgroup: 2
Tue Mar 14 14:22:05 UTC 2017 ###### Loading mysql_servers config into runtime ######
Tue Mar 14 14:22:15 UTC 2017 ###### proxysql_galera_checker.sh SUMMARY ######
Tue Mar 14 14:22:15 UTC 2017 Hostgroup writers 4
Tue Mar 14 14:22:15 UTC 2017 Hostgroup readers 2
Tue Mar 14 14:22:15 UTC 2017 Number of writers 1
Tue Mar 14 14:22:15 UTC 2017 Writers are readers 1
Tue Mar 14 14:22:15 UTC 2017 log file /var/lib/proxysql/galera_hg4.log
Tue Mar 14 14:22:15 UTC 2017 ###### HANDLE WRITER NODES ######
Tue Mar 14 14:22:15 UTC 2017 --> Checking WRITE server 4:10.130.34.56:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 4:10.130.34.56:3306 is already ONLINE: 1 of 1 write nodes
Tue Mar 14 14:22:15 UTC 2017 --> Checking WRITE server 4:10.130.33.225:3306, current status OFFLINE_SOFT, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 4:10.130.33.225:3306 is already OFFLINE_SOFT, max write nodes reached (1)
Tue Mar 14 14:22:15 UTC 2017 --> Checking WRITE server 4:10.130.33.238:3306, current status OFFLINE_SOFT, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 4:10.130.33.238:3306 is already OFFLINE_SOFT, max write nodes reached (1)
Tue Mar 14 14:22:15 UTC 2017 ###### HANDLE READER NODES ######
Tue Mar 14 14:22:15 UTC 2017 --> Checking READ server 2:10.130.34.56:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 2:10.130.34.56:3306 is already ONLINE
Tue Mar 14 14:22:15 UTC 2017 --> Checking READ server 2:10.130.33.225:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 2:10.130.33.225:3306 is already ONLINE
Tue Mar 14 14:22:15 UTC 2017 --> Checking READ server 2:10.130.33.238:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 2:10.130.33.238:3306 is already ONLINE
Tue Mar 14 14:22:15 UTC 2017 ###### SUMMARY ######
Tue Mar 14 14:22:15 UTC 2017 --> Number of writers that are 'ONLINE': 1 : hostgroup: 4
Tue Mar 14 14:22:15 UTC 2017 --> Number of readers that are 'ONLINE': 3 : hostgroup: 2
Tue Mar 14 14:22:15 UTC 2017 ###### Loading mysql_servers config into runtime ######
@doddys :
I couldn't yet find anything odd in the dump.
It seems that the instance is indeed idle:
What bugs me is servers_table_version = 2544 : that should means that LOAD MYSQL SERVERS TO RUNTIME was executed 2543 times, or about once every 2.72 seconds.
Can you please share proxysql's error log? (/var/lib/proxysql/proxysql.log)
I would like to dig into it.
Thank you
@renecannao I did notice that there are a lot of reload in the proxysql.log but I am not sure it was normal or not.
please rename to tgz and extract.
@doddys : there is a bug in proxysql_galera_checker.sh , because LOAD MYSQL SERVERS TO RUNTIME is always called even if no changes are performed.
Speaking with @geotro few days ago, I think we have a solution for this.
Calling LOAD MYSQL SERVERS TO RUNTIME has some costs, and it is possible that calling it too often has unexpected side effect. I will investigate more.
Thank you for the file!
@doddys , interesting finding!
Number of LOAD MYSQL SERVERS per hour:
$ egrep '^2017-03-14 .*LOAD MYSQL SERVERS' proxysql.log | cut -b 1-13 | sort | uniq -c
714 2017-03-14 00
706 2017-03-14 01
704 2017-03-14 02
704 2017-03-14 03
706 2017-03-14 04
702 2017-03-14 05
704 2017-03-14 06
704 2017-03-14 07
842 2017-03-14 08
1408 2017-03-14 09
1416 2017-03-14 10
1408 2017-03-14 11
1350 2017-03-14 12
1011 2017-03-14 13
708 2017-03-14 14
706 2017-03-14 15
710 2017-03-14 16
710 2017-03-14 17
714 2017-03-14 18
712 2017-03-14 19
712 2017-03-14 20
714 2017-03-14 21
714 2017-03-14 22
74 2017-03-14 23
From the above, it seems that something happens between 8am and 9am, and something changed again between 1pm and 2pm.
This is indeed the case:
$ egrep -v '^HID' proxysql.log | egrep -v '(Received LOAD MYSQL SERVERS TO RUNTIME command|New mysql_replication_hostgroups table)' | egrep '^2017-03-14.*INFO'
2017-03-14 08:47:18 [INFO] Received save scheduler from mem command
2017-03-14 08:47:25 [INFO] Received load scheduler from mem command
2017-03-14 08:47:36 [INFO] Received save mysql servers from mem command
2017-03-14 08:47:47 [INFO] Received load mysql servers from mem command
2017-03-14 08:49:46 [INFO] Received save mysql users from mem command
2017-03-14 08:49:59 [INFO] Received load mysql users from mem command
2017-03-14 08:51:18 [INFO] Received load mysql query rules from mem command
2017-03-14 08:51:29 [INFO] Received save mysql query rules from mem command
2017-03-14 08:54:19 [INFO] Received load scheduler from mem command
2017-03-14 08:54:24 [INFO] Received save scheduler from mem command
2017-03-14 08:56:42 [INFO] Received load scheduler from mem command
2017-03-14 08:56:47 [INFO] Received save scheduler from mem command
2017-03-14 08:59:38 [INFO] Received load scheduler from mem command
2017-03-14 08:59:41 [INFO] Received save scheduler from mem command
2017-03-14 11:23:31 [INFO] Starting shutdown...
2017-03-14 11:23:32 [INFO] Shutdown completed!
2017-03-14 11:23:32 [INFO] Exiting...
2017-03-14 11:23:32 [INFO] Shutdown angel process
2017-03-14 11:23:32 [INFO] Starting ProxySQL
2017-03-14 11:23:32 [INFO] Sucessfully started
2017-03-14 11:23:32 [INFO] Angel process started ProxySQL process 30931
2017-03-14 12:51:55 [INFO] Received load scheduler from mem command
2017-03-14 12:52:08 [INFO] Received save scheduler to mem command
2017-03-14 13:52:14 [INFO] Received load scheduler from mem command
2017-03-14 13:52:19 [INFO] Received save scheduler from mem command
Scheduler entries changed:
ProxySQL was restarted at 11:23:32 , that means that the core dump you sent was generated at 13:19:02 (6930 seconds after 11:23:32).
In https://github.com/sysown/proxysql/issues/826#issuecomment-286429201 you commented that :
This is actually very interesting, because commands related to the scheduler were executed at 12:52ish and 13:52ish : exactly one hour apart.
Do you recall:
a) what did you change in the scheduler at these times?
b) if you run ps before of after running the commands on the scheduler?
Do you mind sharing the files /var/lib/proxysql/galera_hg*.log ? I would like to see if I can find anything else interesting during this time window.
Looking at the number of LOAD MYSQL SERVERS per hour I actually missed a detail, and I should have checked more granularity, using:
egrep '^2017-03-14 1[1234].*LOAD MYSQL SERVERS' proxysql.log | cut -b 1-16 | sort | uniq -c
The number of LOAD MYSQL SERVERS per minute:
@doddys : can you please try this modified version of proxysql_galera_checker.sh ?
proxysql_galera_checker-v2.sh.zip
@taylor840326 , @aydancoskun : are you also using proxysql_galera_checker.sh ?
@renecannao I did make some config changes after proxy restart at 11 ish ...
I applied the script that your provided. I notice a few things differ from previous behavior
mysql: [Warning] Using a password on the command line interface can be insecure.# watch -n 1 'mysql --login-path=proxysql -t -e "select * from stats_mysql_connection_pool where hostgroup in (2,3,4,94,92,93) order by hostgroup,srv_host ;" -e " select hostgroup_id,hostname,status,weight,comment from mysql_servers where hostgroup_id in (2,3,4) order by hostgroup_id,hostname ;"'
Every 1.0s: mysql --login-path=proxysql -t -e "select * from stats_mysql_connection_pool where hostgroup in (2,3,4,94,92,93) order by hostgrou... Wed Mar 15 02:23:21 2017
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 2 | 10.130.33.225 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 205 |
| 2 | 10.130.33.238 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1384 |
| 2 | 10.130.34.56 | 3306 | ONLINE | 0 | 13 | 13 | 0 | 19678 | 519672 | 186768 | 447 |
| 3 | 10.130.33.225 | 3306 | OFFLINE_SOFT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 205 |
| 3 | 10.130.33.238 | 3306 | ONLINE | 200 | 0 | 594 | 0 | 3806 | 460314 | 2670117 | 1384 |
| 3 | 10.130.34.56 | 3306 | OFFLINE_SOFT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 447 |
| 4 | 10.130.33.225 | 3306 | OFFLINE_SOFT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 205 |
| 4 | 10.130.33.238 | 3306 | OFFLINE_SOFT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1384 |
| 4 | 10.130.34.56 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 447 |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
+--------------+---------------+--------------+------------+-----------+
| hostgroup_id | hostname | status | weight | comment |
+--------------+---------------+--------------+------------+-----------+
| 2 | 10.130.33.225 | ONLINE | 1 | PRIMARY |
| 2 | 10.130.33.238 | ONLINE | 1 | BACKUP |
| 2 | 10.130.34.56 | ONLINE | 1000000000 | BACKUP |
| 3 | 10.130.33.225 | ONLINE | 1 | BACKUP |
| 3 | 10.130.33.238 | OFFLINE_SOFT | 1000000000 | PRIMARY |
| 3 | 10.130.34.56 | OFFLINE_SOFT | 1 | BACKUP |
| 4 | 10.130.33.225 | ONLINE | 100000 | SECONDARY |
| 4 | 10.130.33.238 | OFFLINE_SOFT | 1 | BACKUP |
| 4 | 10.130.34.56 | OFFLINE_SOFT | 100000000 | PRIMARY |
+--------------+---------------+--------------+------------+-----------+
attached is the galera_hg logs
anyway, if the cpu spike again, is there anything you would like me to capture?
It seems I need to merge this checker with the checker from Percona.
anyway, if the cpu spike again, is there anything you would like me to capture?
Yes. Please refer to this comment.
Thanks.
I'm getting this within a few hours of starting up, for about a month now.
[root@LION:~]# mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT @@version" -NB
1.3.4-0-gd158707
[root@LION:~]# ps aux | grep proxysql
root 28041 0.0 0.0 16800 13148 ? Ssl Mar16 0:26 /srv/proxysql_exporter -web.listen-address=:42004
root 28482 0.0 0.0 34452 2972 ? S Mar16 0:00 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
root 28484 87.9 0.4 336200 71332 ? Sl Mar16 5223:20 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
[root@LION:~]# uname -a
Linux LION 4.9.0-2-amd64 #1 SMP Debian 4.9.13-1 (2017-02-27) x86_64 GNU/Linux
[root@LION:~]# mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT * FROM scheduler" -NB
1 1 5000 /var/lib/proxysql/proxysql_galera_checker.sh 1 2 1 1 /var/lib/proxysql/proxysql_galera_checker.log
[root@LION:~]# mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT * FROM runtime_scheduler" -NB
1 1 5000 /var/lib/proxysql/proxysql_galera_checker.sh 1 2 1 1 /var/lib/proxysql/proxysql_galera_checker.log

@bweston92 : from the graphs, it I am reading them correctly, it seems you have spiky CPU usage, and not a constantly high CPU usage. This could be anything, even traffic related.
Yet, in case it is proxysql_galera_checker.sh, can you modify it as reported in https://github.com/sysown/proxysql/issues/826#issuecomment-286611276 ?
Thanks
While I am on this issue...
@doddys : any more CPU weirdness after the change?
@renecannao the spike we have it at most 10 users a minute. Not only that I'd expect some correlation with mysqld

@renecannao I have not encountered the same problem anymore but I have upgraded the to 1.3.4 due to some crash during high number of connections. I will report when the problem reappear.
@renecannao also when I look at top it is just constantly 100% however everything still works fine.
@doddys : great!
@bweston92 : ok, this needs further investigation. Can you please collect some metrics as described in https://github.com/sysown/proxysql/issues/826#issuecomment-265441199 ?
Thanks
Ok, see attached for report.txt
report.txt
pt-pmp --pid 28484 --iterations 30
[root@LION:~]# pt-pmp --pid 28484 --iterations 30
Mon 20 Mar 16:06:17 GMT 2017
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
240 pthread_cond_wait,wqueue::remove,ConsumerThread::run,start_thread,clone
120 epoll_wait,MySQL_Thread::run,mysql_worker_thread_func_idles,start_thread,clone
106 poll,MySQL_Thread::run,mysql_worker_thread_func,start_thread,clone
30 poll,admin_main_loop,start_thread,clone
30 nanosleep,usleep,Query_Cache::purgeHash_thread,mysql_shared_query_cache_funct,start_thread,clone
30 nanosleep,usleep,MySQL_Monitor::run,libstdc++::??(libstdc++.so.6),start_thread,clone
30 nanosleep,usleep,MySQL_Monitor::monitor_replication_lag,monitor_replication_lag_pthread,start_thread,clone
30 nanosleep,usleep,MySQL_Monitor::monitor_read_only,monitor_read_only_pthread,start_thread,clone
30 nanosleep,usleep,MySQL_Monitor::monitor_ping,monitor_ping_pthread,start_thread,clone
30 nanosleep,usleep,MySQL_Monitor::monitor_connect,monitor_connect_pthread,start_thread,clone
30 nanosleep,usleep,main
4 __waitpid,waitpid_thread,start_thread,clone
4 nanosleep,usleep,MySQL_Thread::run,mysql_worker_thread_func,start_thread,clone
4 MySQL_Thread::run,mysql_worker_thread_func,start_thread,clone
2 __random,rand,MySQL_Thread::run,mysql_worker_thread_func,start_thread,clone
2 __libc_send,my_send_async(mysql_async.c:174),vio_write(violite.c:441),net_real_write(net.c:438),net_flush(net.c:255),net_write_command(net.c:333),mthd_my_send_cmd(libmariadb.c:612),mysql_real_query(libmariadb.c:2538),mysql_real_query_start_internal(mysql_async.c:490),my_context_spawn(my_context.c:205)
1 poll,child_mysql,start_thread,clone
1 MySQL_Data_Stream::set_pollout,MySQL_Thread::run,mysql_worker_thread_func,start_thread,clone
1 __libc_send,my_send_async(mysql_async.c:174),vio_write(violite.c:441),net_real_write(net.c:438),net_flush(net.c:255),net_write_command(net.c:333),mthd_my_send_cmd(libmariadb.c:612),send_change_user_packet(my_auth.c:239),client_mpvio_write_packet(my_auth.c:239),native_password_auth_client(my_auth.c:132),run_plugin_auth(my_auth.c:646),mysql_change_user(libmariadb.c:2222),mysql_change_user_start_internal(mysql_async.c:846),my_context_spawn(my_context.c:205)
@bweston92 :
According to pt-pmp , proxysql is mainly sending traffic to backend.
But the output of perf is really really odd.
clear_page_c_e is a kernel call related to memory management, and SpookyHash::Final should be extremely fast! Is this a VM? Which Linux version?
@renecannao nope bare metal.
[root@LION:~]# cat /etc/issue
Debian GNU/Linux 9 \n \l
[root@LION:~]# uname -a
Linux LION 4.9.0-2-amd64 #1 SMP Debian 4.9.13-1 (2017-02-27) x86_64 GNU/Linux
[root@LION:~]# cat /proc/cpuinfo | grep -c hyper
0
Let's start tuning ProxySQL to use the correct number of threads.
As your system seems to have 4 cores, and there are a lot of services running, I would configured ProxySQL to use only one core running this on Admin:
SET mysql-threads=1;
SAVE MYSQL VARIABLES TO DISK;
PROXYSQL RESTART;
Thanks
By there is a lot you mean ProxySQL and MySQL alone right?
Also the CPU has just jumped right down, will report back if anything goes back up.
What may of caused this?
By "a lot" I mean that maybe there is more than ProxySQL and MySQL : I see also mongod and corosync using their amount of CPU.
I am not sure what causes the issue, but the output of perf report is really odd.
Let's start configuring the right number of threads, and let's see if the issue is related or not
Ok left it over night and now it is back.
Threads is still set to one:
| Variable_name | Value |
+----------------------------------------+----------------------------------------------+
| mysql-threads | 1 |
+----------------------------------------+----------------------------------------------+
[root@LION:~]# systemctl status proxysql
● proxysql.service - Cluster Controlled proxysql
Loaded: loaded (/etc/init.d/proxysql; generated; vendor preset: enabled)
Drop-In: /run/systemd/system/proxysql.service.d
└─50-pacemaker.conf
Active: active (running) since Thu 2017-03-16 11:52:19 GMT; 4 days ago
Docs: man:systemd-sysv-generator(8)
Process: 28473 ExecStart=/etc/init.d/proxysql start (code=exited, status=0/SUCCESS)
Tasks: 13 (limit: 4915)
Memory: 326.0M
CPU: 4d 2h 1min 13.877s
CGroup: /system.slice/proxysql.service
├─28482 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
└─28484 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql


@bweston92 : I am really sorry to see that you still have these spikes.
Did you modify proxysql_galera_checker.sh as suggested before?
After you modify proxysql_galera_checker.sh , please collect metrics again with perf, but this type using this command:
perf record -a
Leave it run for some time, like 15-30 minutes.
Then again:
perf report > report.txt
Thank you
A bit of bash kung fu:
for i in `head -n 43723 report-latest.txt | tail -n 43712 | grep -v '0.00%' | awk '{print $2}' | sort | uniq -c | awk '{print $2}'` ; do echo -n `cut -b 1-29 report-latest.txt | grep "$i " | tr '%' ' ' | awk '{print $1}' | grep -v '0.00' | awk '{ SUM += $1} END { print SUM }'` ; echo "% $i" ; done | sort -n -r
37.04% swapper
22.33% proxysql
15.15% mysqld
5.61% corosync
4.26% mongod
2.74% rsync
1.93% kthreadd
1.72% proxysql_galera
1.38% node_exporter
0.9% kworker/3:0
0.81% mysql
0.77% rcu_sched
0.45% kworker/2:1
0.44% systemd
0.44% modprobe
0.41% jbd2/dm-7-8
0.33% ha_logd
0.33% flannel
0.32% date
0.28% pcs
0.21% bash
0.21% IPaddr2
0.19% ps
0.19% awk
0.17% docker
0.17% dbus-daemon
0.15% proxysql_export
0.14% mongodb_exporte
0.14% kworker/2:1H
0.13% ksoftirqd/2
0.11% systemctl
0.11% lrmd
0.11% attrd
0.1% tail
0.08% pacemakerd
0.08% crmd
0.07% kworker/3:1H
0.04% timeout
0.04% mkdir
0.04% grep
0.03% wsrep_sst_rsync
0.03% systemd-journal
0.03% proxySQLWatcher
0.03% jbd2/dm-2-8
0.03% iptables
0.02% sshd
0.02% send_arp
0.02% ntpd
The sum of these values is 99.43% , therefore the rounding error is minimal!
The above already show something interesting: proxysql is not the process/service using more cpu.
A bit more shell scripting, specifically to proxysql, a lot of time is spent in kernel space:
grep "proxysql " report-latest.txt | grep -v '0.00%' > proxysql_report-latest.txt
$ for i in `cat proxysql_report-latest.txt | awk '{print $3}' | sort | uniq -c | awk '{print $2}' | tr -d '[]'` ; do echo -n `awk '{print $1" "$3}' proxysql_report-latest.txt | grep $i | awk '{print $1}' | tr -d '%' | awk '{ SUM += $1} END { print SUM }'` ; echo "% $i" ; done | sort -n -r
13.35% kernel.kallsyms
6.9% proxysql
1.42% libpthread-2.24.so
0.31% unknown
0.29% libc-2.24.so
0.03% vdso
0.03% libstdc++.so.6.0.22
In user space, the amount of cpu time spent by proxysql is less than mysqld:
grep "mysqld " report-latest.txt | grep -v '0.00%' > mysqld_report-latest.txt
for i in `cat mysqld_report-latest.txt | awk '{print $3}' | sort | uniq -c | awk '{print $2}' | tr -d '[]'` ; do echo -n `awk '{print $1" "$3}' mysqld_report-latest.txt | grep $i | awk '{print $1}' | tr -d '%' | awk '{ SUM += $1} END { print SUM }'` ; echo "% $i" ; done | sort -n -r
8.68% mysqld
3.51% kernel.kallsyms
1.24% libc-2.19.so
0.98% unknown
0.27% libgalera_smm.so
0.18% libaio.so.1.0.1
0.15% libstdc++.so.6.0.20
0.11% libpthread-2.19.so
0.03% ld-2.19.so
But what concerns me more is the first line of perf report:
12.55% swapper [kernel.kallsyms] [k] intel_idle
I suspect that the odd CPU behaviour you are seeing is caused by power management.
Is it possible to disable any power management in this box?
Thanks
@renecannao ok, however:
Only when the machine is running ProxySQL does this issue occur.
@bweston92 , to be clear, I am not saying "nothing wrong with ProxySQL" .
I am saying that the behavior you are seeing (and that I am not able to reproduce in hundreds of different servers) may be triggered by something specific in your system, and we need to understand what triggers it.
We also encountered this, CPU usage around 200%.
I realized the log file keep recording this every 2 seconds:
2017-03-26 21:34:04 [INFO] New mysql_replication_hostgroups table
writer_hostgroup: 1 , reader_hostgroup: 2, TMS Base Writer and Reader Group
writer_hostgroup: 3 , reader_hostgroup: 4, TMS TM Writer and Reader Group
HID: 1 , address: HOST1 , port: 3306 , weight: 1 , status: OFFLINE_HARD , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 2 , address: HOST1_RO , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 2 , address: HOST1 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 3 , address: HOST2 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 4 , address: HOST2_RO , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 4 , address: HOST2 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 100 , address: HOST3 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: HOST1 , port: 3306 , weight: 1 , status: OFFLINE_HARD , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 2 , address: HOST1_RO , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 2 , address: HOST1 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 3 , address: HOST2 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 4 , address: HOST2_RO , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 4 , address: HOST2 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 100 , address: HOST3 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: HOST1 , port: 3306 , weight: 1 , status: OFFLINE_HARD , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 2 , address: HOST1_RO , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 2 , address: HOST1 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 3 , address: HOST2 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 4 , address: HOST2_RO , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 4 , address: HOST2 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 100 , address: HOST3 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: HOST1 , port: 3306 , weight: 1 , status: OFFLINE_HARD , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 2 , address: HOST1_RO , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 2 , address: HOST1 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 3 , address: HOST2 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 4 , address: HOST2_RO , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 4 , address: HOST2 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 100 , address: HOST3 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment:
2017-03-26 21:34:06 [INFO] New mysql_replication_hostgroups table
writer_hostgroup: 1 , reader_hostgroup: 2, TMS Base Writer and Reader Group
writer_hostgroup: 3 , reader_hostgroup: 4, TMS TM Writer and Reader Group
OFFLINE_HARD looks weird here; the host is accessible otherwise.
This is the top output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
997 root 20 0 276160 37768 5052 S 199.7 0.5 14575:48 proxysql
@oytuntez: the output you provided is not really helpful in debugging what is the problem.
This issue is full of examples of how to collect metric: the output of top gives no information.
Similarly about the entries in the error log: copying the last 200-300 lines gives more information than "this is recorded every 2 seconds", because, maybe, it is also written what cause these lines to be written
I will collect more once it occurs again. This was in production, so didn't have much time.
i'm am also experiencing this issue. CPU drops to 0% for a few seconds, swap spikes to 100% usage, then CPU spikes to 100% across all cores. I have a fairly simple setup. 3 servers, singlewrite mode. Some times it goes days without an issue, Other-times a few hours. In all cases, when the CPU drops to 0%, proxysql stops handling requests and CPU spikes.
I first thought it was a configuration error so I reinstalled the entire OS and re-setup proxysql. But it started happening last night again. I'm using proxysql version 1.3.6-1 on Ubuntu 16.04.2 LTS. I am running 5.7.18-15-57-log Percona XtraDB Cluster (GPL), Release rel15, Revision 7693d6e, WSREP version 29.20, wsrep_29.20.
What information would you need from me to help you debug if this is a proxysql issue?
@NextStepGuru : try 1.3.7 , this patch may solve random hangs.
But what concerns me in your comment is swap spiking to 100% : this could be a cause for unpredictable consequences. Is it proxysql swapping? How much memory is using?
I have deployed multi-cluster MySQL with Jelastic Cloud and also have peaks at 200% CPU for no apparent reason.
The servers are isolated, there is only ProxySQL on it, the only connection that uses it is mine and ProxySQL always ends up racing.
I am using version 2.0.12 with mysql 8.0.20.
Most helpful comment
A bit of bash kung fu:
The sum of these values is 99.43% , therefore the rounding error is minimal!
The above already show something interesting: proxysql is not the process/service using more cpu.
A bit more shell scripting, specifically to
proxysql, a lot of time is spent in kernel space:In user space, the amount of cpu time spent by proxysql is less than mysqld:
But what concerns me more is the first line of perf report:
I suspect that the odd CPU behaviour you are seeing is caused by power management.
Is it possible to disable any power management in this box?
Thanks