TiDB: ANALYZE TABLE fails for a table with charset latin1

Created on 19 Jul 2020  ·  18 Comments  ·  Source: pingcap/tidb

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

mysql> create table t (v1 varchar(30)) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin;
Query OK, 0 rows affected (0.09 sec)
$ python2.7
>>> f = open("1.sql", "w")
>>> f.write('INSERT INTO `t` VALUES ("\xe4NKNO\xe6");\n')
>>> f.flush()
$ mysql -h 172.16.4.18 -uroot -P4000 -D t < 1.sql



mysql> select * from t;
+--------+
| v1     |
+--------+
| �NKNO�   |
+--------+
1 row in set (0.00 sec)
mysql> analyze table t;
ERROR 1105 (HY000): other error: encoding failed



mysql> create table t (v1 varchar(30));
Query OK, 0 rows affected (0.09 sec)

$ mysql -h 172.16.4.18 -uroot -P4000 -D t < 1.sql
ERROR 1366 (HY000) at line 1: incorrect utf8 value e44e4b4e4fe6(�NKNO�) for column v1



$ ./tikv-server -V
TiKV 
Release Version:   4.1.0-alpha
Edition:           Community
Git Commit Hash:   8b1fc4fc67f6d74a46a86d731eb5c152cbf0dfa8
Git Commit Branch: master
UTC Build Time:    2020-07-14 01:06:28
Rust Version:      rustc 1.46.0-nightly (16957bd4d 2020-06-30)
Enable Features:   jemalloc portable sse protobuf-codec
Profile:           dist_release

mysql> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v4.0.0-beta.2-771-gca41972fb
Edition: Community
Git Commit Hash: ca41972fbac068c8a5de107d9075f09ac68842ac
Git Branch: master
UTC Build Time: 2020-07-14 02:41:21
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)

5. Root Cause Analysis

Priority/P3 need-more-info severity/critical sig/infra type/bug

Most helpful comment

I added some tracing to the Analyze code and found that TiDB pushes down wrong collation information for the string column. It should be latin1, but TiKV receives Utf8Mb4BinNoPadding:

[src/coprocessor/statistics/analyze.rs:302] columns_slice[i].encode(*logical_row, &columns_info[i],
                        &mut EvalContext::default(), &mut val) = Ok(
    (),
)
[src/coprocessor/statistics/analyze.rs:310] columns_info[i].as_accessor().collation() = Ok(
    Utf8Mb4BinNoPadding,
)
[src/coprocessor/statistics/analyze.rs:313] table::decode_col_value(&mut mut_val, &mut EvalContext::default(),
                        &columns_info[i]) = Ok(
    Bytes("\344NKNO\346"),
)
[src/coprocessor/statistics/analyze.rs:319] CollatorUtf8Mb4BinNoPadding::sort_key(&decoded_val.as_string()?.unwrap().into_owned()) = Err(
    Encoding(
        Utf8Error {
            valid_up_to: 0,
            error_len: Some(
                1,
            ),
        },
    ),
)

@wjhuang2016 PTAL. Use python2 to generate the SQL file; the issue then reproduces on the master branch.
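
The Utf8Error in the trace above can be illustrated outside TiKV. The following is a minimal Python sketch, not TiKV code: the helper name `utf8_error_span` is hypothetical, and it only assumes that the decoded column value is the six bytes shown in the trace (`\344` and `\346` are octal for 0xE4 and 0xE6).

```python
# The six bytes from the trace: "\344NKNO\346" in octal notation.
raw = b"\xe4NKNO\xe6"


def utf8_error_span(data):
    """Return (valid_up_to, error_len) for invalid UTF-8, or None if data is valid.

    Hypothetical helper that mirrors the two fields of the Utf8Error in the trace.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.end - exc.start
    return None


# Strict UTF-8 validation rejects the very first byte.
print(utf8_error_span(raw))  # (0, 1)

# The same bytes decode without error as latin1, one byte per character.
print(len(raw.decode("latin-1")))  # 6
```

The reported span matches `valid_up_to: 0` and `error_len: Some(1)` in the trace, which is consistent with a UTF-8 collator being applied to latin1 bytes.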

All 18 comments

I can't reproduce it.
I used tiup playground with the nightly version.

@wjhuang2016 I tested with the latest TiDB at revision d941ff5cc8b4babf9dcfdd91b66a5c53b798c122 and TiKV at revision 8f7fa2a17614f3bc87b63dfed8219b52406d70b6; the issue still reproduces every time.

The error log in tidb.log is as follows:

[2020/07/19 17:08:18.131 +08:00] [ERROR] [conn.go:744] ["command dispatched failed"] [conn=2] [connInfo="id:2, addr:10.9.85.17:60073 status:10, collation:utf8_general_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="analyze table t"] [txn_mode=PESSIMISTIC] [err="other error: encoding failed\ngithub.com/pingcap/tidb/store/tikv.(*copIteratorWorker).handleCopResponse\n\t/home/jenkins/agent/workspace/tidb_master/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:1042\ngithub.com/pingcap/tidb/store/tikv.(*copIteratorWorker).handleTaskOnce\n\t/home/jenkins/agent/workspace/tidb_master/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:838\ngithub.com/pingcap/tidb/store/tikv.(*copIteratorWorker).handleTask\n\t/home/jenkins/agent/workspace/tidb_master/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:746\ngithub.com/pingcap/tidb/store/tikv.(*copIteratorWorker).run\n\t/home/jenkins/agent/workspace/tidb_master/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:517\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]

It seems the collation used in ANALYZE differs from the one in the table schema.

@glorv could you check whether updating the dependency in go.mod changes anything?

I can't reproduce either. @glorv can you try the following, so we can eliminate character set conversions by clients:

DROP TABLE IF EXISTS t;
CREATE TABLE t (v1 varchar(30)) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin;
INSERT INTO t VALUES (UNHEX('C3A44E4B4E4FC3A6'));
ANALYZE TABLE t;

I can reproduce it on a fresh server with a shell script, but there seems to be some flakiness. If I mysqldump and restore and then analyze the restored table, it doesn't reproduce, and this script also stops reproducing it.

#!/bin/bash

mysql test -e "DROP TABLE IF EXISTS t1;"
mysql test -e "CREATE TABLE t1 (c varchar(30)) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin;"

for i in `seq 0 255`; do mysql test -e "INSERT INTO t1 VALUES (unhex(hex($i)))"; done

mysql test -e "ANALYZE TABLE t1"

This seems to be an issue on the TiKV side.

I deployed a cluster with v4.0.2 using Ansible, and this issue cannot be reproduced there. The TiKV revision is:

Release Version:   4.0.2
Edition:           Community
Git Commit Hash:   98ee08c587ab47d9573628aba6da741433d8855c
Git Commit Branch: heads/refs/tags/v4.0.2
UTC Build Time:    2020-07-01 09:34:18
Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)
Enable Features:   jemalloc portable sse protobuf-codec
Profile:           dist_release

After I manually updated TiKV to the latest commit on master, the issue appears. TiKV version:

Release Version:   4.1.0-alpha
Edition:           Community
Git Commit Hash:   d1c0be1e7bae51735e6de4683a156374dfb917ee
Git Commit Branch: master
UTC Build Time:    2020-07-20 04:42:06
Rust Version:      rustc 1.46.0-nightly (16957bd4d 2020-06-30)
Enable Features:   jemalloc portable sse protobuf-codec
Profile:           dist_release

BTW, the SQL INSERT INTO t VALUES (UNHEX('C3A44E4B4E4FC3A6')); does not reproduce it either.
So far, the only way to reproduce this issue is to generate a SQL file and load it with mysql < xxx.sql
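
A possible explanation for why the UNHEX statement does not trigger the error: the hex string C3A44E4B4E4FC3A6 is the UTF-8 encoding of the text, whereas the generated file carries the latin1 encoding E44E4B4E4FE6. A small Python sketch comparing the two byte sequences (variable names are illustrative only):

```python
# The string from the report, written with escapes: U+00E4 N K N O U+00E6.
text = "\u00e4NKNO\u00e6"

latin1_hex = text.encode("latin-1").hex().upper()
utf8_hex = text.encode("utf-8").hex().upper()

print(latin1_hex)  # E44E4B4E4FE6
print(utf8_hex)    # C3A44E4B4E4FC3A6

# The UNHEX payload round-trips through a strict UTF-8 decoder,
# so a UTF-8 based collator has nothing to reject.
unhex_payload = bytes.fromhex("C3A44E4B4E4FC3A6")
print(unhex_payload.decode("utf-8") == text)  # True
```

Under that reading, a statement that stores the latin1 bytes directly, rather than their UTF-8 form, would be needed to reach the failing path.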

This reproduces stably with the following SQL:

create table t (a int not null, primary key (a));
insert into t values (1);
select * from t;
analyze table t;

Fixed in: https://github.com/tikv/tikv/pull/8298

I manually tested this issue with a tikv-server built from PR https://github.com/tikv/tikv/pull/8298 at commit d3b92dac4156e5e2b37a404d1fdc1e8df3007249; the issue still reproduces stably.

BTW, PR #8298 fixes a bug that causes TiKV to panic during analyze. This issue instead returns the error ERROR 1105 (HY000): other error: encoding failed, so they must be different issues.

@glorv I followed the minimal reproduce procedure, but I can't reproduce this error:

mysql> select * from t;
+----------+
| v1       |
+----------+
| äNKNOæ   |
+----------+
1 row in set (0.00 sec)

mysql> analyze table t;
Query OK, 0 rows affected (0.10 sec)

Here is my cluster component version info:

Release Version: v4.0.0-beta.2-811-g847a3b73d
Edition: Community
Git Commit Hash: 847a3b73dc4f510f47bd1946540b3865c7c3ebd9
Git Branch: master
UTC Build Time: 2020-07-21 07:54:41
GoVersion: go1.14.4
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

TiKV branch: https://github.com/tikv/tikv/pull/8298

I can only reproduce this bug when 1.sql is created with Python 2, not with Python 3. @glorv

The minimal reproduction file must be generated with python2. With python3, the value actually written to the target file is 'INSERT INTO t VALUES ("\xc3\xa4NKNO\xc3\xa6");\n', although we want to write 'INSERT INTO t VALUES ("\xe4NKNO\xe6");\n'.
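
The difference can be seen by writing the same statement with an explicit encoding in Python 3. This is a sketch under assumptions: the function name and file names are made up, and it assumes the Python 3 run in question encoded text as UTF-8 (the default depends on the locale).

```python
import os
import tempfile

# The statement from the report; \xe4 and \xe6 are the code points U+00E4 and U+00E6.
STATEMENT = 'INSERT INTO `t` VALUES ("\xe4NKNO\xe6");\n'


def write_statement(path, encoding):
    """Write STATEMENT to path with the given encoding and return the bytes on disk."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(STATEMENT)
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    as_utf8 = write_statement(os.path.join(tmp, "utf8.sql"), "utf-8")
    as_latin1 = write_statement(os.path.join(tmp, "latin1.sql"), "latin-1")

print(as_utf8)    # two bytes per non-ASCII character
print(as_latin1)  # one byte per non-ASCII character, as Python 2 wrote it
```

Passing `encoding="latin-1"` (or opening the file in binary mode and writing a bytes literal) should produce the same file as the Python 2 snippet in the reproduce steps.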

When Collation::Utf8Mb4BinNoPadding is pushed down, an error is thrown if the content is not a valid UTF-8 string.

Maybe Collation::Binary is better?
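
To make the suggestion concrete, here is a rough Python sketch of the difference between a validating sort key and a purely bytewise one. Both functions are hypothetical stand-ins, not TiKV's collators; they only assume that the UTF-8 binary collator validates its input before comparing and that a binary collator compares raw bytes.

```python
def sort_key_utf8_bin(data):
    """Stand-in for a UTF-8 binary collator: validate first, then compare bytewise."""
    data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return data


def sort_key_binary(data):
    """Stand-in for a binary collator: compare raw bytes, no validation."""
    return data


values = [b"\xe4NKNO\xe6", b"abc", b"\xc3\xa4"]

# A bytewise key orders every value, including the latin1 bytes.
ordered = sorted(values, key=sort_key_binary)
print(ordered)

# The validating key fails as soon as it meets the latin1 bytes.
try:
    sorted(values, key=sort_key_utf8_bin)
    validating_ok = True
except UnicodeDecodeError:
    validating_ok = False
print(validating_ok)  # False
```

Whether a bytewise order is acceptable for latin1_bin statistics is a separate question that this sketch does not answer.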

@iosmanthus, could you find out why the comparison didn't cause the error?

Please edit this comment or add a new comment to complete the following information

Not a bug

  1. Remove the 'type/bug' label
  2. Add notes to indicate why it is not a bug

Duplicate bug

  1. Add the 'type/duplicate' label
  2. Add the link to the original bug

Bug

Note: Make sure that the 'component' and 'severity' labels are added
Example for how to fill out the template: https://github.com/pingcap/tidb/issues/20100

1. Root Cause Analysis (RCA) (optional)

2. Symptom (optional)

3. All Trigger Conditions (optional)

4. Workaround (optional)

5. Affected versions

6. Fixed versions

Can we add integration tests to verify the fix (and prevent future mistakes)?

I'm not sure it is necessary. In fact, we shouldn't allow writing non-UTF8 bytes to TiKV.
