I wrote a customer sysbench test, the lua script file is named issue.lua, and the content is:
#!/usr/bin/env sysbench
function thread_init()
drv = sysbench.sql.driver()
con = drv:connect()
con:query([[
prepare stmt from "
select * from t1, t2
where t1.a = t2.a
and t2.a = ?
and t2.b = 'b'
and t2.c = 'c'
"
]])
end
function thread_done()
con:disconnect()
end
function event()
con:query("set @a = 1")
con:query("execute stmt using @a")
end
function prepare()
local drv = sysbench.sql.driver()
local con = drv:connect()
con:query("DROP TABLE IF EXISTS t1")
con:query([[
CREATE TABLE t1 (
a BIGINT,
b VARCHAR(40),
col3 VARCHAR(40) DEFAULT NULL,
col4 VARCHAR(40) DEFAULT NULL,
col5 VARCHAR(40) DEFAULT NULL,
col6 VARCHAR(40) DEFAULT NULL,
col7 VARCHAR(40) DEFAULT NULL,
col8 VARCHAR(40) DEFAULT NULL,
col9 VARCHAR(40) DEFAULT NULL,
col10 VARCHAR(40) DEFAULT NULL,
PRIMARY KEY (a, b)
)
]])
con:query([[INSERT INTO t1(a, b) VALUES (1, "1")]])
con:query([[INSERT INTO t1(a, b) VALUES (1, "2")]])
con:query([[INSERT INTO t1(a, b) VALUES (2, "1")]])
con:query([[INSERT INTO t1(a, b) VALUES (2, "2")]])
con:query([[INSERT INTO t1(a, b) VALUES (3, "1")]])
con:query([[INSERT INTO t1(a, b) VALUES (3, "2")]])
con:query("DROP TABLE IF EXISTS t2")
con:query([[
CREATE TABLE t2 (
a BIGINT,
b VARCHAR(40),
c VARCHAR(40),
col4 VARCHAR(40) DEFAULT NULL,
col5 VARCHAR(40) DEFAULT NULL,
col6 VARCHAR(40) DEFAULT NULL,
col7 VARCHAR(40) DEFAULT NULL,
col8 VARCHAR(40) DEFAULT NULL,
col9 VARCHAR(40) DEFAULT NULL,
col10 VARCHAR(40) DEFAULT NULL,
UNIQUE KEY idx_a (a)
)
]])
con:query([[INSERT INTO t2(a, b, c) VALUES(1, "b", "c")]])
con:query([[INSERT INTO t2(a, b, c) VALUES(2, "b", "c")]])
con:query([[INSERT INTO t2(a, b, c) VALUES(3, "b", "c")]])
end
function cleanup()
local drv = sysbench.sql.driver()
local con = drv:connect()
con:query("DROP TABLE IF EXISTS t1")
con:query("DROP TABLE IF EXISTS t2")
end
NOTE: the content is different from it described in #16817.
step 1: step 1: start a tidb-server with mocktikv and plan cache enabled
step 2: prepare table and data via the sysbench test:
sysbench --report-interval=1 --time=300 --threads=4 --mysql-host=127.0.0.1 --mysql-port=4000 --mysql-user=root --mysql-db=test --db-driver=mysql issue prepare
sysbench --report-interval=1 --time=300 --threads=4 --mysql-host=127.0.0.1 --mysql-port=4000 --mysql-user=root --mysql-db=test --db-driver=mysql issue run
explain for connection xx to see the execution plan on that connection, I got an error in the return result:TiDB([email protected]:test) > explain for connection 2;
ERROR 1105 (HY000): runtime error: index out of range [0] with length 0
and the panic stack in tidb log file is:
[2020/04/25 17:24:11.449 +08:00] [ERROR] [adapter.go:123] ["execute sql panic"] [conn=6] [sql="explain for connection 2"] [stack="github.com/pingcap/tidb/executor.(*recordSet).Next.func1\n\t/Users/jianzhang.zj/Code/tidb/executor/adapter.go:123\nruntime.gopanic\n\t/opt/goroot/src/runtime/panic.go:679\nruntime.goPanicIndex\n\t/opt/goroot/src/runtime/panic.go:75\ngithub.com/pingcap/tidb/expression.(*ParamMarker).GetUserVar\n\t/Users/jianzhang.zj/Code/tidb/expression/constant.go:80\ngithub.com/pingcap/tidb/expression.(*Constant).getLazyDatum\n\t/Users/jianzhang.zj/Code/tidb/expression/constant.go:186\ngithub.com/pingcap/tidb/expression.(*Constant).EvalString\n\t/Users/jianzhang.zj/Code/tidb/expression/constant.go:264\ngithub.com/pingcap/tidb/expression.(*builtinCastStringAsRealSig).evalReal\n\t/Users/jianzhang.zj/Code/tidb/expression/builtin_cast.go:1158\ngithub.com/pingcap/tidb/expression.(*ScalarFunction).EvalReal\n\t/Users/jianzhang.zj/Code/tidb/expression/scalar_function.go:354\ngithub.com/pingcap/tidb/expression.(*ScalarFunction).Eval\n\t/Users/jianzhang.zj/Code/tidb/expression/scalar_function.go:323\ngithub.com/pingcap/tidb/expression.(*Constant).getLazyDatum\n\t/Users/jianzhang.zj/Code/tidb/expression/constant.go:190\ngithub.com/pingcap/tidb/expression.(*Constant).Eval\n\t/Users/jianzhang.zj/Code/tidb/expression/constant.go:199\ngithub.com/pingcap/tidb/expression.(*Constant).ExplainInfo\n\t/Users/jianzhang.zj/Code/tidb/expression/explain.go:65\ngithub.com/pingcap/tidb/expression.(*ScalarFunction).explainInfo\n\t/Users/jianzhang.zj/Code/tidb/expression/explain.go:38\ngithub.com/pingcap/tidb/expression.(*ScalarFunction).ExplainInfo\n\t/Users/jianzhang.zj/Code/tidb/expression/explain.go:28\ngithub.com/pingcap/tidb/expression.sortedExplainExpressionList\n\t/Users/jianzhang.zj/Code/tidb/expression/explain.go:119\ngithub.com/pingcap/tidb/expression.SortedExplainExpressionList\n\t/Users/jianzhang.zj/Code/tidb/expression/explain.go:109\ngithub.com/pingcap/tidb/planner/core.(*PhysicalSelection).ExplainInfo\n\t/Users/jianzhang.zj/Code/tidb/planner/core/explain.go:282\ngithub.com/pingcap/tidb/planner/core.(*Explain).prepareOperatorInfo\n\t/Users/jianzhang.zj/Code/tidb/planner/core/common_plans.go:953\ngithub.com/pingcap/tidb/planner/core.(*Explain).explainPlanInRowFormat\n\t/Users/jianzhang.zj/Code/tidb/planner/core/common_plans.go:839\ngithub.com/pingcap/tidb/planner/core.(*Explain).explainPlanInRowFormat\n\t/Users/jianzhang.zj/Code/tidb/planner/core/common_plans.go:903\ngithub.com/pingcap/tidb/planner/core.(*Explain).explainPlanInRowFormat\n\t/Users/jianzhang.zj/Code/tidb/planner/core/common_plans.go:886\ngithub.com/pingcap/tidb/planner/core.(*Explain).explainPlanInRowFormat\n\t/Users/jianzhang.zj/Code/tidb/planner/core/common_plans.go:886\ngithub.com/pingcap/tidb/planner/core.(*Explain).explainPlanInRowFormat\n\t/Users/jianzhang.zj/Code/tidb/planner/core/common_plans.go:928\ngithub.com/pingcap/tidb/planner/core.(*Explain).RenderResult\n\t/Users/jianzhang.zj/Code/tidb/planner/core/common_plans.go:821\ngithub.com/pingcap/tidb/executor.(*ExplainExec).generateExplainInfo\n\t/Users/jianzhang.zj/Code/tidb/executor/explain.go:106\ngithub.com/pingcap/tidb/executor.(*ExplainExec).Next\n\t/Users/jianzhang.zj/Code/tidb/executor/explain.go:53\ngithub.com/pingcap/tidb/executor.Next\n\t/Users/jianzhang.zj/Code/tidb/executor/executor.go:249\ngithub.com/pingcap/tidb/executor.(*recordSet).Next\n\t/Users/jianzhang.zj/Code/tidb/executor/adapter.go:126\ngithub.com/pingcap/tidb/server.(*tidbResultSet).Next\n\t/Users/jianzhang.zj/Code/tidb/server/driver_tidb.go:369\ngithub.com/pingcap/tidb/server.(*clientConn).writeChunks\n\t/Users/jianzhang.zj/Code/tidb/server/conn.go:1436\ngithub.com/pingcap/tidb/server.(*clientConn).writeResultset\n\t/Users/jianzhang.zj/Code/tidb/server/conn.go:1402\ngithub.com/pingcap/tidb/server.(*clientConn).handleStmt\n\t/Users/jianzhang.zj/Code/tidb/server/conn.go:1309\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\n\t/Users/jianzhang.zj/Code/tidb/server/conn.go:1278\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\n\t/Users/jianzhang.zj/Code/tidb/server/conn.go:901\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/Users/jianzhang.zj/Code/tidb/server/conn.go:715\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/Users/jianzhang.zj/Code/tidb/server/server.go:415"]
no panic, executes successfully.
the error and panic described in the above.
tidb-server -V or run select tidb_version(); on TiDB)➜ ./bin/tidb-server -V
Release Version: v4.0.0-beta.2-352-g5ede18f10
Git Commit Hash: 5ede18f10eedfe2e3690d7728dec3ffa4b0af2d5
Git Branch: master
UTC Build Time: 2020-04-25 08:41:26
GoVersion: go1.13.10
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
I found the cause of the problem, but I encountered some difficulties in solving it, seek help..
runner.lua
function thread_init()
con = sysbench.sql.driver():connect()
con:query("drop table if exists t0;")
con:query("create table t0(a int);")
con:query([[prepare stmt from "select * from t0 where a = ?";]])
con:query("set @a = 1")
end
function event()
con:query("execute stmt using @a")
end
function thread_done()
con:disconnect()
end
watcher.lua
function thread_init()
con = sysbench.sql.driver():connect()
end
function event()
con:query("explain for connection 1")
end
function thread_done()
con:disconnect()
end
Set PreparedPlanCache.Enabled is true in config.go, compile and run tidb.
Start a terminal, switch to the lua's directory, bench runner:
sysbench --report-interval=1 --time=3000 --threads=1 --mysql-host=127.0.0.1 --mysql-port=4000 --mysql-user=root --mysql-db=test --db-driver=mysql runner run
If connectionId of runner is not 1, please modify the watcher.lua.
sysbench --report-interval=1 --time=3000 --threads=1 --mysql-host=127.0.0.1 --mysql-port=4000 --mysql-user=root --mysql-db=test --db-driver=mysql watcher run
then the watcher failed, error info:
ERROR 1105 (HY000): runtime error: index out of range [0] with length 0
github.com/pingcap/tidb/executor.(*recordSet).Next.func1\n\t/mnt/d/project/pingcap/tidb/executor/adapter.go:123\nruntime.gopanic\n\t/home/littlefall/go/src/runtime/panic.go:967\nruntime.goPanicIndex\n\t/home/littlefall/go/src/runtime/panic.go:88\ngithub.com/pingcap/tidb/expression.(*ParamMarker).GetUserVar\n\t/mnt/d/project/pingcap/tidb/expression/constant.go:80\ngithub.com/pingcap/tidb/expression.(*Constant).getLazyDatum\n\t/mnt/d/project/pingcap/tidb/expression/constant.go:186\ngithub.com/pingcap/tidb/expression.(*Constant).EvalString\n\t/mnt/d/project/pingcap/tidb/expression/constant.go:264\ngithub.com/pingcap/tidb/expression.(*builtinCastStringAsRealSig).evalReal\n\t/mnt/d/project/pingcap/tidb/expression/builtin_cast.go:1182\ngithub.com/pingcap/tidb/expression.(*ScalarFunction).EvalReal\n\t/mnt/d/project/pingcap/tidb/expression/scalar_function.go:354\ngithub.com/pingcap/tidb/expression.(*ScalarFunction).Eval\n\t/mnt/d/project/pingcap/tidb/expression/scalar_function.go:323\ngithub.com/pingcap/tidb/expression.(*Constant).getLazyDatum\n\t/mnt/d/project/pingcap/tidb/expression/constant.go:190\ngithub.com/pingcap/tidb/expression.(*Constant).Eval\n\t/mnt/d/project/pingcap/tidb/expression/constant.go:199\ngithub.com/pingcap/tidb/expression.(*Constant).ExplainInfo\n\t/mnt/d/project/pingcap/tidb/expression/explain.go:65\ngithub.com/pingcap/tidb/expression.(*ScalarFunction).explainInfo\n\t/mnt/d/project/pingcap/tidb/expression/explain.go:38\ngithub.com/pingcap/tidb/expression.(*ScalarFunction).ExplainInfo\n\t/mnt/d/project/pingcap/tidb/expression/explain.go:28\ngithub.com/pingcap/tidb/expression.sortedExplainExpressionList\n\t/mnt/d/project/pingcap/tidb/expression/explain.go:119\ngithub.com/pingcap/tidb/expression.SortedExplainExpressionList\n\t/mnt/d/project/pingcap/tidb/expression/explain.go:109\ngithub.com/pingcap/tidb/planner/core.(*PhysicalSelection).ExplainInfo\n\t/mnt/d/project/pingcap/tidb/planner/core/explain.go:283\ngithub.com/pingcap/tidb/planner/core.(*Explain).prepareOperatorInfo\n\t/mnt/d/project/pingcap/tidb/planner/core/common_plans.go:957\ngithub.com/pingcap/tidb/planner/core.(*Explain).explainPlanInRowFormat\n\t/mnt/d/project/pingcap/tidb/planner/core/common_plans.go:843\ngithub.com/pingcap/tidb/planner/core.(*Explain).explainPlanInRowFormat\n\t/mnt/d/project/pingcap/tidb/planner/core/common_plans.go:907\ngithub.com/pingcap/tidb/planner/core.(*Explain).explainPlanInRowFormat\n\t/mnt/d/project/pingcap/tidb/planner/core/common_plans.go:932\ngithub.com/pingcap/tidb/planner/core.(*Explain).RenderResult\n\t/mnt/d/project/pingcap/tidb/planner/core/common_plans.go:825\ngithub.com/pingcap/tidb/executor.(*ExplainExec).generateExplainInfo\n\t/mnt/d/project/pingcap/tidb/executor/explain.go:106\ngithub.com/pingcap/tidb/executor.(*ExplainExec).Next\n\t/mnt/d/project/pingcap/tidb/executor/explain.go:53\ngithub.com/pingcap/tidb/executor.Next\n\t/mnt/d/project/pingcap/tidb/executor/executor.go:249\ngithub.com/pingcap/tidb/executor.(*recordSet).Next\n\t/mnt/d/project/pingcap/tidb/executor/adapter.go:126\ngithub.com/pingcap/tidb/server.(*tidbResultSet).Next\n\t/mnt/d/project/pingcap/tidb/server/driver_tidb.go:369\ngithub.com/pingcap/tidb/server.(*clientConn).writeChunks\n\t/mnt/d/project/pingcap/tidb/server/conn.go:1448\ngithub.com/pingcap/tidb/server.(*clientConn).writeResultset\n\t/mnt/d/project/pingcap/tidb/server/conn.go:1414\ngithub.com/pingcap/tidb/server.(*clientConn).handleStmt\n\t/mnt/d/project/pingcap/tidb/server/conn.go:1321\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\n\t/mnt/d/project/pingcap/tidb/server/conn.go:1290\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\n\t/mnt/d/project/pingcap/tidb/server/conn.go:901\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/mnt/d/project/pingcap/tidb/server/conn.go:715\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/mnt/d/project/pingcap/tidb/server/server.go:415
The reason for the problem is that ParamMaker will be reset during the query process. If it has been reseted and it has not been added, explain for connection accessing the array will cross the boundary.
In essence, it is a data race problem.
pull/16206 proposes a good method to try to solve this problem as follow, it checks whether value-get is in the execution of an explain statement. If in, it returns a empty value.
func (c *Constant) getLazyDatum() (dt types.Datum, isLazy bool, err error) {
if p := c.ParamMarker; p != nil {
if p.ctx.GetSessionVars().StmtCtx.InExplainStmt {
// Since `ParamMarker` is not nil only in prepare/execute context, the query must be `explain for connection` when coming here.
// The PreparedParams may have been reset already, to avoid panic, we just use the pre-evaluated datum for this constant.
return dt, false, nil
}
dt = p.GetUserVar()
isLazy = true
return
} else if c.DeferredExpr != nil {
dt, err = c.DeferredExpr.Eval(chunk.Row{})
isLazy = true
return
}
return
}
// GetType implements Expression interface.
func (c *Constant) GetType() *types.FieldType {
if p := c.ParamMarker; p != nil && !p.ctx.GetSessionVars().StmtCtx.InExplainStmt {
// GetType() may be called in multi-threaded context, e.g, in building inner executors of IndexJoin,
// so it should avoid data race. We achieve this by returning different FieldType pointer for each call.
tp := types.NewFieldType(mysql.TypeUnspecified)
dt := p.GetUserVar()
types.DefaultParamTypeForValue(dt.GetValue(), tp)
return tp
}
return c.RetType
}
But it doesn't work because sctx in c.ParamMarker is the sctx of the query plan, not the sctx of the explain statement.
Use the same method as 16206, but upload the explain sctx through Plan.ExplainInfo, SortedExplainExpressionList, Expression.ExplainInfo, Expression.Eval, builtinFunc.Eval, so many eval functions, and even expression.GetType.
I tried to do it, but found that the shortcomings are:
constant, you need to cal GetType(nil).null.A simple method is not to upload sctx, but before determining the value, first determine whether it will cross the boundary. If it will, returns a empty value and generates a warning.
func (d *ParamMarker) GetUserVar() (datum types.Datum) {
if d.order < len(sessionVars.PreparedParams) {
datum = sessionVars.PreparedParams[d.order]
} else {
sessionVars.StmtCtx.AppendWarning(errors.New(fmt.Sprintf("There is another session with explain query for this executing plan, it will get a null value for variables.")))
logutil.BgLogger().Warn("A query plan is being executed, please try again.")
}
}
The disadvantages of this solutions are:
The third method is to combine the above two, upload sctx + lock + return warning, you can feedback the warning to the session where explain, but the cost is the most.
I am confused about which method should be used. Which one do you think is better? Is there a more better way?
@zz-jason @eurekaka @lysu
To completely avoid the data race of stored plan in ProcessInfo, we can serialize the plan into string when storing it into ProcessInfo, and we can discard all these session context changes.
To completely avoid the data race of stored plan in
ProcessInfo, we can serialize the plan into string when storing it intoProcessInfo, and we can discard all these session context changes.
Thanks! I will try it later.
Most helpful comment
To completely avoid the data race of stored plan in
ProcessInfo, we can serialize the plan into string when storing it intoProcessInfo, and we can discard all these session context changes.