I want to ask a question: why is the Hessian not a square matrix? In optimization theory, the second-order Taylor expansion is f(x) ≈ f(x_k) + (x - x_k)^T g + (1/2)(x - x_k)^T H (x - x_k), where x and x_k are vectors, g is the gradient vector, and H is the Hessian, which should be a square matrix! But in LightGBM, the Hessian is not a square matrix for multiclass classification.
@Hongyun1993 My understanding is that the object being optimized is different. Usually we optimize over the parameters: with n parameters, the first derivative is an n-dimensional vector, and the second derivative is an n×n matrix. GBDT, however, optimizes over the function value itself. That function value is a scalar per sample, so its first derivative is a scalar, and its second derivative is still a scalar.
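To make the shapes concrete, here is a minimal NumPy sketch (not LightGBM's actual implementation) of the softmax cross-entropy derivatives. For one sample with K classes, the full Hessian with respect to that sample's K raw scores is diag(p) - p p^T, a K×K square matrix. GBDT implementations like LightGBM keep only its diagonal, p_k(1 - p_k), one scalar per (sample, class) pair (LightGBM additionally rescales the diagonal by a constant factor), which is why the Hessian you see is a flat array rather than a square matrix:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_samples, n_classes = 5, 3
raw_scores = rng.normal(size=(n_samples, n_classes))      # per-sample raw scores f_ik
labels = rng.integers(0, n_classes, size=n_samples)
y_onehot = np.eye(n_classes)[labels]

p = softmax(raw_scores)

# Gradient of cross-entropy w.r.t. the raw scores: one scalar per (sample, class)
grad = p - y_onehot                                        # shape (n_samples, K)

# Full per-sample Hessian w.r.t. that sample's K scores: diag(p) - p p^T, a K x K matrix
full_hess = np.array([np.diag(pi) - np.outer(pi, pi) for pi in p])  # (n_samples, K, K)

# Diagonal approximation used by GBDT libraries: p_k * (1 - p_k), again one scalar each
diag_hess = p * (1.0 - p)                                  # shape (n_samples, K)

print(grad.shape, full_hess.shape, diag_hess.shape)
# (5, 3) (5, 3, 3) (5, 3)
```

So the square K×K Hessian does exist per sample; what LightGBM stores and passes around is just its diagonal, flattened across samples and classes, because each class's tree only needs the scalar second derivative of the loss at each sample's current score.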