I want to ask a question: why is the Hessian not a square matrix? In optimization theory, the second-order Taylor expansion is f(x) ≈ f(x_k) + (x - x_k)^T g + (1/2)(x - x_k)^T H (x - x_k), where x and x_k are vectors, g is the gradient vector, and H is the Hessian, which should be a square matrix! But in LightGBM, the Hessian is not a square matrix for multiclass classification.
@Hongyun1993 My understanding is that the object being optimized is different. Usually we optimize over the parameters: with n parameters, the first derivative is an n-dimensional vector, and the second derivative is an n×n matrix. GBDT, however, optimizes over the function value itself. That function value is a scalar per sample, so its first derivative is a scalar, and its second derivative is still a scalar.
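To make the shapes concrete, here is a minimal NumPy sketch (not LightGBM's actual implementation) of the softmax cross-entropy derivatives. For one sample with K classes, the full Hessian with respect to that sample's K raw scores is diag(p) - p p^T, a K×K square matrix. GBDT implementations like LightGBM keep only its diagonal, p_k(1 - p_k), one scalar per (sample, class) pair (LightGBM additionally rescales the diagonal by a constant factor), which is why the Hessian you see is a flat array rather than a square matrix:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_samples, n_classes = 5, 3
raw_scores = rng.normal(size=(n_samples, n_classes))      # per-sample raw scores f_ik
labels = rng.integers(0, n_classes, size=n_samples)
y_onehot = np.eye(n_classes)[labels]

p = softmax(raw_scores)

# Gradient of cross-entropy w.r.t. the raw scores: one scalar per (sample, class)
grad = p - y_onehot                                        # shape (n_samples, K)

# Full per-sample Hessian w.r.t. that sample's K scores: diag(p) - p p^T, a K x K matrix
full_hess = np.array([np.diag(pi) - np.outer(pi, pi) for pi in p])  # (n_samples, K, K)

# Diagonal approximation used by GBDT libraries: p_k * (1 - p_k), again one scalar each
diag_hess = p * (1.0 - p)                                  # shape (n_samples, K)

print(grad.shape, full_hess.shape, diag_hess.shape)
# (5, 3) (5, 3, 3) (5, 3)
```

So the square K×K Hessian does exist per sample; what LightGBM stores and passes around is just its diagonal, flattened across samples and classes, because each class's tree only needs the scalar second derivative of the loss at each sample's current score.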