Hi everyone, first of all, thank you for the huge amount of work you have done on this project.
After a brief pass through the CMA-ES implementation, I found a few inconsistencies with the documentation.
The documentation of the Optimize function states that the last parameter (iterate) is also used as the starting point of the algorithm. However, this parameter seems to be basically ignored and the starting point is generated randomly here.
When the algorithm does not converge, it suggests to use a smaller step size here, yet there is no way to change the step size.
The lowerBound and upperBound variables don't seem to be true bounds for the variables. As far as I understand the code, those are only used to deduce the initial sigma value and the random starting point.
Have a nice day!
Hello Filip,
thanks for taking the time to look into the code.
- The function Optimize states that its parameter iterate is also used as the starting point of the algorithm. However, this parameter seems to be basically ignored and the starting point is generated randomly here.
I see your point, that we should be able to provide and use initial parameter; the issue I saw is that CMAES really kicks off if we start with some "reasonable" parameter; but "reasonable" highly depends on the task, and I agree that the parameter has to be tailored down for each task.
- When the algorithm does not converge, it suggests to use a smaller step size here, yet there is no way to change the step size.
Good point, we should introduce two new methods to control the step size damping and the cumulation constant.
- The lowerBound and upperBound variables don't seem to be true bounds for the variables. As far as I understand the code, those are only used to deduce the initial sigma value and the random starting point.
Right, I think we would have to define some general strategy to define optimizer wide bounds.
I will look into each point, in the next days and open a PR, unless someone likes to pick this up, in which case I'm happy to share the work.
Thank you for such a swift reply. The first and the second point should be quite straightforward. The third point could be a bit tricky. One way might be to project and optimize the variables in some kind of bounded space (e.g., evaluate (sin(x)+1)*(upper-lower)/2+lower instead of x). But in such a case, the recommended default sigma and maybe some other parameters will probably need to be scaled appropriately.
I am tagging the father of the algorithm - @nikohansen, maybe he has some guidelines for this (he usually does). :slightly_smiling_face:
Agreed, the sin approach works fine as long as upper-lower is not too large. To be able to set very loose bounds without running into numerical issues, I use a piecewise linear and quadratic function (sort of a periodic Huber loss type), see here. The way I construct it has the added convenience feature that the transformation is the identity when not close to the bounds.
For any periodic transformation, the initial value for sigma should not exceed, say, 1/4 of the period width in the definition space, which is 2 * pi / 4 for the sin. Even then, a variable may end up in a different period than it started, which is not a problem in itself (it may just come as a surprise to the user).
@zoq Are you working on this ? If not I would like to take a look. Thanks
Please feel free to work on this one.
I opened mlpack/ensmallen#70; since the optimization framework is now part of ensmallen, we can continue any discussion and implementation there. :+1:
Most helpful comment
Please feel free to work on this one.