Leela-zero: The reason for Zero's strength

Created on 12 Nov 2017 · 3 Comments · Source: leela-zero/leela-zero

Since you have replicated the paper, do you believe the strength of AlphaGo Zero is due to the algorithm or to the computing power that Google used? I mean, did they invent any new algorithms?

question

wensdong

Most helpful comment

But Alpha Go Zero uses less resources than previous versions of Alpha Go, that's an Algorithmic improvement.

The only reason it uses fewer resources is that the neural network is stronger. Calling that an algorithmic improvement is IMHO rather weird, because Zero is a far simpler program than the old AlphaGo. I stated after the paper came out that you could easily get an AlphaGo Zero by stripping half of the code out of an existing strong open source program, and I published Leela Zero to demonstrate it.

The problem is the training data. We are lucky that for Go it can be generated automatically, so it is "only" a question of hardware [1]. It has apparently been confirmed that the estimates of 1500-2500 TPUs for training were correct: https://www.reddit.com/r/baduk/comments/7c1mff/some_alphago_highlights_by_aja_huang_conference/

DeepMind's CEO has confirmed this cost about 25M USD, so I believe this answers the question of why others did not take this approach.

Remember also that having a lot of hardware allows you to try more approaches and discard the ones that give worse results. If one is combining existing algorithms, this gives a large advantage. There were several points where there were multiple logical alternatives for combining existing things, and one turned out to be 600 Elo (~2 stones!) stronger than the other. Make a few wrong decisions along the way (because you can't afford to test them exhaustively), and you end up with a much weaker program. It is not a simple matter of "better hardware" vs "better algorithms", because "better hardware" allows you to try more things to find "better algorithms".
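
To give a feel for how large a 600 Elo gap is, here is a minimal sketch that converts a rating difference into an expected score. It assumes the standard logistic Elo model with a 400-point scale; the function name `elo_expected_score` is only illustrative and does not come from the thread or from Leela Zero.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score of the stronger side under the standard Elo model.

    rating_diff is (stronger rating - weaker rating) in Elo points.
    Assumption: the usual logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    # A 600 Elo gap, the figure quoted in the comment above.
    print(round(elo_expected_score(600), 3))  # 0.969
```

Under that model, the 600 Elo variant would be expected to score roughly 97% against the weaker one, which is why a few such wrong design choices compound so badly.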

I don't believe it's fair to say that AlphaGo Zero "did not invent anything". Knowing one of the best ways to combine existing things is very valuable information, and they burned 25M USD figuring it out for us. (Some design decisions of Zero, particularly compared to Master, do smell like DeepMind wanted to "make a point" about the computer learning by itself.)

[1] For much of their other research, I am sure it is handy to have a parent company whose business is collecting data about everything...

gcp on 13 Nov 2017

All 3 comments

From the very top of the Readme, emphasis mine:

What

A Go program with no human provided knowledge. Using MCTS (but without Monte Carlo playouts) and a deep residual convolutional neural network stack.

This is a fairly faithful reimplementation of the system described in the Alpha Go Zero paper "Mastering the Game of Go without Human Knowledge". For all intents and purposes, it is an open source AlphaGo Zero.

Wait, what

If you are wondering what the catch is: you still need the network weights. No network weights are in this repository. If you manage to obtain the AlphaGo Zero weights, this program will be about as strong, provided you also obtain a few Tensor Processing Units. Lacking those TPUs, I'd recommend a top of the line GPU - it's not exactly the same, but the result would still be an engine that is far stronger than the top humans.

So the answer is both. You need computing power to generate the new weights. But Alpha Go Zero uses less resources than previous versions of Alpha Go, that's an Algorithmic improvement. For leela-zero, however, the struggle is replicating the training, which is a matter of computing power, because the algorithm is published.
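
For readers unfamiliar with the "MCTS (but without Monte Carlo playouts)" phrase in the README quote above, here is a rough sketch of the child-selection rule (PUCT) described in the AlphaGo Zero paper. The search picks the move maximising Q + U, where U is driven by the network's prior, and leaf positions are scored by the network's value output instead of a random playout. The `Node` class, the `select_child` function, and the `c_puct` value of 1.5 are hypothetical choices for illustration; this is not Leela Zero's actual code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # P(s, a): move probability from the policy output
    visit_count: int = 0         # N(s, a)
    value_sum: float = 0.0       # sum of network values backed up through this node
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        # Mean backed-up value Q(s, a); 0 for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_child(node: Node, c_puct: float = 1.5):
    """Return the (move, child) pair maximising Q + U.

    U = c_puct * P * sqrt(sum of sibling visits) / (1 + N).
    c_puct = 1.5 is an assumed exploration constant, not a published setting.
    """
    total_visits = sum(child.visit_count for child in node.children.values())

    def score(child: Node) -> float:
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q() + u

    return max(node.children.items(), key=lambda item: score(item[1]))


if __name__ == "__main__":
    root = Node(prior=1.0)
    root.children["a"] = Node(prior=0.8)                                  # unvisited, high prior
    root.children["b"] = Node(prior=0.2, visit_count=10, value_sum=5.0)   # visited, Q = 0.5
    move, _ = select_child(root)
    print(move)  # a
```

Nothing in this rule is specific to DeepMind's hardware; the expensive part is producing the network that supplies the priors and values.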

OmnipotentEntity on 12 Nov 2017

I am not very familiar with reinforcement learning. However, the algorithms (techniques) that Zero uses appear to be existing ones, e.g. MCTS, ResNet, self-play. I believe other Go programs also apply these techniques. So why is it DeepMind that came up with the best Go algorithm?

AlexNet got superior results because it introduced the dropout technique. ResNet got superior results vs. plain CNNs because it invented ResNet. What did Go Zero invent?

wensdong on 12 Nov 2017