Ml-agents: Problem with the first steps of training with curriculum learning

Created on 31 Jul 2018 · 17 comments · Source: Unity-Technologies/ml-agents

Hello everyone!

I'm "playing" a little bit with curriculum learning and have just discovered what could be a bug in the toolkit (it could also be my mistake) that was introduced in the latest releases (it worked in v0.3). The resetParameters are not updated until Academy.Done() is called, which means that the first steps of training use the default values for these parameters, and those values can't be overridden from the curriculum JSON file. I hope the explanation is clear. Thank you in advance.

Cheers!

bug
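To make the reported behaviour concrete, here is a minimal Python sketch that models it. ToyAcademy, reset, done and the wall_height parameter are hypothetical stand-ins, not ML-Agents' real API: curriculum values are buffered and, in the buggy mode, only copied into reset_parameters when an episode ends (the role Academy.Done() plays in the report), so the first episode runs with the defaults.

```python
class ToyAcademy:
    """Hypothetical model of the reported behaviour; not ML-Agents' real API."""

    def __init__(self, defaults, apply_on_reset):
        self.reset_parameters = dict(defaults)  # environment defaults
        self.apply_on_reset = apply_on_reset    # False models the bug
        self.pending = {}                       # latest curriculum lesson values

    def reset(self, lesson_config=None):
        """Start an episode, optionally receiving the curriculum's lesson values."""
        if lesson_config:
            self.pending = dict(lesson_config)
        if self.apply_on_reset:
            # Fixed behaviour: lesson values apply from the very first episode.
            self.reset_parameters.update(self.pending)
        return dict(self.reset_parameters)

    def done(self):
        """End of episode (stand-in for Academy.Done()): buffered values take effect."""
        self.reset_parameters.update(self.pending)


if __name__ == "__main__":
    lesson_0 = {"wall_height": 0.0}  # hypothetical first-lesson value
    buggy = ToyAcademy({"wall_height": 4.0}, apply_on_reset=False)
    print(buggy.reset(lesson_0))  # {'wall_height': 4.0}: lesson 0 ignored at first
    buggy.done()
    print(buggy.reset())          # {'wall_height': 0.0}: applied only after done()
    fixed = ToyAcademy({"wall_height": 4.0}, apply_on_reset=True)
    print(fixed.reset(lesson_0))  # {'wall_height': 0.0}
```

In the buggy mode the lesson only takes effect from the second episode onward, which matches the symptom described in the report.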

This could indeed be a bug, as the first lesson of the curriculum should take effect from the very beginning. By the way, there is a new way to do curriculum learning on the current develop branch.

@phoenixSK https://github.com/Unity-Technologies/ml-agents/pull/1043 should fix this bug. It would be great if you could try it!

Thanks for your fast answer and fix, but unfortunately I'm getting an error whenever I try to run it, even if I don't specify the curriculum option. Here is the error message; any idea what could be happening? (Maybe the problem is that I merged the current develop branch rather than only #1043.)

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\phoenix\Anaconda3\envs\ml-agents\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\phoenix\Anaconda3\envs\ml-agents\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'run_training' on

Can you try a couple of things:

  1. Cherry-pick commit 36c4d560e0dbba7e72e54e45a6a6ef138650b3c5 and see if that works.
  2. If that doesn't work, sync with develop again and make sure to run "pip install ." after checking out the branch.

Unfortunately I did not see that error in my testing. Is there a default environment where you see this error?

I tried solution 1 and it worked as expected, so congrats and thanks! Just FYI, I also tried solution 2 and still got the same error.
The environment I'm using is a personal one, created when version 0.2 was released and updated with every release since, and this is the first time I've gotten an error like this one.

You're welcome! Glad it worked for you.

Could you provide more details about the errors you're getting? We're planning to ship this update to curricula in the next release, so it would help us a ton if you've found a bug.

Just a note: the way you pass a curriculum to learn.py has changed on develop. TL;DR: you now specify a "curriculum folder" containing JSON files, each named after the brain its curriculum is associated with. This allows multiple curricula for multiple brains in the same environment. Check out the updated docs here: https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Training-Curriculum-Learning.md.
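As a rough illustration of the folder convention described above, the following sketch collects every JSON file in a curriculum folder and keys it by file name, which is taken to be the brain name. The helper load_curricula and the brain name MyBrain are hypothetical; this is not the toolkit's own loader, and it does not interpret the curriculum contents, so check the linked docs for the actual file schema.

```python
import json
import tempfile
from pathlib import Path


def load_curricula(folder):
    """Return {brain_name: curriculum_dict} for every *.json file in folder.

    Hypothetical helper assuming the develop-branch convention that
    <folder>/<BrainName>.json holds the curriculum for <BrainName>.
    """
    curricula = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open() as f:
            curricula[path.stem] = json.load(f)  # file stem is the brain name
    return curricula


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Placeholder contents; the real schema is described in the docs.
        (Path(folder) / "MyBrain.json").write_text('{"parameters": {}}')
        print(load_curricula(folder))  # {'MyBrain': {'parameters': {}}}
```

Keying by file name is what lets one folder hold separate curricula for several brains in the same environment; brains without a matching file simply get no curriculum in this sketch.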

Thanks!

I recently synced to the develop branch, and I also get the same error as soon as I launch the Python trainer (although in my case I'm not using curriculum learning).

I have run "pip install ." on my freshly checked-out develop branch.

The aforementioned fix "36c4d56" is already merged into the develop code base.

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\MY_USER\Anaconda3\envs\ml-agents\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\MY_USER\Anaconda3\envs\ml-agents\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'run_training' on

@EstonianGiant Thanks for reporting this error. We are having trouble replicating this error on our machines so any additional information about your development environment will help. We will try replicating this on a Windows machine ASAP.

Could you confirm a couple of things?

  • You are on Python 3.6
  • You get this error in the Docker container as well

Thanks!

Hi @dericp,

Yes, I'm on Python 3.6.
I will check the Docker container and reply shortly.

Here's my anaconda info:

active environment : ml-agents
active env location : C:\Users\MY_USER\Anaconda3\envs\ml-agents
shell level : 2
user config file : C:\Users\MY_USER\.condarc
populated config files : C:\Users\MY_USER\.condarc
conda version : 4.5.4
conda-build version : 3.10.5
python version : 3.6.5.final.0
base environment : C:\Users\MY_USER\Anaconda3  (writable)
channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                      https://repo.anaconda.com/pkgs/main/noarch
                      https://repo.anaconda.com/pkgs/free/win-64
                      https://repo.anaconda.com/pkgs/free/noarch
                      https://repo.anaconda.com/pkgs/r/win-64
                      https://repo.anaconda.com/pkgs/r/noarch
                      https://repo.anaconda.com/pkgs/pro/win-64
                      https://repo.anaconda.com/pkgs/pro/noarch
                      https://repo.anaconda.com/pkgs/msys2/win-64
                      https://repo.anaconda.com/pkgs/msys2/noarch
package cache : C:\Users\MY_USER\Anaconda3\pkgs
                      C:\Users\MY_USER\AppData\Local\conda\conda\pkgs
envs directories : C:\Users\MY_USER\Anaconda3\envs
                      C:\Users\MY_USER\AppData\Local\conda\conda\envs
                      C:\Users\MY_USER\.conda\envs
platform : win-64
user-agent : conda/4.5.4 requests/2.18.4 CPython/3.6.5 Windows/10 Windows/10.0.17134
administrator : False
netrc file : None
offline mode : False

@dericp,

I can't test the Docker installation because it requires Windows 10 Pro or Enterprise (I have Home Edition).

@EstonianGiant Thank you, this is really helpful. We will try to get to the bottom of it next week.

@dericp

Okay, cool. FYI, in my local code base I've temporarily disabled multiprocess training (I don't need that feature right now), and everything else seems to work correctly.

i.e., in learn.py:

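    # jobs = []
    # for i in range(num_runs):
    # seed == -1 means "pick a random seed" for this run
    if seed == -1:
        use_seed = np.random.randint(0, 9999)
    else:
        use_seed = seed
    #     p = multiprocessing.Process(target=run_training, args=(i, use_seed))
    #     jobs.append(p)
    #     p.start()
    # Multiprocessing disabled: run a single training session in this process
    run_training(0, use_seed)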

I had the same problem today after updating a project to develop; I'm using Python 3.6.2 :: Anaconda custom (64-bit).

Disabling multiprocessing following @EstonianGiant's steps worked for me.

I found some info about this online and have a fix

see - https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror/42383397

This problem seems to be a design feature of multiprocessing.Pool. See https://bugs.python.org/issue25053. For some reason Pool does not always work with objects not defined in an imported module. So you have to write your function into a different file and import the module.

They recommend moving the multiprocessing target function into a separate .py file; however, I found that moving it above the if __name__ == '__main__': block, so that it is defined at module level, fixed it for me. I'll create a PR; see #1102.
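For readers hitting the same AttributeError, here is a minimal sketch of the pattern the fix relies on. On Windows, multiprocessing can only use the spawn start method: each child starts a fresh interpreter and re-imports the main script under the name __mp_main__, so code inside the if __name__ == '__main__': block does not run in the child. A target function defined inside that block therefore does not exist there, and unpickling the reference to it fails with "Can't get attribute 'run_training'". Defining the target at module level avoids this. The run_training(sub_id, use_seed) signature follows the learn.py snippet earlier in this thread; its body here is a placeholder, launch_runs is a hypothetical helper, and Python's random module stands in for np.random.

```python
import multiprocessing
import random


def run_training(sub_id, use_seed):
    """Placeholder for learn.py's run_training (body is hypothetical).

    Defined at module level, so a spawned child process (the only start
    method on Windows) can find it by name when it re-imports this module.
    """
    print(f"run {sub_id}: training with seed {use_seed}")


def launch_runs(num_runs, seed=-1):
    """Hypothetical helper: start one process per run and wait for all of them."""
    jobs = []
    for i in range(num_runs):
        # Mirror learn.py's convention: seed == -1 means "pick a random seed".
        use_seed = random.randrange(10000) if seed == -1 else seed
        p = multiprocessing.Process(target=run_training, args=(i, use_seed))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()
    return [p.exitcode for p in jobs]  # 0 means the child exited cleanly


if __name__ == "__main__":
    # Only the launching code lives under the main guard; spawned children
    # re-import this file but skip this block.
    print(launch_runs(num_runs=2, seed=42))
```

Keeping only the process-launching code under the main guard means a spawned child can import run_training without trying to start new processes itself.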

Nice fix @Sohojoe !

Thanks for making this fix, @Sohojoe!

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
