Ml-agents: Editor crashes every time I try to play using an Internal brain type

Created on 26 Jun 2018 · 58Comments · Source: Unity-Technologies/ml-agents

I just installed the newest ml-agents beta following the newest installation guides today, 06/25/2018. I had v0.2 beta working fine on another machine, so I know roughly how it's supposed to behave. What I am experiencing now is that all example scenes work fine on Player type brain and Heuristic type brain, but any time I set it to Internal type brain and use the provided bytes file for each example, the Unity Editor crashes upon pressing Play. I am new to crashes, so I'm not sure how to troubleshoot. I've attached the Editor log but I don't know how to read it for relevant information. I'm using Unity 2018.1.2f1
Editor.log

bug

Source

beardordie

😕1

Most helpful comment

@xiaomaogy @beardordie @m4Ssa Hi guys, i think i found solve of the issue. I clone repository into new clear diretory, done all things by instruction of v0.5, but imported TensorFlowSharp for ml-agents v0.3 from here https://github.com/TimothyA86/ml-agents/blob/master/docs/Installation.md.

My laptop has AMD A8-3520m processor, which without AVX support. Build of @Setmaster doesn`t works.

kudyk on 28 Oct 2018

👍2

All 58 comments

@beardordie Did you follow the new documentation guide? Do you have the new TensorflowSharp plugin? Have you installed the new python packages?

Also can you list out your detailed steps that leads to a crash?

xiaomaogy on 26 Jun 2018

I'm having the same issue, the demo scenes work with player, heuristic and external but crash when internal is used. I'm using Unity 2018.1.6f1 and Anaconda. Editor.log, video

Setmaster on 27 Jun 2018

@Setmaster Thanks for the video and log, but this information is not enough for us to help you. Please tell us the detailed steps to reproduce your error, specify things like what os you are using, which installation guide did you follow etc.

xiaomaogy on 28 Jun 2018

@xiaomaogy
Win 10
Version 1803
Build 17134.112
Followed this repo's installation guide, choosing to use Anaconda and after that followed the basic guide and setup the project as it was instructed.

Some issues I had but solved:

Running learn.py ModuleNotF oundError:No module named 'docopt' - Solved it by wiriting (ml-agents) C:\Users\vi7or\Documents\Repositories\ml-agents\python>python ./learn.py --run-id=run01 --train

Then I had an issue with tensorflow - Solved by installing tensorflow using conda instead of pip

Setmaster on 28 Jun 2018

Running learn.py ModuleNotFoundError:No module named 'docopt' - Solved it by wiriting (ml-agents) C:\Users\vi7or\Documents\Repositories\ml-agents\python>python ./learn.py --run-id=run01 --train

Did you use py instead of python before?

xiaomaogy on 28 Jun 2018

Also after you click the play button in the editor, does the play button just get stuck there like that forever? Have you made sure you are using Tensorflow 1.7.1 in your python environment, and used the latest version of TensorFlowSharp plugin in the basic guide?

xiaomaogy on 28 Jun 2018

I don't remember trying py before and I'm not sure what you mean by stuck, a few seconds after pressing play the editor will crash. Here is a list of installed packages which includes the correct version of Tensorflow. I downloaded the package again from here ,and Unity says there is nothing new to import, also I downloaded the package and installed it yesterday so I believe I'm using the latest version.

Setmaster on 28 Jun 2018

Yes I followed the newest guide. Yes I had the tensorsharp plug-in
imported. yes I had all proper versions of packages as listed in the
documentation requirements file, and I tried multiple versions of
tensorflow, all cpu versions, and I tried with anaconda as well as without.
I think my processor must not be compatible with tensorflow. Intel core 2
extreme.

But that being said, does Python and tensorflow actually need to be
installed on the machine just to run already-trained models from bytes
files?

On Tue, Jun 26, 2018, 1:34 PM Vincent(Yuan) Gao notifications@github.com
wrote:

@beardordie https://github.com/beardordie Did you follow the new
documentation guide? Do you have the new TensorflowSharp plugin? Have you
installed the new python packages?

Also can you list out your detailed steps that leads to a crash?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/Unity-Technologies/ml-agents/issues/918#issuecomment-400452700,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AfpopO7dBfHn-Fixazeu-LrCtH8m4YcNks5uAprYgaJpZM4U3FVD
.

beardordie on 28 Jun 2018

@beardordie No it doesn't require the python and tensorflow to be installed. But we haven't tested on the cpu you have.

@Setmaster This is something I've never seen, I've tested our repo with the Windows 10 with Unity 2018.1, and the Internal Brain works without any crash.

Can you guys build the Unity executable with the Internal Brain checked, then run the built executable in the command line and see what happens? Without any error message I am not able to even guess what's going on wrong here.....

xiaomaogy on 30 Jun 2018

@xiaomaogy By executable do you mean player? About the cmd, what parameter would be used for this? Also, I built a player with the brain set to internal anyway, and it crashed when executed. Here is a copy of the player if it's useful.

Setmaster on 30 Jun 2018

Hi @Setmaster, By executable I mean the stuff you've provided here.

I've tried to run your built executable provided above on my windows machine (Win 10), and it works without any crash. To this point I'm pretty sure it is a machine specific things. @beardordie Does this built executable work on your computer?

@Setmaster Is there any thing special about your computer? Have you tried this on any other computer?

xiaomaogy on 30 Jun 2018

@xiaomaogy I don't think so, here are some specs:
CPU: Intel Core i7 Extreme 980X @ 3.33GHz
Motherboard: ASUSTeK Computer INC. Rampage III Extreme (LGA1366)
SSD: Samsung SSD 850 PRO
Graphics: GTX 1080 EVGA

I ran it on another computer without issues.

Setmaster on 30 Jun 2018

@Setmaster You ran it on another computer and it works? So what's the difference between that computer vs your own computer?

xiaomaogy on 30 Jun 2018

That built executable crashes on my computer with Intel Core 2 Extreme. I'm not surprised that this older processor is not working, but I am surprised that a computer with the specs xiaomaogy listed would have any trouble with it.

beardordie on 30 Jun 2018

For reference, the GPU in my Intel Core 2 Extreme PC (which crashes upon using internal brain) has an AMD Radeon HD 5800. Again, a much older computer, but it should be able to run an internal brain regardless of whether it supports all the tensorflow stuff to train new brains.

beardordie on 30 Jun 2018

@xiaomaogy The other computer was a Thinkpad notebook, I don't know the exact specs, but I presume all of them are different from mine. Both my CPU and beardordie's seem to be quite old, maybe that's to blame?

Setmaster on 30 Jun 2018

Hello, same issue.

My specs : Windows10 / Unity 2018.1.0f2
Old proc too : i920 (hyper threading desactivated for OC purpose)

Liven28 on 2 Jul 2018

@mmattar The windows machine we have is working, but for these people it seems that certain cpu specs will make the internal brain crash.

xiaomaogy on 10 Jul 2018

additionnal information, I have this message when installing TFSharpPlugin :
Unloading broken assembly Assets/ML-Agents/Plugins/Android/TensorFlowSharp.Android.dll, this assembly can cause crashes in the runtime

And I upgraded Unity to 2018.2.0f2 but the problem persists

Liven28 on 11 Jul 2018

Hello, I'm also experiencing the same issue.

Specs: Windows10 / Unity 2018.1.0f / TF 1.7.1 / I7 Q740

Edit: I also had to build my TF from sources since the CPU does not have AVX support and the stock version didnt work.

m4Ssa on 12 Jul 2018

@m4Ssa @Livenvh @beardordie Could you please try the older version of the TensorFlowSharp plugin available here (https://s3.amazonaws.com/unity-ml-agents/0.3/ML-AgentsWithPlugin.unitypackage)? If the editor stops crashes with the older TensorFlowSharp plugin, then I will try to update this plugin and see if that can fix the problem. Right now I don't have a windows machine that will crash with the steps you guys described, so I am not able to find a solution for this.

xiaomaogy on 13 Jul 2018

I'm not pursuing ML on this machine anymore, sorry. If I get a day to kill
some time with, I may give it a shot and update this thread.

On Thu, Jul 12, 2018, 5:31 PM Vincent(Yuan) Gao notifications@github.com
wrote:

@m4Ssa https://github.com/m4Ssa @Livenvh https://github.com/Livenvh
@beardordie https://github.com/beardordie Could you please try the
older version of the TensorFlowSharp plugin available here (
https://s3.amazonaws.com/unity-ml-agents/0.3/ML-AgentsWithPlugin.unitypackage),
with the TF 1.4.0 (change requirement.txt and then pip install . to
change the tensorflow back to 1.4.0)? If the editor stops crashes with the
older TensorFlowSharp plugin, then I will try to update this plugin and see
if that can fix the problem. Right now I don't have a windows machine that
will crash with the steps you guys described, so I am not able to find a
solution for this.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/Unity-Technologies/ml-agents/issues/918#issuecomment-404690567,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AfpopPIn0awdxxfyQKO2sbXBuJrEmcH3ks5uF-p2gaJpZM4U3FVD
.

beardordie on 13 Jul 2018

I would be happy to test run builds, though, if that's helpful, since
presumably the crash would happen on this machine even running someone
else's game/program if it uses internal brain. I just don't have Python
installed at all anymore and don't have time/plans to install it again here.

On Thu, Jul 12, 2018, 5:31 PM Vincent(Yuan) Gao notifications@github.com
wrote:

@m4Ssa https://github.com/m4Ssa @Livenvh https://github.com/Livenvh
@beardordie https://github.com/beardordie Could you please try the
older version of the TensorFlowSharp plugin available here (
https://s3.amazonaws.com/unity-ml-agents/0.3/ML-AgentsWithPlugin.unitypackage),
with the TF 1.4.0 (change requirement.txt and then pip install . to
change the tensorflow back to 1.4.0)? If the editor stops crashes with the
older TensorFlowSharp plugin, then I will try to update this plugin and see
if that can fix the problem. Right now I don't have a windows machine that
will crash with the steps you guys described, so I am not able to find a
solution for this.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/Unity-Technologies/ml-agents/issues/918#issuecomment-404690567,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AfpopPIn0awdxxfyQKO2sbXBuJrEmcH3ks5uF-p2gaJpZM4U3FVD
.

beardordie on 13 Jul 2018

@beardordie Actually python is not related to this crash, so you don't need to install it to test it. If you have time to test, that would be really helpful. Thanks in advance.

xiaomaogy on 13 Jul 2018

@xiaomaogy
I had test the old TensorFlowSharp (on 2018.2)
=> The type or namespace name `CommunicatorParameters' could not be found (in RpcCommunicator and SocketCommunicator scripts

Liven28 on 13 Jul 2018

@Livenvh How did you test it? It seems that some of your c# script has been changed. Are you sure all of your .cs scripts inside Assets/ML-Agents folder are in sync with the v0.4 master, and only the TensorFlowSharp plugin has been switched to the older version?

Also 2018.2 might not work, please use 2018.1.

xiaomaogy on 13 Jul 2018

juste finish to test :
New 2018.1 project + fresh ml-agent 0.4 (juste unzip gitub version and adapt player setings) + TensorFlowSharp you link above => The type or namespace name `CommunicatorParameters' could not be found (in RpcCommunicator and SocketCommunicator scripts (same than 2018.2)

Liven28 on 13 Jul 2018

@Livenvh Did you run pip install . in the ml-agents/python folder? I will try to run this myself tomorrow I guess to see what's going on.

xiaomaogy on 13 Jul 2018

@xiaomaogy
When i try to use the older TFSharp version im getting an error in Unity:
Assets/ML-Agents/Scripts/RpcCommunicator.cs(23,9): error CS0246: The type or namespace name CommunicatorParameters' could not be found. Are you missing an assembly reference?
I've build my Tensorflow without Grpc support since it didnt work with Grpc enabled. Don't know if thats a related problem though.

m4Ssa on 13 Jul 2018

@xiaomaogy Yes I ran pip install . but always same message

Liven28 on 13 Jul 2018

@xiaomaogy
Ok so I opend a fresh project in 2018.1, didnt download ml-agents 0.4 but used the 0.3 version + TFSharp plugin from the package you posted earlier and now the internal brain works without a crash for me. Still having the 'CommunicatorParameters' error on 0.4 though.

m4Ssa on 13 Jul 2018

It seems that Unity 2018.2 doesn't trust TensorFlowSharp.Android.dll so it's unloaded when the Unity Platform target is set to Android. And with that .dll unloaded, the projects won't run in the Editor when the platform target is Android. They run fine on an Android device or in the Editor when the platform target is set to anything else than Android. (e.g. Windows)

When the project is loaded with TFSharp installed:
Unloading broken assembly Assets/ML-Agents/Plugins/Android/TensorFlowSharp.Android.dll, this assembly can cause crashes in the runtime

When the project runs with Platform target set to Android:
TypeLoadException: Could not find method due to a type load error
Brain.InitializeBrain (Academy aca, Communicator communicator) (at Assets/ML-Agents/Scripts/Brain.cs:209)

jjjuande on 13 Jul 2018

👍1

Same issue here. Brain type Player/Heuristic/External work fine. Unity crashes when the play button is clicked and the brain type is set to Internal.

Specs: Ubuntu 16.04 64-bit, Intel Core i7-6850K, Python 3.5.2, tensorflow 1.7.1, Unity 2018.2.0b2, ml-agent 0.4

also crashes after switching to tensorflow 1.7 and 1.9 with Unity 2018.2.0f2.

edited:
I tried on a different machine with the following settings and it works.
Specs: Ubuntu 18.04 64-bit, Intel Core i9-7940X, Python 3.6.5, tensorflow 1.9, Unity 2018.2.0f2, ml-agent 0.4

bmobear on 24 Jul 2018

Hi, same issue here. Internal brain crashes editor and .exe.

Win10 64bits, i7 960, gtx 970, Unity 2018.1.6f1, last TFSharp ml-agent 0.4

VicMP0 on 25 Jul 2018

Same issue for me too with all examples scenes provided with the toolkit ( v0.4b ).
Training (external) work just fine but if I want to check the result in internal mode , Unity just close after few seconds. It's crash with my bytes files but with the bytes files provided with the toolkit too. I delete and restart the project from scratch many times. Trying the master branch or the last release ( v0.4b ).

Because I thinking it's not directly related to my tensorflow installation and my trained bytes files , I give it a try with the TensorFlowSharp v0.3 instead of v0.4. Now it's working but not with all samples scenes. Only with one don't having "Discrete" visualisation or action vector space type. " Continuous" type work . Discrete one give me error ( ex: GridWorld scene for this error log ):

TFException: NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_FLOAT]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>; NodeDef: main_graph_0_encoder0/conv_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 4, 4, 1], use_cudnn_on_gpu=true](visual_observation_0, main_graph_0_encoder0/conv_1/kernel/read). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
TensorFlow.TFStatus.CheckMaybeRaise (TensorFlow.TFStatus incomingStatus, System.Boolean last) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (TensorFlow.TFBuffer graphDef, TensorFlow.TFImportGraphDefOptions options, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (System.Byte[] buffer, TensorFlow.TFImportGraphDefOptions options, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (System.Byte[] buffer, System.String prefix, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
MLAgents.CoreBrainInternal.InitializeCoreBrain (MLAgents.Batcher brainBatcher) (at Assets/ML-Agents/Scripts/CoreBrainInternal.cs:132)
MLAgents.Brain.InitializeBrain (MLAgents.Academy aca, MLAgents.Batcher brainBatcher) (at Assets/ML-Agents/Scripts/Brain.cs:211)
MLAgents.Academy.InitializeEnvironment () (at Assets/ML-Agents/Scripts/Academy.cs:288)
MLAgents.Academy.Awake () (at Assets/ML-Agents/Scripts/Academy.cs:227)

This is only true with my bytes files. Bytes files provided with the examples work fine with TensorFlowSharp v0.3. So I get stuck to only look pre-trained files come with examples and cannot see the results of my own experiements.

My specs:
TensorFlow 1.7.1 (compiled myself without AVX)
Python 3.5.1
Windows 10 Family 64x
Laptop Asus X553MA
Intel Pentium CPU N3540
Unity 2018.2.0f2
ML Agents ToolKit v0.4b
TensorFlowSharp v0.4 and TensorFlowSharp v0.3
8Go of rams

Pyroevil on 1 Aug 2018

Any news about that bug?

Liven28 on 26 Aug 2018

@Liven28 We are still not sure what is causing this bug, it works on our windows test machine so we are still not able to reproduce it.

xiaomaogy on 28 Aug 2018

@Pyroevil The error message you posted is saying the bytes file you generated is using a different tensorflow version than you place you are using it. If you want to try ml-agents v0.3, then you might want to switch all of them (including the tensorflowSharp plugin to v0.3, the tensorflow version to 1.4).

xiaomaogy on 28 Aug 2018

@jjjuande The problem you mentioned is a different issue, the TensorFlow.Android.dll file is showing the error message, but it is not the cause for the crash.

xiaomaogy on 28 Aug 2018

@m4Ssa So if you want to try ml-agents v0.3, switch all of them to v0.3, the communicator parameter error might be due to the protobuf file not compatible.

xiaomaogy on 28 Aug 2018

I had just reinstall windows10 Family 64x (with just windows and graphic drivers up to date, unity and ml-agent configured).
Same problem persist (crash on play internal brain).

Configuration : i7 920 / GTX 1060 / 12 Go Ram / Gigabyte GA-X58A-UD7
(no overclocking)

I tried on 2017.4 - 2018.1 - 2018.2 - 2018.3b
with Python 3.5.1, tensorflow 1.7.1, toolkit v0.4 / v0.5

Liven28 on 23 Sep 2018

Same problem here, everything works... except when I chose internal. It closes the screen as soon as I press play.

Using 2018.2.9f1 on Windows 10, CPU (not using GPU) Intel. Let me know if you want more info or testing,

zetaFairlight on 25 Sep 2018

😕1

@Liven28 @Gaby10 This is not something we can solve right now due to the reason I mentioned earlier. In v0.6 (which will be released in a few weeks)we will change the way internal brain works (It will be a scriptable object instead of a gameobject, and it will be called Learning Brain). If you guys want to try you can check this PR https://github.com/Unity-Technologies/ml-agents/pull/1250.

xiaomaogy on 27 Sep 2018

@xiaomaogy ok very interesting. I cross fingers.
If you need to test something, you know where I am.

Liven28 on 2 Oct 2018

same here.
Windows 10 Pro 64 bit, AMD Phenom(tm) || x4 965, Unity 2018.2.10f1
Training works fine, also in the editor. But if i click on play, with an internal brain, unity instantly closes.

jamu1989 on 2 Oct 2018

Same, internal brains not working, on play unity closes instantly, while training works without a problem.

Specs:
Windows 10 Pro 64-bit (Build 17763)
Intel Pentium G4620
Unity 2018.2.13f1

destructor465 on 21 Oct 2018

@xiaomaogy
When i try to use the older TFSharp version im getting an error in Unity:
Assets/ML-Agents/Scripts/RpcCommunicator.cs(23,9): error CS0246: The type or namespace name CommunicatorParameters' could not be found. Are you missing an assembly reference?
I've build my Tensorflow without Grpc support since it didnt work with Grpc enabled. Don't know if thats a related problem though.

Hello, the same issue, did you solve it, how?

kudyk on 27 Oct 2018

My laptop has AMD A8-3520m processor, which without AVX support. Build of @Setmaster doesn`t works.

kudyk on 28 Oct 2018

👍2

My laptop has AMD A8-3520m processor, which without AVX support. Build of @Setmaster doesn`t works.

My CPU doesn't support AVX too, I will need to try you way.

destructor465 on 29 Oct 2018

@xiaomaogy
Hello, my i7 920 doesn't support AVX, I tried kudyk solution (ml_agent v0.5 / TensorFlowSharp from v0.3 ml_agent) et it seems to work!

I just tried 3Dball scene with internal brain and Unity didn't crash.
I didn't make more tests, waiting for official news about the viability of this solution.

Liven28 on 31 Oct 2018

kudyk solution works, but most examples throws long exceptions, one of them:

TFException: NodeDef mentions attr 'output_dtype' not in Op<name=Multinomial; signature=logits:T, num_samples:int32 -> output:int64; attr=seed:int,default=0; attr=seed2:int,default=0; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_UINT8, DT_INT16, DT_INT8, DT_UINT16, DT_HALF]; is_stateful=true>; NodeDef: multinomial_3/Multinomial = Multinomial[T=DT_FLOAT, output_dtype=DT_INT64, seed=670408, seed2=108](dense_3/MatMul, multinomial_3/Multinomial/num_samples). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

Working examples: 3D Ball, Bouncer, Crawler, Reacher, Tennis and Walker.

destructor465 on 1 Nov 2018

kudyk solution works for me too

flurrux on 4 Nov 2018

is the v0.7 solve the problem?

Liven28 on 3 Mar 2019

Hi everyone, v0.7 works with tensorflow 1.7.0 from here.

kudyk on 3 Mar 2019

Does it meen that tensorflow 1.7.0 now worsk with non AVX CPUs (and we don't need to use 1.4.0 any more)
or that v0.7 doesn't work with 1.4.0 any more and non AVX CPUs can't use lm-agent ?

Liven28 on 3 Mar 2019

@Liven28 Tensorflow that i use form the link above is third-party, unofficial, built with sse2 support by @fo40225 user. He have build of 1.7.1 too, but only for cuda gpu, and it doesn`t work for me.

kudyk on 3 Mar 2019

Since we've switched from TensorFlowSharp to Barracuda, this issue is no longer relevant. I will close it for now. Feel free to open if you want to discuss more.

xiaomaogy on 4 Apr 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.