Ml-agents: Editor crashes every time I try to play using an Internal brain type

Created on 26 Jun 2018 · 58 Comments · Source: Unity-Technologies/ml-agents

I just installed the newest ml-agents beta following the newest installation guides today, 06/25/2018. I had v0.2 beta working fine on another machine, so I know roughly how it's supposed to behave. What I am experiencing now is that all example scenes work fine on Player type brain and Heuristic type brain, but any time I set it to Internal type brain and use the provided bytes file for each example, the Unity Editor crashes upon pressing Play. I am new to crashes, so I'm not sure how to troubleshoot. I've attached the Editor log but I don't know how to read it for relevant information. I'm using Unity 2018.1.2f1
Editor.log

bug


All 58 comments

@beardordie Did you follow the new documentation guide? Do you have the new TensorflowSharp plugin? Have you installed the new python packages?

Also, can you list the detailed steps that lead to a crash?

I'm having the same issue, the demo scenes work with player, heuristic and external but crash when internal is used. I'm using Unity 2018.1.6f1 and Anaconda. Editor.log, video

@Setmaster Thanks for the video and log, but this information is not enough for us to help you. Please tell us the detailed steps to reproduce your error, specify things like what os you are using, which installation guide did you follow etc.

@xiaomaogy
Win 10
Version 1803
Build 17134.112
Followed this repo's installation guide, choosing to use Anaconda and after that followed the basic guide and setup the project as it was instructed.

Some issues I had but solved:

Running learn.py: ModuleNotFoundError: No module named 'docopt' - Solved it by writing (ml-agents) C:\Users\vi7or\Documents\Repositories\ml-agents\python>python ./learn.py --run-id=run01 --train

Then I had an issue with tensorflow - Solved by installing tensorflow using conda instead of pip

Running learn.py: ModuleNotFoundError: No module named 'docopt' - Solved it by writing (ml-agents) C:\Users\vi7or\Documents\Repositories\ml-agents\python>python ./learn.py --run-id=run01 --train

Did you use py instead of python before?

Also after you click the play button in the editor, does the play button just get stuck there like that forever? Have you made sure you are using Tensorflow 1.7.1 in your python environment, and used the latest version of TensorFlowSharp plugin in the basic guide?

I don't remember trying py before, and I'm not sure what you mean by stuck; a few seconds after pressing play the editor crashes. Here is a list of installed packages, which includes the correct version of TensorFlow. I downloaded the package again from here, and Unity says there is nothing new to import. I also downloaded and installed the package yesterday, so I believe I'm using the latest version.

Yes, I followed the newest guide. Yes, I had the TensorFlowSharp plugin
imported. Yes, I had all the proper versions of packages as listed in the
documentation requirements file, and I tried multiple versions of
TensorFlow, all CPU versions, and I tried with Anaconda as well as without.
I think my processor must not be compatible with TensorFlow. Intel Core 2
Extreme.

But that being said, does Python and tensorflow actually need to be
installed on the machine just to run already-trained models from bytes
files?

@beardordie No, it doesn't require Python and TensorFlow to be installed. But we haven't tested on the CPU you have.

@Setmaster This is something I've never seen. I've tested our repo on Windows 10 with Unity 2018.1, and the Internal Brain works without any crash.

Can you guys build the Unity executable with the Internal Brain checked, then run the built executable from the command line and see what happens? Without any error message I am not able to even guess what's going wrong here.
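One way to capture what a crashing build reports is to launch it from a small script and record its exit code and any console output. The following is a minimal Python sketch, not part of the ml-agents toolkit: `run_and_report` is a hypothetical helper name, and the demo command only runs the Python interpreter so that the script works anywhere. For a real test, the command would be the path to the built player; Unity players are documented to accept a `-logFile` argument, but the exact usage should be verified against the Unity manual.

```python
import subprocess
import sys


def run_and_report(command, timeout=60):
    """Run a command and return its exit code and captured console output.

    `command` is a list such as ["path/to/player.exe"] (hypothetical path).
    """
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as text
        timeout=timeout,      # give up if the process hangs
    )
    return {
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


if __name__ == "__main__":
    # Stand-in for a crashing build: a child process that writes to stderr
    # and exits with a nonzero code.
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    report = run_and_report(demo)
    print("exit code:", report["returncode"])
    print("stderr:", report["stderr"])
```

A native crash may produce a nonzero exit code with little or no console output, so the player log is usually still needed alongside this.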

@xiaomaogy By executable do you mean player? About the cmd, what parameter would be used for this? Also, I built a player with the brain set to internal anyway, and it crashed when executed. Here is a copy of the player if it's useful.

Hi @Setmaster, By executable I mean the stuff you've provided here.

I've tried to run the built executable you provided above on my Windows machine (Win 10), and it works without any crash. At this point I'm pretty sure it is a machine-specific problem. @beardordie Does this built executable work on your computer?

@Setmaster Is there anything special about your computer? Have you tried this on any other computer?

@xiaomaogy I don't think so, here are some specs:
CPU: Intel Core i7 Extreme 980X @ 3.33GHz
Motherboard: ASUSTeK Computer INC. Rampage III Extreme (LGA1366)
SSD: Samsung SSD 850 PRO
Graphics: GTX 1080 EVGA

I ran it on another computer without issues.

@Setmaster You ran it on another computer and it works? So what's the difference between that computer vs your own computer?

That built executable crashes on my computer with an Intel Core 2 Extreme. I'm not surprised that this older processor is not working, but I am surprised that a computer with the specs Setmaster listed would have any trouble with it.

For reference, the GPU in my Intel Core 2 Extreme PC (which crashes when using an internal brain) is an AMD Radeon HD 5800. Again, a much older computer, but it should be able to run an internal brain regardless of whether it supports all the TensorFlow stuff needed to train new brains.

@xiaomaogy The other computer was a Thinkpad notebook, I don't know the exact specs, but I presume all of them are different from mine. Both my CPU and beardordie's seem to be quite old, maybe that's to blame?

Hello, same issue.

My specs: Windows 10 / Unity 2018.1.0f2
Old processor too: i7 920 (hyper-threading deactivated for overclocking purposes)

@mmattar The windows machine we have is working, but for these people it seems that certain cpu specs will make the internal brain crash.

Additional information: I get this message when installing TFSharpPlugin:
Unloading broken assembly Assets/ML-Agents/Plugins/Android/TensorFlowSharp.Android.dll, this assembly can cause crashes in the runtime

And I upgraded Unity to 2018.2.0f2 but the problem persists

Hello, I'm also experiencing the same issue.

Specs: Windows10 / Unity 2018.1.0f / TF 1.7.1 / I7 Q740

Edit: I also had to build my TF from source, since the CPU does not have AVX support and the stock version didn't work.

@m4Ssa @Livenvh @beardordie Could you please try the older version of the TensorFlowSharp plugin available here (https://s3.amazonaws.com/unity-ml-agents/0.3/ML-AgentsWithPlugin.unitypackage)? If the editor stops crashing with the older TensorFlowSharp plugin, then I will try to update this plugin and see if that fixes the problem. Right now I don't have a Windows machine that crashes with the steps you guys described, so I am not able to find a solution for this.

I'm not pursuing ML on this machine anymore, sorry. If I get a day to kill
some time with, I may give it a shot and update this thread.

I would be happy to test run builds, though, if that's helpful, since
presumably the crash would happen on this machine even running someone
else's game/program if it uses internal brain. I just don't have Python
installed at all anymore and don't have time/plans to install it again here.

@beardordie Actually python is not related to this crash, so you don't need to install it to test it. If you have time to test, that would be really helpful. Thanks in advance.

@xiaomaogy
I tested the old TensorFlowSharp (on 2018.2)
=> The type or namespace name `CommunicatorParameters' could not be found (in the RpcCommunicator and SocketCommunicator scripts)

@Livenvh How did you test it? It seems that some of your C# scripts have been changed. Are you sure all of the .cs scripts inside the Assets/ML-Agents folder are in sync with the v0.4 master, and only the TensorFlowSharp plugin has been switched to the older version?

Also 2018.2 might not work, please use 2018.1.

Just finished testing:
New 2018.1 project + fresh ml-agents 0.4 (just unzipped the GitHub version and adapted the player settings) + the TensorFlowSharp you linked above => The type or namespace name `CommunicatorParameters' could not be found (in the RpcCommunicator and SocketCommunicator scripts), same as on 2018.2

@Livenvh Did you run pip install . in the ml-agents/python folder? I will try to run this myself tomorrow I guess to see what's going on.

@xiaomaogy
When I try to use the older TFSharp version I'm getting an error in Unity:
Assets/ML-Agents/Scripts/RpcCommunicator.cs(23,9): error CS0246: The type or namespace name `CommunicatorParameters' could not be found. Are you missing an assembly reference?
I've built my TensorFlow without gRPC support since it didn't work with gRPC enabled. I don't know if that's a related problem though.

@xiaomaogy Yes I ran pip install . but always same message

@xiaomaogy
OK, so I opened a fresh project in 2018.1, didn't download ml-agents 0.4 but used the 0.3 version + TFSharp plugin from the package you posted earlier, and now the internal brain works without a crash for me. Still having the 'CommunicatorParameters' error on 0.4 though.

It seems that Unity 2018.2 doesn't trust TensorFlowSharp.Android.dll, so it's unloaded when the Unity platform target is set to Android. With that .dll unloaded, the projects won't run in the Editor when the platform target is Android. They run fine on an Android device, or in the Editor when the platform target is set to anything other than Android (e.g. Windows).

When the project is loaded with TFSharp installed:
Unloading broken assembly Assets/ML-Agents/Plugins/Android/TensorFlowSharp.Android.dll, this assembly can cause crashes in the runtime

When the project runs with Platform target set to Android:
TypeLoadException: Could not find method due to a type load error
Brain.InitializeBrain (Academy aca, Communicator communicator) (at Assets/ML-Agents/Scripts/Brain.cs:209)

Same issue here. Brain type Player/Heuristic/External work fine. Unity crashes when the play button is clicked and the brain type is set to Internal.

Specs: Ubuntu 16.04 64-bit, Intel Core i7-6850K, Python 3.5.2, tensorflow 1.7.1, Unity 2018.2.0b2, ml-agent 0.4

It also crashes after switching to tensorflow 1.7 and 1.9 with Unity 2018.2.0f2.

edited:
I tried on a different machine with the following settings and it works.
Specs: Ubuntu 18.04 64-bit, Intel Core i9-7940X, Python 3.6.5, tensorflow 1.9, Unity 2018.2.0f2, ml-agent 0.4

Hi, same issue here. Internal brain crashes editor and .exe.

Win10 64-bit, i7 960, GTX 970, Unity 2018.1.6f1, latest TFSharp, ml-agents 0.4

Same issue for me, with all example scenes provided with the toolkit (v0.4b).
Training (External) works just fine, but if I want to check the result in Internal mode, Unity just closes after a few seconds. It crashes with my own bytes files and also with the bytes files provided with the toolkit. I deleted the project and restarted from scratch many times, trying both the master branch and the latest release (v0.4b).

Because I think it's not directly related to my TensorFlow installation or my trained bytes files, I gave it a try with TensorFlowSharp v0.3 instead of v0.4. Now it works, but not with all sample scenes: only with those that don't use the "Discrete" visualisation or action vector space type. The "Continuous" type works. Discrete ones give me an error (e.g. the GridWorld scene for this error log):

TFException: NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_FLOAT]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>; NodeDef: main_graph_0_encoder0/conv_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 4, 4, 1], use_cudnn_on_gpu=true](visual_observation_0, main_graph_0_encoder0/conv_1/kernel/read). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
TensorFlow.TFStatus.CheckMaybeRaise (TensorFlow.TFStatus incomingStatus, System.Boolean last) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (TensorFlow.TFBuffer graphDef, TensorFlow.TFImportGraphDefOptions options, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (System.Byte[] buffer, TensorFlow.TFImportGraphDefOptions options, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (System.Byte[] buffer, System.String prefix, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
MLAgents.CoreBrainInternal.InitializeCoreBrain (MLAgents.Batcher brainBatcher) (at Assets/ML-Agents/Scripts/CoreBrainInternal.cs:132)
MLAgents.Brain.InitializeBrain (MLAgents.Academy aca, MLAgents.Batcher brainBatcher) (at Assets/ML-Agents/Scripts/Brain.cs:211)
MLAgents.Academy.InitializeEnvironment () (at Assets/ML-Agents/Scripts/Academy.cs:288)
MLAgents.Academy.Awake () (at Assets/ML-Agents/Scripts/Academy.cs:227)

This is only true with my bytes files. Bytes files provided with the examples work fine with TensorFlowSharp v0.3. So I'm stuck looking only at the pre-trained files that come with the examples and cannot see the results of my own experiments.

My specs:
TensorFlow 1.7.1 (compiled myself without AVX)
Python 3.5.1
Windows 10 Home x64
Laptop Asus X553MA
Intel Pentium CPU N3540
Unity 2018.2.0f2
ML Agents ToolKit v0.4b
TensorFlowSharp v0.4 and TensorFlowSharp v0.3
8 GB of RAM

Any news about that bug?

@Liven28 We are still not sure what is causing this bug, it works on our windows test machine so we are still not able to reproduce it.

@Pyroevil The error message you posted is saying that the bytes file you generated was produced with a different TensorFlow version than the one in the place where you are using it. If you want to try ml-agents v0.3, then you might want to switch all of them (including the TensorFlowSharp plugin to v0.3 and the TensorFlow version to 1.4).
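The TFException quoted earlier names the graph attribute and the op that the older runtime does not recognize. As a small illustration, the Python sketch below pulls those two names out of such a message with a regular expression. `parse_attr_mismatch` is a hypothetical helper, and the pattern is an assumption based only on the messages shown in this thread, not on any documented TensorFlow message format.

```python
import re

# Assumed message shape, taken from the exceptions quoted in this thread:
#   NodeDef mentions attr '<attr>' not in Op<name=<op>; signature=...
_ATTR_MISMATCH = re.compile(r"NodeDef mentions attr '([^']+)' not in Op<name=([^;>]+)")


def parse_attr_mismatch(message):
    """Return {'attr': ..., 'op': ...} for a GraphDef/runtime mismatch message,
    or None if the message does not look like one."""
    match = _ATTR_MISMATCH.search(message)
    if match is None:
        return None
    return {"attr": match.group(1), "op": match.group(2)}


if __name__ == "__main__":
    sample = (
        "TFException: NodeDef mentions attr 'dilations' not in "
        "Op<name=Conv2D; signature=input:T, filter:T -> output:T>"
    )
    print(parse_attr_mismatch(sample))
```

An attribute the runtime does not know about suggests that the model was exported by a newer TensorFlow than the one embedded in the plugin, which is the mismatch described above.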

@jjjuande The problem you mentioned is a different issue. The TensorFlowSharp.Android.dll file is showing the error message, but it is not the cause of the crash.

@m4Ssa So if you want to try ml-agents v0.3, switch all of them to v0.3. The CommunicatorParameters error might be due to the protobuf file not being compatible.
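The version pairings mentioned in this thread (ml-agents v0.3 with TensorFlow 1.4, v0.4 with TensorFlow 1.7.1) can be written down as a small lookup, so that a mismatch is noticed before a model is exported. This is only a sketch: the table holds just those two pairings, and the function names are hypothetical.

```python
# Pairings as stated in this thread; not an official compatibility matrix.
PAIRINGS = {
    "0.3": "1.4",
    "0.4": "1.7.1",
}


def expected_tensorflow(mlagents_version):
    """Return the TensorFlow version paired with an ml-agents version, or None."""
    return PAIRINGS.get(mlagents_version)


def versions_match(mlagents_version, tensorflow_version):
    """True/False if the pairing is known, None if the ml-agents version is not listed.

    Only the components present in the table entry are compared, so the
    entry "1.4" accepts "1.4.0".
    """
    expected = expected_tensorflow(mlagents_version)
    if expected is None:
        return None
    expected_parts = expected.split(".")
    actual_parts = tensorflow_version.split(".")
    return actual_parts[: len(expected_parts)] == expected_parts


if __name__ == "__main__":
    print(versions_match("0.3", "1.4.0"))
    print(versions_match("0.3", "1.7.1"))
```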

I have just reinstalled Windows 10 Home 64-bit (with only Windows and graphics drivers up to date, and Unity and ml-agents configured).
The same problem persists (crash on Play with an Internal brain).

Configuration: i7 920 / GTX 1060 / 12 GB RAM / Gigabyte GA-X58A-UD7
(no overclocking)

I tried on 2017.4 - 2018.1 - 2018.2 - 2018.3b
with Python 3.5.1, tensorflow 1.7.1, toolkit v0.4 / v0.5

Same problem here, everything works... except when I choose Internal. It closes the screen as soon as I press play.

Using 2018.2.9f1 on Windows 10, Intel CPU (not using the GPU). Let me know if you want more info or testing.

@Liven28 @Gaby10 This is not something we can solve right now, for the reason I mentioned earlier. In v0.6 (which will be released in a few weeks) we will change the way the internal brain works (it will be a scriptable object instead of a gameobject, and it will be called Learning Brain). If you guys want to try it, you can check this PR https://github.com/Unity-Technologies/ml-agents/pull/1250.

@xiaomaogy OK, very interesting. Fingers crossed.
If you need to test something, you know where I am.

Same here.
Windows 10 Pro 64-bit, AMD Phenom(tm) II X4 965, Unity 2018.2.10f1
Training works fine, also in the editor. But if I click on play with an internal brain, Unity instantly closes.

Same, internal brains not working, on play unity closes instantly, while training works without a problem.

Specs:
Windows 10 Pro 64-bit (Build 17763)
Intel Pentium G4620
Unity 2018.2.13f1

@xiaomaogy
When I try to use the older TFSharp version I'm getting an error in Unity:
Assets/ML-Agents/Scripts/RpcCommunicator.cs(23,9): error CS0246: The type or namespace name `CommunicatorParameters' could not be found. Are you missing an assembly reference?
I've built my TensorFlow without gRPC support since it didn't work with gRPC enabled. I don't know if that's a related problem though.

Hello, I have the same issue. Did you solve it, and if so, how?

@xiaomaogy @beardordie @m4Ssa Hi guys, I think I found a solution to the issue. I cloned the repository into a new, clean directory and did everything according to the v0.5 instructions, but imported the TensorFlowSharp plugin for ml-agents v0.3 from here: https://github.com/TimothyA86/ml-agents/blob/master/docs/Installation.md.

My laptop has an AMD A8-3520M processor, which lacks AVX support. @Setmaster's build doesn't work on it.

My CPU doesn't support AVX either, so I will need to try your way.
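Several reports in this thread involve CPUs without AVX. On Linux, the CPU feature flags can be read from `/proc/cpuinfo`; the sketch below checks that text for the `avx` flag. The helper names are hypothetical, and on Windows (where most of the reports here come from) that file does not exist, so the CPU's specification sheet or a system-information tool would be needed instead.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is used; on x86 Linux every core
    normally reports the same list.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if the exact flag 'avx' is listed ('avx2' alone does not count)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AVX supported:", has_avx(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo on this system; check the CPU's spec sheet instead.")
```

The thread does not confirm AVX as the cause, but stock TensorFlow binaries from 1.6 onward are built with AVX instructions, which is consistent with the workaround of using an older plugin.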

@xiaomaogy
Hello, my i7 920 doesn't support AVX. I tried kudyk's solution (ml-agents v0.5 with TensorFlowSharp from ml-agents v0.3) and it seems to work!

I just tried 3Dball scene with internal brain and Unity didn't crash.
I didn't make more tests, waiting for official news about the viability of this solution.

kudyk's solution works, but most examples throw long exceptions. One of them:

TFException: NodeDef mentions attr 'output_dtype' not in Op<name=Multinomial; signature=logits:T, num_samples:int32 -> output:int64; attr=seed:int,default=0; attr=seed2:int,default=0; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_UINT8, DT_INT16, DT_INT8, DT_UINT16, DT_HALF]; is_stateful=true>; NodeDef: multinomial_3/Multinomial = Multinomial[T=DT_FLOAT, output_dtype=DT_INT64, seed=670408, seed2=108](dense_3/MatMul, multinomial_3/Multinomial/num_samples). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

Working examples: 3D Ball, Bouncer, Crawler, Reacher, Tennis and Walker.

kudyk's solution works for me too.

Does v0.7 solve the problem?

Hi everyone, v0.7 works with tensorflow 1.7.0 from here.

Does it mean that TensorFlow 1.7.0 now works with non-AVX CPUs (and we don't need to use 1.4.0 any more),
or that v0.7 no longer works with 1.4.0 and non-AVX CPUs can't use ml-agents?

@Liven28 The TensorFlow that I use from the link above is a third-party, unofficial build with SSE2 support by user @fo40225. He has a build of 1.7.1 too, but only for CUDA GPUs, and it doesn't work for me.

Since we've switched from TensorFlowSharp to Barracuda, this issue is no longer relevant. I will close it for now. Feel free to open if you want to discuss more.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.