I'm trying to train my own agent in an environment and getting this error; not sure what I'm misconfiguring in my scene for this to happen.
`---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
42 if len(trainer.training_buffer['actions']) > buffer_size and train_model:
43 # Perform gradient descent with experience buffer
---> 44 trainer.update_model(batch_size, num_epoch)
45 if steps % summary_freq == 0 and steps != 0 and train_model:
46 # Write training statistics to tensorboard.
/Users/sterlingcrispin/code/Unity-ML/python/ppo/trainer.pyc in update_model(self, batch_size, num_epoch)
139 if self.is_continuous:
140 feed_dict[self.model.epsilon] = np.vstack(training_buffer['epsilons'][start:end])
--> 141 feed_dict[self.model.state_in] = np.vstack(training_buffer['states'][start:end])
142 else:
143 feed_dict[self.model.action_holder] = np.hstack(training_buffer['actions'][start:end])
/Users/sterlingcrispin/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.pyc in vstack(tup)
235
236 """
--> 237 return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
238
239 def hstack(tup):
ValueError: need at least one array to concatenate`
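For what it's worth, that ValueError comes straight from numpy: `np.vstack` raises it whenever the sequence it's handed is empty. A minimal repro (the `buffer` dict and slice bounds here are hypothetical, just mimicking the trainer's slicing):

```python
import numpy as np

# Minimal reproduction of the error above: vstack on an empty
# slice of the training buffer has nothing to concatenate.
buffer = {'states': []}   # hypothetical empty buffer
start, end = 0, 64
try:
    np.vstack(buffer['states'][start:end])
except ValueError as e:
    print(e)  # need at least one array to concatenate
```

So the traceback doesn't mean the states are malformed, it means `training_buffer['states']` never got anything appended to it before `update_model` ran.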
From briefly reading the docs (I'm setting up my own env as well), I believe the issue may be a lack of floats that it wants to pipe into numpy from your agent to keep track of 'memories' and such. Can you post your code?
I'm not totally sure I understand the MemorySize argument in the Brain. Do you define it the same way you do the State and Actions? If so, where? I had it set to the same number as my states + actions, but I just set it to zero and still got the same error.
This is my agent code right now; it's a bunch of rigid bodies and hinge joints.
I want to say there's a reason they don't iterate through the actions in these demo builds. Does it work this way? Your code sets four actions within the same step, whereas in the demos they use if statements to index fixed positions in the array, i.e. act[1] {movement}, act[2] {movement}.
If it works with your type of array then great, I'd genuinely like to know :D
If you look at the Ball3DAgent code, they've got 2 total actions and are using act[0] and act[1] as action_z and action_x every step.
My code exerts a force on all 24 rigid bodies using all 24 actions every step. It's a little unclear, but there are 6 legs with 4 rigid bodies each, and the logic I was using should access them all one by one, as if I were manually typing out:
`act[i * legs[i].rbcount + 0]
act[i * legs[i].rbcount + 1]
act[i * legs[i].rbcount + 2]
act[i * legs[i].rbcount + 3]`
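If it helps, here's a quick sanity check of that flattened indexing in plain Python (6 legs × 4 rigid bodies are the numbers from this thread; `action_index` is just an illustrative helper, not part of my agent code):

```python
# Sanity check for the flattened indexing above: with 6 legs and
# 4 rigid bodies per leg, leg i / body j should map to a unique
# action index covering 0..23 exactly once.
NUM_LEGS = 6
RB_PER_LEG = 4

def action_index(leg, body):
    # Same formula as act[i * rbcount + j] in the agent code.
    return leg * RB_PER_LEG + body

indices = [action_index(i, j)
           for i in range(NUM_LEGS)
           for j in range(RB_PER_LEG)]
print(indices == list(range(NUM_LEGS * RB_PER_LEG)))  # True
```

So the nested loop does hit every action slot exactly once, the same as writing out all 24 accesses by hand.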
Hi @sterlingcrispin,
As a debug step, I would suggest walking through the Basic.ipynb notebook using your new environment. You can use it to examine the state and action space in an interactive manner, which might give you a better intuition into how your states are being represented.
The error in ppo.py you are receiving seems to correspond to the training_buffer being empty. Can I ask what you've set your Academy or Agent's Max Steps to?
@awjuliani the Basic.ipynb looks okay. One thing I'm noticing is that I might be punishing my agent too much? I'm only giving it -0.01 each step it doesn't get closer:
Total reward this episode: 49.17
Total reward this episode: -10.01
Total reward this episode: -10.01
Total reward this episode: 43.01
Total reward this episode: -10.01
Total reward this episode: -10.01
Total reward this episode: -10.01
Total reward this episode: 0.88
Total reward this episode: -10.01
Total reward this episode: -10.01
I've tried setting the agent's Max Step to 100 (the default) and the Academy's to 0 and 1000,
then tried the agent's Max Step at 1000 with the Academy's at 0.
My agent currently doesn't have a condition where done = true and a negative reward is given; could that be causing it?
What is typically done for locomotion tasks like this is actually not to give a negative reward, but to give a positive reward for any progress the agent makes. So the reward function might look something like +0.01 for every 1/10th of a meter of forward progress. If you expect it to walk upright, you might also give a -0.1 reward when it falls over and end the episode (done = true) there. For some inspiration, I recommend this recent DeepMind paper where they train agents to walk (and run, and jump): https://arxiv.org/abs/1707.02286.
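For concreteness, a minimal sketch of that reward scheme in Python (the function name and the single-axis progress measure are illustrative assumptions; the +0.01 per 0.1 m and the -0.1 fall penalty are the numbers suggested above):

```python
# Hedged sketch of the suggested reward shaping: reward forward
# progress, penalize falling and end the episode when it happens.
def step_reward(prev_x, curr_x, fell_over):
    if fell_over:
        return -0.1, True            # penalize the fall, set done = true
    progress = curr_x - prev_x       # forward progress in meters this step
    reward = 0.01 * (progress / 0.1) # +0.01 per 1/10th of a meter
    return reward, False

r, done = step_reward(prev_x=1.0, curr_x=1.2, fell_over=False)
print(round(r, 3), done)  # 0.02 False
```

The key design point is that the per-step signal is dense and mostly positive, so the agent is pulled toward progress rather than just pushed away from standing still.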
That definitely isn't intuitive, though, and I will add this to a "best practices" section of the wiki.
I've made some improvements to the rewards, but the error about the training buffer being empty remains.
Do I need to change some of the PPO hyperparameters, or are my Unity inspector settings misconfigured?



Just realized what your issue is, @sterlingcrispin. You are using a camera as an observation with your agent. The model created by PPO is one that expects to take the observations, not the states, hence the empty states array when training. There are currently three model configurations included with ppo.
I am definitely open to adding more to accommodate additional configurations, but the complexity and experimental nature of the network increase when trying to do things like combine visual input with states. As a quick fix, I'd recommend taking away the camera and seeing what the agent can learn just from the state inputs. If that isn't enough, you can augment the state with information about what is in front of the agent using Raycasts.
@awjuliani okay, I must have overlooked that somehow; seeing it written out like that makes it a lot clearer.
I think continuous control + camera observation would be really cool.
I took away the camera and it seems to be training okay, except for some errors on my part: a few NaN rewards when the agent fell over but the fall wasn't detected.
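In case anyone hits the same thing, a cheap generic guard against non-finite rewards leaking into training (this is plain Python with `math.isfinite`, not an ML-Agents API; `safe_reward` and the fallback value are illustrative):

```python
import math

# Guard against NaN/inf rewards reaching the trainer: replace any
# non-finite reward with a neutral fallback before it is recorded.
def safe_reward(reward, fallback=0.0):
    if not math.isfinite(reward):
        return fallback
    return reward

print(safe_reward(0.5))           # 0.5
print(safe_reward(float('nan')))  # 0.0
```

A single NaN in the buffer will poison the gradient update, so it's worth sanitizing (or at least asserting on) rewards at the point where they're set.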
thank you!
Glad it is working out!
For now, I will add an error when trying to use PPO with an unsupported agent configuration. Over time I'd like to add more network architectures to fit whatever type of agent is built, but that will take some experimentation, as things like continuous control + camera input are still an active area of research.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.