File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 93, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 455, in __init__
super().__init__(config, logger_creator)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/tune/trainable.py", line 174, in __init__
self._setup(copy.deepcopy(self.config))
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 596, in _setup
self._init(self.config, self.env_creator)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 117, in _init
config)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/dqn/apex.py", line 48, in defer_make_workers
return trainer._make_workers(env_creator, policy, config, 0)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 666, in _make_workers
logdir=self.logdir)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 61, in __init__
RolloutWorker, env_creator, policy, 0, self._local_config)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 271, in _make_worker
_fake_sampler=config.get("_fake_sampler", False))
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 360, in __init__
self._build_policy_map(policy_dict, policy_config)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 836, in _build_policy_map
policy_map[name] = cls(obs_space, act_space, merged_conf)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/ddpg/ddpg_policy.py", line 110, in __init__
self.cur_observations, observation_space, action_space)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/ddpg/ddpg_policy.py", line 410, in _build_policy_network
import tensorflow.contrib.layers as layers
ModuleNotFoundError: No module named 'tensorflow.contrib'
Ray version and other system information (Python version, TensorFlow version, OS):
Python 3.7.7
I am running Ray 0.8.3, but this bug is still present on the master branch
tensorflow==2.1.0
Setting parameter_noise=True for any DDPG-based algorithm forces this import to run:
https://github.com/ray-project/ray/blob/master/rllib/agents/ddpg/ddpg_policy.py#L417
TensorFlow 2 does not have a contrib module.
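A minimal repro sketch, assuming a standard tune.run setup (the stop criterion and exact config values here are placeholders; any DDPG-family trainer with parameter_noise=True should hit the same import):

import ray
from ray import tune

ray.init()
tune.run(
    "APEX_DDPG",
    stop={"timesteps_total": 1000},  # placeholder; any stop criterion works
    config={
        "env": "LunarLanderContinuous-v2",
        "parameter_noise": True,  # this flag triggers the tf.contrib.layers import
    },
)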
Yeah, we should switch to keras here. I'm taking a look.
For now, could you replace the code (~ line 416) in rllib/agents/ddpg/ddpg_policy.py with the following? That should eliminate the dependency on the deprecated tf.contrib.
activation = getattr(tf.nn, self.config["actor_hidden_activation"])
for hidden in self.config["actor_hiddens"]:
    if self.config["parameter_noise"]:
        action_out = tf.keras.layers.Dense(
            units=hidden, activation=activation)(action_out)
        action_out = tf.keras.layers.LayerNormalization()(action_out)
    else:
        action_out = tf.keras.layers.Dense(
            units=hidden, activation=activation)(action_out)
action_out = tf.keras.layers.Dense(
    units=action_space.shape[0], activation=None)(action_out)
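(For context on the LayerNormalization call in the parameter_noise branch: it stands in for what the old code imported tf.contrib.layers.layer_norm for. Parameter-space noise relies on layer normalization so that a single perturbation scale has a comparable effect at every layer; see Plappert et al., "Parameter Space Noise for Exploration".)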
Cool. I will give this a try!
Preliminary tests look OK. Please let me know if this doesn't help.
Sorry, it takes me ages to recompile all of Ray when I make a change like this.
This is the error I get with the new code:
2020-04-01 11:22:49,761 ERROR trial_runner.py:521 -- Trial APEX_DDPG_LunarLanderContinuous-v2_00005: Error processing event.
Traceback (most recent call last):
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 381, in fetch_result
result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::APEX_DDPG.__init__() (pid=97720, ip=10.30.30.100)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1619, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 400 and 300 for 'default_policy/add_14' (op: 'AddV2') with input shapes: [400], [300].
During handling of the above exception, another exception occurred:
ray::APEX_DDPG.__init__() (pid=97720, ip=10.30.30.100)
File "python/ray/_raylet.pyx", line 414, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 450, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 455, in __init__
super().__init__(config, logger_creator)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/tune/trainable.py", line 174, in __init__
self._setup(copy.deepcopy(self.config))
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 596, in _setup
self._init(self.config, self.env_creator)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 113, in _init
config)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/dqn/apex.py", line 48, in defer_make_workers
return trainer._make_workers(env_creator, policy, config, 0)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 667, in _make_workers
logdir=self.logdir)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 62, in __init__
RolloutWorker, env_creator, policy, 0, self._local_config)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 272, in _make_worker
_fake_sampler=config.get("_fake_sampler", False))
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 360, in __init__
self._build_policy_map(policy_dict, policy_config)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 842, in _build_policy_map
policy_map[name] = cls(obs_space, act_space, merged_conf)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/ddpg/ddpg_policy.py", line 262, in __init__
(1.0 - self.tau) * var_target))
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 902, in binary_op_wrapper
return func(x, y, name=name)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 1194, in _add_dispatch
return gen_math_ops.add_v2(x, y, name=name)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 483, in add_v2
"AddV2", x=x, y=y, name=name)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 742, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3322, in _create_op_internal
op_def=op_def)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1786, in __init__
control_input_ops)
File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1622, in _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 400 and 300 for 'default_policy/add_14' (op: 'AddV2') with input shapes: [400], [300].
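For context, the op that fails here is the Polyak soft update DDPG builds for every main/target variable pair. A standalone sketch that reproduces the same shape error (the variable names are made up; tau=0.002 is DDPG's default):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

tau = 0.002
# Stand-ins for one bias variable from the main and target networks.
main_bias = tf.get_variable("main_bias", shape=[400])
target_bias = tf.get_variable("target_bias", shape=[300])  # mismatched on purpose

# Polyak soft update: theta_target <- tau * theta + (1 - tau) * theta_target.
# With mismatched shapes, the AddV2 op fails at graph-construction time:
# ValueError: Dimensions must be equal, but are 400 and 300 ...
update_op = target_bias.assign(tau * main_bias + (1.0 - tau) * target_bias)

A 400/300 pairing like this usually means the two networks were built with different hidden-layer sizes, so the variable lists were zipped out of alignment.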
OK, I think that error is actually caused by a problem in my own code. I am now compiling from your branch in the PR you made, @sven1977. Will report back.
OK, I compiled from https://github.com/sven1977/ray/tree/issue_7850_ddpg_tf_contrib_error and now I get a new error:
ray::RolloutWorker.sample_with_count() (pid=101749, ip=10.30.30.100)
(pid=101732) File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task
(pid=101732) File "python/ray/_raylet.pyx", line 450, in ray._raylet.execute_task
(pid=101732) File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
(pid=101732) File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 539, in sample_with_count
(pid=101732) batch = self.sample()
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 492, in sample
(pid=101732) batches = [self.input_reader.next()]
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 53, in next
(pid=101732) batches = [self.get_data()]
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 96, in get_data
(pid=101732) item = next(self.rollout_provider)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 346, in _env_runner
(pid=101732) callbacks, soft_horizon, no_done_at_end)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 495, in _process_observations
(pid=101732) outputs.append(episode.batch_builder.build_and_reset(episode))
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/sample_batch_builder.py", line 202, in build_and_reset
(pid=101732) self.postprocess_batch_so_far(episode)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/evaluation/sample_batch_builder.py", line 153, in postprocess_batch_so_far
(pid=101732) pre_batch, other_batches, episode)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/ddpg/ddpg_policy.py", line 56, in postprocess_trajectory
(pid=101732) self._is_exploring: False,
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 960, in run
(pid=101732) run_metadata_ptr)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1183, in _run
(pid=101732) feed_dict_tensor, options, run_metadata)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1361, in _do_run
(pid=101732) run_metadata)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1386, in _do_call
(pid=101732) raise type(e)(node_def, op, message)
(pid=101732) tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'default_policy/timestep' with dtype int32
(pid=101732) [[node default_policy/timestep (defined at /ray/rllib/agents/ddpg/ddpg_policy.py:96) ]]
(pid=101732)
(pid=101732) Original stack trace for 'default_policy/timestep':
(pid=101732) File "/ray/workers/default_worker.py", line 113, in <module>
(pid=101732) ray.worker.global_worker.main_loop()
(pid=101732) File "/ray/rllib/evaluation/rollout_worker.py", line 360, in __init__
(pid=101732) self._build_policy_map(policy_dict, policy_config)
(pid=101732) File "/ray/rllib/evaluation/rollout_worker.py", line 842, in _build_policy_map
(pid=101732) policy_map[name] = cls(obs_space, act_space, merged_conf)
(pid=101732) File "/ray/rllib/agents/ddpg/ddpg_policy.py", line 96, in __init__
(pid=101732) timestep = tf.placeholder(tf.int32, (), name="timestep")
(pid=101732) File "/tensorflow_core/python/ops/array_ops.py", line 2718, in placeholder
(pid=101732) return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
(pid=101732) File "/tensorflow_core/python/ops/gen_array_ops.py", line 6032, in placeholder
(pid=101732) "Placeholder", dtype=dtype, shape=shape, name=name)
(pid=101732) File "/tensorflow_core/python/framework/op_def_library.py", line 742, in _apply_op_helper
(pid=101732) attrs=attr_protos, op_def=op_def)
(pid=101732) File "/tensorflow_core/python/framework/ops.py", line 3322, in _create_op_internal
(pid=101732) op_def=op_def)
(pid=101732) File "/tensorflow_core/python/framework/ops.py", line 1756, in __init__
(pid=101732) self._traceback = tf_stack.extract_stack()
(pid=101732)
(pid=101732) During handling of the above exception, another exception occurred:
(pid=101732)
(pid=101732) Traceback (most recent call last):
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 491, in train
(pid=101732) result = Trainable.train(self)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/tune/trainable.py", line 261, in train
(pid=101732) result = self._train()
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 150, in _train
(pid=101732) fetches = self.optimizer.step()
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/optimizers/async_replay_optimizer.py", line 143, in step
(pid=101732) sample_timesteps, train_timesteps = self._step()
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/optimizers/async_replay_optimizer.py", line 265, in _step
(pid=101732) raise ray_error
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/optimizers/async_replay_optimizer.py", line 229, in _step
(pid=101732) counts[i] = ray_get_and_free(c[1][1])
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/rllib/utils/memory.py", line 29, in ray_get_and_free
(pid=101732) result = ray.get(object_ids)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
(pid=101732) raise value.as_instanceof_cause()
(pid=101732) ray.exceptions.RayTaskError(InvalidArgumentError): ray::RolloutWorker.sample_with_count() (pid=101749, ip=10.30.30.100)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1352, in _run_fn
(pid=101732) target_list, run_metadata)
(pid=101732) File "/home/axion/anaconda3/envs/training/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1445, in _call_tf_sessionrun
(pid=101732) run_metadata)
(pid=101732) tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'default_policy/timestep' with dtype int32
(pid=101732) [[{{node default_policy/timestep}}]]
Taking another look ...
Got it ...
Sorry about the trouble!
Could you try the same PR again (I just updated it; you just have to git pull)? Or just add this line to ddpg_policy.py:
clean_actions, cur_noise_scale = self.sess.run(
    [self.output_actions,
     self.exploration.get_info()],
    feed_dict={
        self.cur_observations: states,
        self._is_exploring: False,
        self._timestep: self.global_timestep,  # <- new line here
    })
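To see why that line is needed: the policy graph now routes action computation through the default_policy/timestep placeholder, so every sess.run that touches those ops must feed it. A standalone sketch of the failure mode (plain TF, not RLlib code):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

timestep = tf.placeholder(tf.int32, (), name="timestep")
# Any op downstream of the placeholder requires it to be fed.
dependent_op = tf.cast(timestep, tf.float32) * 2.0

with tf.Session() as sess:
    # sess.run(dependent_op)  # InvalidArgumentError: You must feed a value
    #                         # for placeholder tensor 'timestep' ...
    print(sess.run(dependent_op, feed_dict={timestep: 5}))  # prints 10.0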
To explain the situation: we are in the middle of cleaning up all of the existing ParameterNoise code and are still finding bugs in it here and there. We have started moving this into the new Exploration API (and are adding tests and tuned examples), but it is not yet fully merged for either DQN or DDPG. This will probably happen next week, and then param-noise will also be available for other algos.
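For reference, once that lands, param-noise should be configured through exploration_config rather than the top-level parameter_noise flag. A sketch of what that will roughly look like (key names are taken from the in-progress code and may still change):

config = {
    "env": "LunarLanderContinuous-v2",
    "exploration_config": {
        "type": "ParameterNoise",
        "initial_stddev": 1.0,      # assumed default
        "random_timesteps": 10000,  # assumed default
    },
}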