Hey guys,
I switched from training on the CPU to training on the GPU, and now I see a strange result in my training:
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 50000. Time Elapsed: 266.213 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 50000. Time Elapsed: 162.361 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
How can it be that the CPU is faster than the GPU?
Complete log files below:
GPU - Log
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Academy' started successfully!
Unity Academy name: Academy
Number of Training Brains : 0
Reset Parameters :
2019-12-01 16:37:07.519416: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-01 16:37:07.719684: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:23:00.0
totalMemory: 8.00GiB freeMemory: 6.63GiB
2019-12-01 16:37:07.727134: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2019-12-01 16:37:08.304646: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-01 16:37:08.308769: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2019-12-01 16:37:08.311513: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2019-12-01 16:37:08.315036: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6412 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:23:00.0, compute capability: 6.1)
INFO:mlagents.envs:Hyperparameters for the PPOTrainer of brain FindTheTarget:
trainer: ppo
batch_size: 1024
beta: 0.005
buffer_size: 10240
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 5.0e4
memory_size: 256
normalize: False
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 1000
use_recurrent: False
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
summary_path: ./summaries/gpu_test_FindTheTarget
model_path: ./models/gpu_test-0/FindTheTarget
keep_checkpoints: 5
2019-12-01 16:37:09.163324: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2019-12-01 16:37:09.166858: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-01 16:37:09.171042: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2019-12-01 16:37:09.174282: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2019-12-01 16:37:09.177043: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6412 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:23:00.0, compute capability: 6.1)
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 1000. Time Elapsed: 7.942 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 2000. Time Elapsed: 13.018 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 3000. Time Elapsed: 18.133 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 4000. Time Elapsed: 23.281 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 5000. Time Elapsed: 28.371 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 6000. Time Elapsed: 33.516 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 7000. Time Elapsed: 38.734 s Mean Reward: 0.500. Std of Reward: 0.500. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 8000. Time Elapsed: 43.881 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 9000. Time Elapsed: 49.087 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 10000. Time Elapsed: 54.253 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 11000. Time Elapsed: 60.478 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 12000. Time Elapsed: 65.565 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 13000. Time Elapsed: 70.635 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 14000. Time Elapsed: 75.746 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 15000. Time Elapsed: 80.900 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 16000. Time Elapsed: 85.973 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 17000. Time Elapsed: 91.108 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 18000. Time Elapsed: 96.206 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 19000. Time Elapsed: 101.300 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 20000. Time Elapsed: 106.455 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 21000. Time Elapsed: 112.635 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 22000. Time Elapsed: 117.752 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 23000. Time Elapsed: 122.865 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 24000. Time Elapsed: 128.188 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 25000. Time Elapsed: 133.507 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 26000. Time Elapsed: 139.046 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 27000. Time Elapsed: 144.398 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 28000. Time Elapsed: 149.628 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 29000. Time Elapsed: 154.722 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 30000. Time Elapsed: 160.108 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 31000. Time Elapsed: 166.550 s Mean Reward: 0.667. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 32000. Time Elapsed: 171.803 s Mean Reward: 0.200. Std of Reward: 0.400. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 33000. Time Elapsed: 176.988 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 34000. Time Elapsed: 182.352 s Mean Reward: 0.500. Std of Reward: 0.500. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 35000. Time Elapsed: 187.643 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 36000. Time Elapsed: 192.973 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 37000. Time Elapsed: 198.213 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 38000. Time Elapsed: 203.350 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 39000. Time Elapsed: 208.470 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 40000. Time Elapsed: 213.546 s Mean Reward: 0.500. Std of Reward: 0.500. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 41000. Time Elapsed: 218.683 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 42000. Time Elapsed: 224.978 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 43000. Time Elapsed: 230.144 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 44000. Time Elapsed: 235.289 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 45000. Time Elapsed: 240.423 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 46000. Time Elapsed: 245.592 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 47000. Time Elapsed: 250.751 s Mean Reward: 0.400. Std of Reward: 0.490. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 48000. Time Elapsed: 255.842 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 49000. Time Elapsed: 260.910 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.envs:Saved Model
INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 50000. Time Elapsed: 266.213 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.envs:Saved Model
INFO:mlagents.trainers:List of nodes to export for brain :FindTheTarget
INFO:mlagents.trainers: is_continuous_control
INFO:mlagents.trainers: version_number
INFO:mlagents.trainers: memory_size
INFO:mlagents.trainers: action_output_shape
INFO:mlagents.trainers: action
INFO:mlagents.trainers: action_probs
INFO:tensorflow:Froze 11 variables.
INFO:tensorflow:Froze 11 variables.
Converted 11 variables to const ops.
Converting ./models/gpu_test-0/FindTheTarget/frozen_graph_def.pb to ./models/gpu_test-0/FindTheTarget.nn
IGNORED: StopGradient unknown layer
GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 8] => 'main_graph_0/hidden_0/BiasAdd'
IN: 'epsilon': [-1, 1, 1, 2] => 'mul'
OUT: 'action', 'action_probs'
DONE: wrote ./models/gpu_test-0/FindTheTarget.nn file.
INFO:mlagents.trainers:Exported ./models/gpu_test-0/FindTheTarget.nn file
CPU - Log
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Academy' started successfully!
Unity Academy name: Academy
Number of Training Brains : 0
Reset Parameters :
2019-12-01 16:44:32.741259: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:mlagents.envs:Hyperparameters for the PPOTrainer of brain FindTheTarget:
trainer: ppo
batch_size: 1024
beta: 0.005
buffer_size: 10240
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 5.0e4
memory_size: 256
normalize: False
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 1000
use_recurrent: False
vis_encode_type: simple
reward_signals:
extrinsic:
strength: 1.0
gamma: 0.99
summary_path: ./summaries/cpu_test_FindTheTarget
model_path: ./models/cpu_test-0/FindTheTarget
keep_checkpoints: 5
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 1000. Time Elapsed: 4.904 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 2000. Time Elapsed: 8.177 s Mean Reward: 0.500. Std of Reward: 0.500. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 3000. Time Elapsed: 11.318 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 4000. Time Elapsed: 14.359 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 5000. Time Elapsed: 17.472 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 6000. Time Elapsed: 20.622 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 7000. Time Elapsed: 23.727 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 8000. Time Elapsed: 26.883 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 9000. Time Elapsed: 29.884 s Mean Reward: 0.500. Std of Reward: 0.500. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 10000. Time Elapsed: 33.035 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 11000. Time Elapsed: 37.214 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 12000. Time Elapsed: 40.355 s Mean Reward: 0.667. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 13000. Time Elapsed: 43.465 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 14000. Time Elapsed: 46.470 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 15000. Time Elapsed: 49.675 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 16000. Time Elapsed: 52.919 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 17000. Time Elapsed: 56.278 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 18000. Time Elapsed: 59.490 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 19000. Time Elapsed: 62.679 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 20000. Time Elapsed: 65.774 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 21000. Time Elapsed: 69.987 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 22000. Time Elapsed: 73.090 s Mean Reward: 0.400. Std of Reward: 0.490. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 23000. Time Elapsed: 76.336 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 24000. Time Elapsed: 79.441 s Mean Reward: 0.500. Std of Reward: 0.500. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 25000. Time Elapsed: 82.554 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 26000. Time Elapsed: 85.674 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 27000. Time Elapsed: 88.820 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 28000. Time Elapsed: 91.934 s Mean Reward: 0.333. Std of Reward: 0.471. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 29000. Time Elapsed: 95.006 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 30000. Time Elapsed: 98.112 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 31000. Time Elapsed: 102.335 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 32000. Time Elapsed: 105.439 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 33000. Time Elapsed: 108.572 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 34000. Time Elapsed: 111.682 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 35000. Time Elapsed: 114.797 s Mean Reward: 0.400. Std of Reward: 0.490. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 36000. Time Elapsed: 117.891 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 37000. Time Elapsed: 121.139 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 38000. Time Elapsed: 124.319 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 39000. Time Elapsed: 127.502 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 40000. Time Elapsed: 130.646 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 41000. Time Elapsed: 133.764 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 42000. Time Elapsed: 137.879 s Mean Reward: 0.400. Std of Reward: 0.490. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 43000. Time Elapsed: 140.935 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 44000. Time Elapsed: 143.938 s Mean Reward: 0.400. Std of Reward: 0.490. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 45000. Time Elapsed: 146.975 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 46000. Time Elapsed: 150.020 s Mean Reward: 0.000. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 47000. Time Elapsed: 153.022 s Mean Reward: 0.400. Std of Reward: 0.490. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 48000. Time Elapsed: 156.105 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 49000. Time Elapsed: 159.179 s Mean Reward: 0.200. Std of Reward: 0.400. Training.
INFO:mlagents.envs:Saved Model
INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 50000. Time Elapsed: 162.361 s Mean Reward: 0.250. Std of Reward: 0.433. Training.
INFO:mlagents.envs:Saved Model
INFO:mlagents.trainers:List of nodes to export for brain :FindTheTarget
INFO:mlagents.trainers: is_continuous_control
INFO:mlagents.trainers: version_number
INFO:mlagents.trainers: memory_size
INFO:mlagents.trainers: action_output_shape
INFO:mlagents.trainers: action
INFO:mlagents.trainers: action_probs
INFO:tensorflow:Froze 11 variables.
INFO:tensorflow:Froze 11 variables.
Converted 11 variables to const ops.
Converting ./models/cpu_test-0/FindTheTarget/frozen_graph_def.pb to ./models/cpu_test-0/FindTheTarget.nn
IGNORED: StopGradient unknown layer
GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 8] => 'main_graph_0/hidden_0/BiasAdd'
IN: 'epsilon': [-1, 1, 1, 2] => 'mul'
OUT: 'action', 'action_probs'
DONE: wrote ./models/cpu_test-0/FindTheTarget.nn file.
INFO:mlagents.trainers:Exported ./models/cpu_test-0/FindTheTarget.nn file
Best regards,
Markus
GPUs are faster for large amounts of data, whether that comes from big networks with convolutional layers and visual observations or from large batch sizes.
If you are using small networks with a few vector observations and/or small batch sizes, the CPU is usually better.
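To put rough numbers on this for the two runs in the question, here is a minimal Python sketch. It reads the step count and elapsed time from the two final log lines and estimates how small the network is. The helper names `steps_per_second` and `mlp_param_count` are made up for illustration. The layer sizes (8 vector observations, 2 hidden layers of 128 units, 2 action outputs) are taken from the logs, but treating the policy as a plain fully connected network is an assumption; the real ML-Agents graph contains additional variables.

```python
import re

# Final summary lines copied from the two logs above.
GPU_LINE = ("INFO:mlagents.trainers: gpu_test: FindTheTarget: Step: 50000. "
            "Time Elapsed: 266.213 s Mean Reward: 0.000. Std of Reward: 0.000. Training.")
CPU_LINE = ("INFO:mlagents.trainers: cpu_test: FindTheTarget: Step: 50000. "
            "Time Elapsed: 162.361 s Mean Reward: 0.250. Std of Reward: 0.433. Training.")

LOG_PATTERN = re.compile(r"Step: (\d+)\. Time Elapsed: ([\d.]+) s")


def steps_per_second(log_line):
    """Return training throughput (steps / second) for one summary line."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        raise ValueError("not a trainer summary line")
    steps, seconds = int(match.group(1)), float(match.group(2))
    return steps / seconds


def mlp_param_count(layer_sizes):
    """Count weights and biases of a plain fully connected network.

    Assumption: a simple MLP; this is only a size estimate.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


gpu_rate = steps_per_second(GPU_LINE)   # roughly 188 steps/s
cpu_rate = steps_per_second(CPU_LINE)   # roughly 308 steps/s
params = mlp_param_count([8, 128, 128, 2])

print(f"GPU: {gpu_rate:.1f} steps/s, CPU: {cpu_rate:.1f} steps/s")
print(f"CPU is {cpu_rate / gpu_rate:.2f}x faster on this run")
print(f"Approximate network size: {params} parameters")
```

With fewer than 20,000 parameters and 8 input values per step, each update is a very small amount of arithmetic, so the fixed cost of moving data to and from the GPU can outweigh any speed-up from the GPU itself.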