Stable-baselines: Is VecNormalize intentionally not changing the observation space dtype?

Created on 6 Feb 2020  ·  9 Comments  ·  Source: hill-a/stable-baselines

Pretty much the title: I found that VecNormalize does not change the data type of the observation space.

If you have an env that returns uint8 observations and wrap it in a VecNormalize and a VecFrameStack, you end up with uint8 observations instead of the float observations from the normalization (because VecFrameStack converts them back).

This might actually work in some cases, since VecNormalize normalizes between two arbitrary values, but for me it caused some confusion. Maybe the output dtype could be a parameter for VecNormalize?
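A minimal NumPy-only sketch of the effect described above. It does not call stable-baselines; the statistics (`mean`, `var`), the epsilon, and the clip value are made-up illustration values, and the `stacked` buffer merely stands in for a frame-stacking buffer allocated with the observation space's uint8 dtype.

```python
import numpy as np

# Hypothetical uint8 pixel values from an image observation.
obs = np.array([100, 110, 130, 150, 255], dtype=np.uint8)

# Made-up running statistics; a real wrapper would estimate these online.
mean, var, eps, clip = 100.0, 20.0 ** 2, 1e-8, 10.0

# Mean/std normalization with clipping produces floating-point values.
normalized = np.clip((obs - mean) / np.sqrt(var + eps), -clip, clip)
print(normalized.dtype)  # float64

# Stand-in for a stacking buffer that was allocated as uint8 because the
# observation space still advertises uint8.
stacked = np.zeros(obs.shape, dtype=np.uint8)
stacked[...] = normalized  # assignment silently casts back to uint8

print(stacked)  # fractional parts are gone, only a handful of integers remain
```

With these numbers the normalized values are roughly 0.0, 0.5, 1.5, 2.5 and 7.75, which the uint8 buffer stores as 0, 0, 1, 2 and 7.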

enhancement

All 9 comments

Do your observations happen to be images? I believe VecNormalize is designed for robotics environments, where observation variables can range over unknown intervals (in which case a running mean/std, as computed by VecNormalize, is required). If you know your values lie in [0, 255], you can simply divide them by 255 and things should work fine.
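For reference, the fixed scaling suggested here can be sketched in a few lines of NumPy. The function name `scale_uint8_obs` is hypothetical and not part of stable-baselines; the choice of float32 as the output dtype is also an assumption.

```python
import numpy as np

def scale_uint8_obs(obs):
    """Map uint8 pixel values in [0, 255] to float32 values in [0, 1].

    Hypothetical helper for illustration only.
    """
    return np.asarray(obs, dtype=np.float32) / 255.0

# Hypothetical 2x2 grayscale frame.
frame = np.array([[0, 51], [102, 255]], dtype=np.uint8)
scaled = scale_uint8_obs(frame)
print(scaled.dtype, scaled.min(), scaled.max())
```

Unlike a running mean/std, this needs no statistics and always returns a float array, so the dtype question from the original post does not arise.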

In any case I agree, the output of VecNormalize should be a float/double, in case somebody uses e.g. int32s. @AdamGleave any comments?

Yep, they are images. I used VecNormalize in the beginning when I switched from another environment, and then wondered why my FPS went down when I used the ScaledFloatFrame wrapper instead (which just divides everything by 255.0).

But it's funny that the agent learned anything at all, even with each observation squished down to integer values from 1 to 10.

In any case I agree, the output of VecNormalize should be a float/double, in case somebody uses e.g. int32s. @AdamGleave any comments?

Is there a reasonable use case for applying VecNormalize to int data types? It feels odd to try to normalize discrete data, although if the discrete data has a large enough range perhaps it makes sense.

If we don't intend VecNormalize to be used in this case, I'd be leaning towards adding input validation that raises an error if the venv being wrapped does not have a float-type observation_space.
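The kind of input validation proposed here could look roughly like the sketch below. `check_float_observation_dtype` is a hypothetical helper, not an existing stable-baselines function; in a real wrapper the dtype would presumably come from the wrapped venv's `observation_space`.

```python
import numpy as np

def check_float_observation_dtype(dtype):
    """Raise if `dtype` is not a floating-point type.

    Hypothetical validation helper for illustration only.
    """
    dtype = np.dtype(dtype)
    if not np.issubdtype(dtype, np.floating):
        raise TypeError(
            "Normalization expects a float observation dtype, got {}".format(dtype)
        )

check_float_observation_dtype(np.float32)  # passes silently

try:
    check_float_observation_dtype(np.uint8)
except TypeError as err:
    print(err)
```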

Yep they are images

If you use CnnPolicies, images are normalized automatically (cf doc).

I think there is a use case for normalization on image data. For example, OpenAI used it in their large-scale study on curiosity learning. I think the difference between the learned normalization and just dividing everything by 255 is that pixels which never change are set to 0 instead of something positive. I think that results in a better focus when learning, but I can't really prove that at the moment; it's just a guess.
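The arithmetic behind that guess (not the claimed learning benefit) can be checked with a small NumPy sketch. It uses statistics computed over one fixed batch of made-up frames rather than a true running estimate, and the epsilon value is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 hypothetical frames with 2 pixels each, stored as floats.
frames = rng.integers(0, 256, size=(100, 2)).astype(np.float64)
frames[:, 0] = 200.0  # pixel 0 never changes across frames

# Per-pixel mean/std normalization (batch statistics, for illustration).
mean = frames.mean(axis=0)
var = frames.var(axis=0)
eps = 1e-8
normalized = (frames - mean) / np.sqrt(var + eps)

# Fixed scaling by 255 for comparison.
scaled = frames / 255.0

print(normalized[:3, 0])  # the static pixel is mapped to 0
print(scaled[:3, 0])      # the static pixel stays positive (200 / 255)
```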

@araffin Oh I didn't know that. I now see it in the code, but where is it in the doc? At least not under CNN Policies

EDIT: Formatting issues

It's here and there

Oh okay...I must have been blind there, thanks.

Yep they are images

If you use CnnPolicies, images are normalized automatically (cf doc).

Hi @araffin @NeoExtended,

Where can we find the automatic image normalization operation in the code?

@Jiankai-Sun

Here.
