Ray: How to handle Ordinal input and output?

Created on 16 Apr 2019 · 13 Comments · Source: ray-project/ray

The observation space is ordinal, for example 5, 10, 18, 23, 31, and the action is to choose any 3 numbers from the input, for example 10, 18, 31.
What is the best way to represent this observation and action space?
I cannot make it a continuous space because the action MUST be a subset of the observation. A continuous action space can output ANY number, not just numbers from the observation.

I cannot make it discrete because the input numbers can be anything. In one sample they could be 5, 10, 18, 23, 31; in the next they could be 7, 10, 15, 20, 51, and so on.

To summarize, the input is a set of numbers and the output must be a subset of those numbers.

question

All 13 comments

That's exactly what parametric actions can do, check it out: https://ray.readthedocs.io/en/latest/rllib-models.html#variable-length-parametric-action-spaces

Thanks. It helps a lot.
But how do I make sure the actions are a subset of the observation? For example, the observation can have any 5 numbers. The output should be at most 3 of those 5 numbers.

You can use an action mask as in the parametric actions example -- the mask
can be based on arbitrary parts of the input observation.
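
The masking idea can be sketched independently of RLlib. The following is a minimal pure-Python illustration, not the RLlib example itself: `masked_softmax`, `MAX_SLOTS`, and the all-zero logits are hypothetical stand-ins, and the observation is assumed to be padded to a fixed number of slots.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits; slots where mask[i] == 0 get probability 0."""
    # Push invalid slots to -inf so exp() yields exactly 0 for them.
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    peak = max(masked)  # assumes at least one slot is valid
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical fixed-size observation: 5 real values padded to 8 slots.
MAX_SLOTS = 8
values = [5, 10, 18, 23, 31]
mask = [1] * len(values) + [0] * (MAX_SLOTS - len(values))
logits = [0.0] * MAX_SLOTS  # stand-in for the policy network output
probs = masked_softmax(logits, mask)
```

Because the padded slots receive zero probability, any slot sampled from `probs` refers to a number that is actually present in the observation.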

Is there an example with a variable number of actions as output?

You could try MultiDiscrete() or Tuple(list of action spaces) for that. For example, the first action of the tuple could be Discrete(n) that determines the number of actions to take, which are the rest of the tuple elements.

Tuple([
   Discrete(4),  # number of actions to take: sampled value 0-3 maps to 1-4
   action_space1,
   action_space2,
   action_space3,
   action_space4,
])
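
On the environment side, such a tuple would need to be decoded before use. A minimal sketch, assuming the first element is a `Discrete(4)` sample (which takes values 0 to 3) and the remaining elements are the candidate sub-actions; `decode_action` is a hypothetical helper, not part of RLlib:

```python
def decode_action(action):
    """Return only the sub-actions that the count element says to take."""
    count_index, *slots = action
    n_taken = count_index + 1  # Discrete(4) yields 0..3; map to 1..4
    return slots[:n_taken]


# Count index 1 means "take 2 actions", so the last two slots are ignored.
chosen = decode_action((1, 2, 0, 4, 3))
```

The ignored slots still receive gradients from the policy's point of view, so whether this encoding trains well is something to check empirically.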

Thanks. This would avoid an RNN.
I also have a variable number of features in the input (observation space). I am planning to create enough dimensions to store the maximum possible number of features and pad with zeros when the input has fewer features than that maximum.
Is that a good practice? Can something better be done other than an RNN?

Padding with zeros sounds fine. If the order of the features doesn't matter, you could also consider network elements like attention or pooling to make the processing order-agnostic. An RNN could also work; I would just try all of them and see which works best.
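
Zero padding can be sketched as follows. This is an illustration only: `pad_observation` is a hypothetical helper, and returning a validity mask alongside the padded vector is an added assumption (it lets the model tell a real 0 apart from padding).

```python
def pad_observation(values, max_len, pad_value=0.0):
    """Pad a variable-length feature list to max_len and return (padded, mask)."""
    n = len(values)
    if n > max_len:
        raise ValueError(f"got {n} features, but max_len is {max_len}")
    padded = [float(v) for v in values] + [pad_value] * (max_len - n)
    mask = [1] * n + [0] * (max_len - n)  # 1 = real feature, 0 = padding
    return padded, mask


padded, mask = pad_observation([5, 10, 18, 23, 31], max_len=8)
```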

The input is ordinal. That is, the values are all numbers and can be sorted, and Euclidean distance applies to them. But the range of the numbers can differ. For example, an input could be either of the following:

1000, 1010, 1020, 1030, 1050
2200, 2205, 2210, 2215, 2240, 2250, 2260, 2300, 2400

So the numbers are in increasing order and the sequence has variable length. The differences between consecutive numbers also vary.

I see. Can it also be represented as a many-hot vector? I.e., start with an
array of zeros of length MAX_N and set the indices of all the ordinal values that are present to 1.
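
A minimal sketch of the many-hot encoding, assuming each ordinal value can be used directly as an index below `MAX_N` (`many_hot` is a hypothetical helper; values with a large offset would first need to be shifted or mapped to indices):

```python
def many_hot(values, max_n):
    """Return a 0/1 list of length max_n with a 1 at every value present."""
    vec = [0] * max_n
    for v in values:
        if not 0 <= v < max_n:
            raise ValueError(f"value {v} is outside [0, {max_n})")
        vec[v] = 1
    return vec


MAX_N = 32
obs = many_hot([5, 10, 18, 23, 31], MAX_N)
```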

It can. But the total number of possible values could be in the thousands, which may be too many. However, a single sample contains no more than 300 numbers. I will try several methods. Thanks.

The action space is to choose up to 4 numbers from the input.
For example, for the 1st example the output could be 1000, 1010, 1030.
For the second it could be 2205, 2215, 2260, 2400.

These would actually be parametric actions. For each number it picks, it should output an integer as well.
For example, one output could be:
1000, -10
1010, +20
1030, -50

Ah, I see. Well, if it's only a couple thousand and not tens of thousands, I would still give the vector encoding a try, since it's a pretty natural representation. You could then also represent the output as a vector of MAX_N probabilities from which to sample up to K actions per step.

obs: Box(0, 1, (MAX_N,))

actions:
   Tuple([
       Discrete(k),             # how many actions to take
       Simplex([1, MAX_N]),     # action probabilities
       Box(-50, 50, (MAX_N,)),  # action values
   ])
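
One way the environment could turn such a tuple into concrete picks is sketched below. It is an illustration under assumptions, not RLlib code: `sample_actions` is a hypothetical helper, the count element is assumed to be zero-based, and probabilities of values absent from the observation are zeroed before sampling without replacement.

```python
import random


def sample_actions(count_index, probs, offsets, present, rng):
    """Pick up to count_index + 1 distinct present indices, each with its offset."""
    k = count_index + 1  # a Discrete sample is zero-based
    # Keep probability mass only on values present in the observation.
    weights = [p if present[i] else 0.0 for i, p in enumerate(probs)]
    chosen = []
    for _ in range(k):
        candidates = [i for i, w in enumerate(weights) if w > 0]
        if not candidates:
            break  # fewer valid values than requested
        idx = rng.choices(candidates, weights=[weights[i] for i in candidates])[0]
        chosen.append((idx, offsets[idx]))
        weights[idx] = 0.0  # sample without replacement
    return sorted(chosen)


MAX_N = 10
present = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]  # many-hot observation
probs = [0.1] * MAX_N                     # stand-in for the Simplex output
offsets = [0.0] * MAX_N                   # stand-in for the Box output
offsets[2], offsets[5], offsets[7] = -10.0, 20.0, -50.0
picked = sample_actions(3, probs, offsets, present, random.Random(0))
```

Here four picks are requested but only three values are present, so all three are returned with their offsets.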

You also mentioned that Euclidean distance applies to the entries. In that case it may also make sense for the action output to be a Box (DiagGaussian distribution, perhaps with dimension k), which you can round to the nearest valid input number to pick the actions. Lots of possibilities...
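
The rounding step can be sketched with a binary search over the sorted input. `snap_to_nearest` is a hypothetical helper, ties are broken toward the smaller value by assumption, and duplicates that arise when two outputs snap to the same number are simply merged:

```python
from bisect import bisect_left


def snap_to_nearest(x, sorted_values):
    """Return the element of sorted_values closest to x (ties go to the smaller)."""
    pos = bisect_left(sorted_values, x)
    if pos == 0:
        return sorted_values[0]
    if pos == len(sorted_values):
        return sorted_values[-1]
    before, after = sorted_values[pos - 1], sorted_values[pos]
    return before if x - before <= after - x else after


values = [1000, 1010, 1020, 1030, 1050]
raw = [1003.7, 1012.1, 1044.0]  # stand-in for continuous policy outputs
picks = sorted({snap_to_nearest(x, values) for x in raw})
```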

How do I use the Simplex class? Like spaces.Simplex([1, MAX_N]), or spaces.Simplex(shape=(1, MAX_N))?

Auto closing stale issue. @gowthamnatarajan if the solution didn't work please feel free to reopen.
