Ray: [Question] How to obtain the action embedding for avail_action?

Created on 26 Aug 2019 · 8 comments · Source: ray-project/ray

This is a general question about parametric action spaces, but I think people here may know the solution. I am trying to embed the actions, but in the toy model provided by RLlib, the available-action embeddings are generated randomly in the env. Shouldn't the embeddings be learned? For example, word2vec embeddings are learned from a corpus. For a more practical RL problem, how could we obtain those embeddings? Thank you.

Labels: question, stale

All 8 comments

You can certainly try to learn it; that's up to you. To do that, you can use a randomly initialized but learnable embedding matrix instead of fixed values.
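For illustration, here is a minimal PyTorch sketch of that idea, assuming a toy setup where the model (rather than the env) owns the embedding table; all names and sizes here are hypothetical, not RLlib's API:

```python
import torch
import torch.nn as nn

NUM_ACTIONS, EMBED_DIM = 10, 16  # hypothetical sizes

class LearnedActionEmbeddings(nn.Module):
    def __init__(self):
        super().__init__()
        # Randomly initialized, but registered as a parameter so it is
        # updated by backprop like any other model weight.
        self.action_embed = nn.Parameter(torch.randn(NUM_ACTIONS, EMBED_DIM))

    def forward(self, avail_action_ids):
        # avail_action_ids: LongTensor of available action indices.
        # Returns the corresponding rows of the embedding table.
        return self.action_embed[avail_action_ids]
```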

Thank you Eric for the reply. I am wondering if you could provide some instructions on a common way to learn the weights? In my understanding, is this equivalent to adding a fully connected layer (with weights of shape embedding size × number of actions) at the end of my model (say my model outputs a vector of the embedding size) so that the output becomes the desired logits? In that case, it is equivalent to modifying the network architecture, and I don't see the point of introducing parametric actions.

I am trying to mimic how OpenAI embedded the Dota action space, but I could not find any detailed explanation of how to learn the action-embedding matrix.

Thank you!

You want to go from an action (0..N index) to a fixed-size embedding (size M), right? The typical way to do this is to multiply by a matrix of size (N×M), which can be learnable. This gives you your action embeddings.

Edit: note that the "multiply" is just a lookup in the embedding table (it's equivalent to a multiply if the action index is one-hot encoded). You can take a look at torch.nn.Embedding for an example of how it works.
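To make that equivalence concrete, here is a small self-contained demo (shapes chosen arbitrarily) showing that an embedding lookup and a one-hot matrix multiply produce the same result:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, M = 5, 3                  # N actions, M-dim embeddings
embed = nn.Embedding(N, M)   # learnable (N x M) embedding table

idx = torch.tensor([2])                     # pick action index 2
one_hot = F.one_hot(idx, N).float()         # (1, N) one-hot row

lookup = embed(idx)                         # direct table lookup -> (1, M)
matmul = one_hot @ embed.weight             # one-hot x matrix -> (1, M)
assert torch.allclose(lookup, matmul)       # identical results
```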


Thank you Eric. Yes, I want to go from an action to an embedding, and I know how the embedding table works. My confusion is about how to use the RLlib framework to learn this embedding. For NLP problems, people can use CBOW or GloVe to train on a corpus to get the embeddings. However, in the RL framework, I am not clear on how we get those embeddings for actions.

Are we expected to incorporate this embedding matrix into the policy network so that its weights get learned? Thanks a lot.

Yes, it should be part of your model weights and trained via backprop.
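Putting the pieces together, here is a hedged sketch of what such a policy head could look like, assuming a dict observation containing the real observation, the available action indices, and an action mask; the class and argument names (ParametricActionsModel, real_obs, avail_action_ids, action_mask) are illustrative, not RLlib's exact model API:

```python
import torch
import torch.nn as nn

class ParametricActionsModel(nn.Module):
    def __init__(self, obs_dim, num_actions, embed_dim):
        super().__init__()
        # Maps the observation to a state embedding of size embed_dim.
        self.state_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )
        # The action-embedding table lives inside the model, so its
        # weights receive gradients from the usual policy-gradient loss.
        self.action_embed = nn.Embedding(num_actions, embed_dim)

    def forward(self, real_obs, avail_action_ids, action_mask):
        state_embed = self.state_net(real_obs)             # (B, M)
        avail_embed = self.action_embed(avail_action_ids)  # (B, A, M)
        # Logits = dot product between the state embedding and each
        # available action's embedding.
        logits = torch.einsum("bm,bam->ba", state_embed, avail_embed)
        # Mask invalid actions with a very large negative value.
        return logits.masked_fill(action_mask == 0,
                                  torch.finfo(logits.dtype).min)
```

The dot-product head is the key difference from a plain fully connected output layer: the same embedding table is reused for whatever subset of actions is available each step, which is the point of parametric actions.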


Yeah, I agree! Thanks a lot!

@soloist96 Interesting discussion. Did you find an easy way of doing the suggested embedding training technique?

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.
