In Bejeweled, after a row of same-coloured jewels disappears, the jewels from the row above move down, and if the new row also matches, those jewels disappear too.
Assuming no new jewels are generated, the goal is just to find the best moves to clear all the existing jewels.
One action can trigger several further moves. I assume one vector action then requires multiple observations, but ml-agents only supports one observation per vector action. Any suggestions?
Wouldn't your observation be the board state and your action be which tile to move? That's one observation per move. Multiple changes to the board state between observation and action don't matter since they are outside the control of the agent.
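To make that concrete, here is a minimal sketch in plain Python (not the ml-agents API, which lives in Unity/C#). It assumes a board stored as a list of rows with `0` for an empty cell, matches of three or more in a row or column, and no new jewels spawning. The names `find_matches`, `apply_gravity` and `step` are hypothetical helpers made up for this illustration. The point is that the whole cascade is resolved inside one `step`, so the agent sees one observation before the action and one after it.

```python
from typing import List, Set, Tuple

Board = List[List[int]]  # 0 marks an empty cell; row 0 is the top row
Cell = Tuple[int, int]   # (row, column)


def find_matches(board: Board) -> Set[Cell]:
    """Return every cell in a horizontal or vertical run of 3+ equal jewels."""
    rows, cols = len(board), len(board[0])
    matched: Set[Cell] = set()
    for r in range(rows):
        for c in range(cols):
            colour = board[r][c]
            if colour == 0:
                continue
            if c + 2 < cols and board[r][c + 1] == colour and board[r][c + 2] == colour:
                matched.update({(r, c), (r, c + 1), (r, c + 2)})
            if r + 2 < rows and board[r + 1][c] == colour and board[r + 2][c] == colour:
                matched.update({(r, c), (r + 1, c), (r + 2, c)})
    return matched


def apply_gravity(board: Board) -> None:
    """Let jewels fall into empty cells below them, column by column (in place)."""
    rows, cols = len(board), len(board[0])
    for c in range(cols):
        jewels = [board[r][c] for r in range(rows) if board[r][c] != 0]
        padded = [0] * (rows - len(jewels)) + jewels
        for r in range(rows):
            board[r][c] = padded[r]


def step(board: Board, action: Tuple[Cell, Cell]) -> Tuple[Board, int]:
    """Swap two cells, resolve the full cascade, and return (next_board, cleared).

    The intermediate boards are never shown to the agent: it only sees the
    board before the swap and the settled board afterwards.
    """
    (r1, c1), (r2, c2) = action
    board = [row[:] for row in board]  # leave the caller's board untouched
    board[r1][c1], board[r2][c2] = board[r2][c2], board[r1][c1]
    cleared = 0
    while True:
        matched = find_matches(board)
        if not matched:
            break
        for r, c in matched:
            board[r][c] = 0
        cleared += len(matched)
        apply_gravity(board)
    return board, cleared


start = [
    [3, 2, 2, 4],
    [1, 1, 4, 2],
    [4, 3, 1, 3],
]
# Swapping (1, 2) with (2, 2) lines up three 1s; the 2s then fall into a second match.
next_board, cleared = step(start, ((1, 2), (2, 2)))
print(next_board, cleared)  # [[0, 0, 0, 0], [3, 0, 0, 4], [4, 3, 4, 3]] 6
```

Here `cleared` could serve as the reward for the move, so the agent is still credited for chain reactions without ever observing the in-between boards.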
Dear Joe Ward, thank you for your clarification.
I thought I needed to record all the intermediate states leading to the final state for ml-agents to learn efficiently. I will try this and see whether the learning is successful.