Including Random Action With Tree Actions
One issue with using the decision tree alone to determine the next compute action is that is doe not properly adapt to changes in computational costs. When the decision tree is generated using all possible world states, the corresponding zero computation cost tree accurately gives a compute policy that is optimal. As soon as the computation cost is non-zero, there is no guarantee that the decision tree will give effective compute actions. This is because the decision tree algorithm branches based on arc costs only and is unable to incorporate when execution (i.e. a move action) takes place. For this reason the decision tree may still use arcs that are irrelavant to execution to predict the plan cost. This means that even though computational costs are high, the tree may still recommend irrelavant arcs to be computed.
In order to alleviate this problem, we introduce the possibility of taking actions that may not be recommended by the decision tree. This set of actions corresponds to a set of directed random actions that makes the plan one step closer to the goal. This will take additional simulation to gather the effect of using tree actions and random actions in combination.
The idea is that if computational costs are high, the MDP has the opportunity to bypass the decision tree actions in favor of the set of directed random actions.
The tree actions used are those corresponding to the zero computational cost tree.
Here on the right are results on the 4 node 4 arc binary graph comparing the results of using just the deicison tree and the result of using both the decision tree and the possibility of taking a random directed action.
When computational costs are high , it appears that there is improvement in performance of the dual action MDP over the tree-only MDP.
I will next apply this to the pacman problem.
0 Comments:
Post a Comment
<< Home