Wednesday, May 17, 2006

Why ARA* is Hard to Beat for Pacman with No Tunnels


To the right is a plot of all of the planners I have used in the pacman domain, with the tunnels sealed so that they cannot be used to reach the other side of the graph. The most important curves in the plot are those for greedy, "min all paths", and ARA*. Notice that "min all paths" performs best overall; it corresponds to being omniscient and computing only the arcs in the overlaid network that allow pacman to reach the goal state with minimum cost. The next best performer is ARA*, which is only slightly worse than "min all paths", suggesting that there is not much room for improvement as long as we move according to the arcs of the overlay graph.
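As a rough illustration, here is a minimal sketch of what the "min all paths" baseline amounts to, assuming full (omniscient) knowledge of the true arc costs on the overlay graph; the graph representation and function names are placeholders, not the actual code.

import heapq

def min_cost_path(arcs, start, goal):
    # arcs: dict mapping node -> list of (neighbor, arc_id, true_cost)
    frontier = [(0.0, start, [])]          # (cost so far, node, arc ids used)
    settled = {}
    while frontier:
        cost, node, used = heapq.heappop(frontier)
        if node == goal:
            return used                    # arc ids on the minimum-cost path
        if node in settled and settled[node] <= cost:
            continue
        settled[node] = cost
        for nbr, arc_id, c in arcs.get(node, []):
            heapq.heappush(frontier, (cost + c, nbr, used + [arc_id]))
    return None

# The omniscient baseline only issues compute ("look") actions for the arcs
# returned here, so no planning effort is spent on irrelevant arcs.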

Notice that greedy does much worse than either ARA* or "min all paths". So even if we include greedy actions in the metaplan, we may still not be able to beat ARA* for this particular setup, since even an omniscient planner does only slightly better.


This suggests that perhaps a larger maze is necessary, as Nick mentioned a while back. I had verified that "min all paths" was significantly better than ARA* in the case where the tunnels were allowed to be traversed. I think ARA* suffers a performance degradation in that case because it has to widen its search, making the search tree much larger, which leads to longer planning times and a higher probability of being eaten while planning.

Applying Random Plus Tree Actions To Pacman

Initial attempts to introduce random directed actions have been stymied by a series of coding errors. The mistakes are costly in that a batch run takes approximately 24 hours (mostly due to sampling sequences of random and tree actions). After a few days' worth of mistakes I reduced the number of trials per batch from 250 to 75.

Here are some of the mistakes I've made:
Setting the tunnelFlag to 1 when it should have been zero.
Recursively generating the same data structure, eating up memory.

One thing I spotted was that the initial tree action was for arc 11, which is at the tail end of the graph. The policy would choose to look at arc 11 and then take a series of random actions, sometimes leading to paths that never included arc 11. I fixed this by determining which path was closest to being completed, based on which arcs had already been looked at, and selecting look actions corresponding to the unexamined arcs on that path.
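Roughly, the fix looks like the following sketch, where candidate_paths and examined are placeholders for my actual data structures (each path is a list of arc ids, examined is the set of arcs already looked at).

def next_look_action(candidate_paths, examined):
    # Pick an unexamined arc on the path closest to being fully examined.
    def remaining(path):
        return [a for a in path if a not in examined]
    best_path = min(candidate_paths, key=lambda p: len(remaining(p)))
    todo = remaining(best_path)
    return todo[0] if todo else None       # None: that path is fully examined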

Another question was why the policy was choosing tree actions even when ghost speed was very high, which seemed bizarre. I ended up checking whether the transition probabilities I had gathered made sense. They did not: when I summed the probability of transitioning into the "eaten during planning" state from all states, I got a large number for the case where the ghosts do not move during computation. That should not happen, since it is impossible to be eaten during planning if the ghosts do not move.
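The check itself is simple. Here is a sketch of it, assuming the estimated transition probabilities sit in an array T indexed by (state, action, next state) and eaten_idx is the index of the "eaten during planning" state; the names are assumptions, not my actual variables.

import numpy as np

def mass_into_eaten(T, eaten_idx):
    # Total probability mass flowing into the "eaten during planning" state,
    # summed over all (state, action) pairs.
    return T[:, :, eaten_idx].sum()

# With the ghosts frozen during computation this sum should be (near) zero;
# a large value points at a bad estimate or a bug in how T was filled in.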

I then realized that I had made an indexing error: I was not allocating enough space in my transition matrix for the "number of random actions taken" feature. This probably caused entries to be overwritten when the number of random actions taken was at its maximum, which made it look as though taking the maximum number of random actions was bad and therefore pushed the policy toward tree actions instead.
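For reference, a sketch of the corrected allocation with made-up feature sizes; the point is simply that the "number of random actions taken" feature needs MAX_RANDOM + 1 slots, since the count runs from 0 to MAX_RANDOM inclusive.

import numpy as np

N_ARC_STATES = 5     # illustrative feature sizes, not the real ones
N_GHOST_BINS = 4
MAX_RANDOM   = 3     # random-action count ranges over 0..MAX_RANDOM
N_ACTIONS    = 6

# My original code reserved only MAX_RANDOM slots for this feature, so entries
# for the maximum count ended up overwriting other data; reserving
# MAX_RANDOM + 1 slots avoids that.
T = np.zeros((N_ARC_STATES, N_GHOST_BINS, MAX_RANDOM + 1, N_ACTIONS,
              N_ARC_STATES, N_GHOST_BINS, MAX_RANDOM + 1))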

This was the latest problem discovered. I am running corrected trials now.

Tuesday, May 02, 2006

Including Random Action With Tree Actions


One issue with using the decision tree alone to determine the next compute action is that it does not properly adapt to changes in computational cost. When the decision tree is generated using all possible world states, the corresponding zero-computation-cost tree gives a compute policy that is optimal. As soon as the computation cost is non-zero, there is no guarantee that the decision tree will give effective compute actions. This is because the decision tree algorithm branches on arc costs only and cannot take into account when execution (i.e. a move action) takes place. For this reason the decision tree may still use arcs that are irrelevant to execution to predict the plan cost, which means that even when computational costs are high, the tree may still recommend computing irrelevant arcs.

To alleviate this problem, we introduce the possibility of taking actions that are not recommended by the decision tree. These are a set of directed random actions, each of which moves the plan one step closer to the goal. Additional simulation will be needed to gather the effect of using tree actions and random actions in combination.

The idea is that if computational costs are high, the MDP has the opportunity to bypass the decision tree actions in favor of the set of directed random actions.

The tree actions used are those corresponding to the zero computational cost tree.
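Here is a minimal sketch of how the two action sets fit together; a hand-coded cost threshold stands in for what the meta-level MDP policy actually learns, and tree_action / directed_random_actions are hypothetical helpers passed in as arguments.

import random

def compute_action(state, compute_cost, tree_action, directed_random_actions,
                   cost_threshold=1.0):
    # tree_action(state): compute action recommended by the zero-cost tree.
    # directed_random_actions(state): looks at unexamined arcs that move the
    # current plan one step closer to the goal.
    options = directed_random_actions(state)
    if compute_cost > cost_threshold and options:
        # High computation cost: bypass the tree and take a directed random
        # action rather than possibly computing an irrelevant arc.
        return random.choice(options)
    return tree_action(state)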

Here on the right are results on the 4-node, 4-arc binary graph, comparing using just the decision tree against using both the decision tree and the possibility of taking a random directed action.


When computational costs are high, the dual-action MDP appears to outperform the tree-only MDP.

I will next apply this to the pacman problem.