Results for new Graph cont.
Shown above is a plot of the results for the new graph configuration. On the y-axis is the overall cost for goal completion. The x-axis gives epsilon, the multiplying factor to the cost of computation for each individual arc. The cost of computation was set to 0.01. There are 3 lines on the plot. The pink line represents the results for the plan all strategy, which performs a full dynamic programming back up of the graph to determine the optimal path. It then executes this plan to reach the goal. This strategy gives the best plan, but also is the most expensive to compute. The blue line shows results for the random strategy. The random strategy flips 3 coins to determine which outgoing arc to compute at each stage. It then proceeds to compute those arc costs and moves along those arcs. This strategy takes the least computational cost, since it always makes just enough computations for a path from the start to the goal. The drawback to this strategy is the paths that are generated are of low quality. The yellow line show the results for the Exact MDP policy, which varies as epsilon varies. Initially, when epsilon is zero, meaning that the computation cost is also zero, the MDP policy behaves like the plan all strategy. This is because it is free to determine the optimal plan. On the other extreme, as epsilon rises, the MDP policy only computes enough arcs for a path to the goal, and starts to look like the behavior of the random strategy. For intermediate epsilon, the MDP policy knows how much to explore prior to execution, and performs better than both the random and plan all strategies.