Thursday, September 29, 2005

Results for New Graph, cont.



Shown above is a plot of the results for the new graph configuration. The y-axis gives the overall cost for goal completion. The x-axis gives epsilon, the multiplying factor applied to the cost of computation for each individual arc; the base cost of computation was set to 0.01. There are three lines on the plot.

The pink line represents the results for the plan all strategy, which performs a full dynamic programming backup of the graph to determine the optimal path and then executes this plan to reach the goal. This strategy gives the best plan, but it is also the most expensive to compute.

The blue line shows results for the random strategy, which flips 3 coins to determine which outgoing arcs to compute at each stage. It then computes those arc costs and moves along those arcs. This strategy incurs the least computational cost, since it always makes just enough computations for a path from the start to the goal. The drawback to this strategy is that the paths it generates are of low quality.

The yellow line shows the results for the Exact MDP policy, which varies as epsilon varies. Initially, when epsilon is zero (meaning the computation cost is also zero), the MDP policy behaves like the plan all strategy, because it is free to determine the optimal plan. At the other extreme, as epsilon rises, the MDP policy computes only enough arcs for a path to the goal and starts to look like the random strategy. For intermediate epsilon, the MDP policy knows how much to explore prior to execution, and performs better than both the random and plan all strategies.
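
For concreteness, here is a rough Python sketch of the plan all and random strategies. The toy graph, its costs, and my reading of the 3 coin flips as a random sample of outgoing arcs are placeholders for illustration, not the actual experimental graph; the per-arc computation cost would be epsilon times the base cost of 0.01.

import random

# Placeholder graph: node -> list of (successor, true arc cost) pairs.
# Costs here stand in for one randomly drawn problem instance.
GRAPH = {
    "s": [("a", 2.0), ("b", 5.0)],
    "a": [("b", 1.0), ("g", 4.0)],
    "b": [("g", 2.0)],
    "g": [],  # goal node
}

def plan_all(graph, start, goal, compute_cost):
    # Full DP backup: pay to compute every arc, then follow the optimal path.
    n_arcs = sum(len(succs) for succs in graph.values())
    value = {node: float("inf") for node in graph}
    value[goal] = 0.0
    for _ in range(len(graph)):  # Bellman relaxations until values settle
        for node, succs in graph.items():
            for nxt, cost in succs:
                value[node] = min(value[node], cost + value[nxt])
    return value[start] + compute_cost * n_arcs

def random_strategy(graph, start, goal, compute_cost, n_flips=3):
    # Randomly pick up to n_flips outgoing arcs to compute at each node,
    # then move along the cheapest of the computed arcs.
    node, total = start, 0.0
    while node != goal:
        candidates = random.sample(graph[node], min(n_flips, len(graph[node])))
        total += compute_cost * len(candidates)  # pay for each computation
        nxt, cost = min(candidates, key=lambda arc: arc[1])
        total += cost
        node = nxt
    return total

With compute_cost set to epsilon times 0.01, plan_all pays for every arc up front while random_strategy pays only along its (possibly poor) route, which is exactly the tradeoff the plot shows.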

Results for New Graph


The layout of the new graph is given in the diagram to the right, along with the labels for the arcs and nodes. Currently I have results for the Exact MDP model, which are given next.

Monday, September 26, 2005

Research


My research involves learning about the tradeoffs of planning versus acting. There are many sources of uncertainty involved in learning the correct behavior for a "goal completion" problem. By goal completion I mean a problem for which a plan must be generated and acted out in order to accomplish a goal while optimizing some objective function. For instance, consider a robot that must plan its way over terrain in order to reach some goal location in minimum time. In order for the robot to move, it must have at least a partial plan. There are several behavioral options for the robot. One option is to plan a complete, optimal path to the goal and then act upon that plan. Another is to generate a partial plan, execute part of it, and then replan along the way. There are a myriad of intermediate options.

One way to determine the optimal behavior is to model the problem of planning versus acting as a Markov Decision Process. I have performed several experiments on simple path planning problems to illustrate that there do exist intermediate behavioral options that do better than plan-all or plan-to-completion schemes.
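
As a rough sketch of what such an MDP might look like (a guessed minimal formalization, not necessarily the exact model used in my experiments), the state records which arc costs have been revealed so far, and the actions are either to compute another arc or to stop deliberating and execute:

from dataclasses import dataclass
from typing import FrozenSet, Tuple

Arc = Tuple[str, str]

@dataclass(frozen=True)
class MetaState:
    # Which arcs have been computed so far, with the costs they revealed.
    revealed: Tuple[Tuple[Arc, float], ...]

def actions(state: MetaState, all_arcs: FrozenSet[Arc]):
    # Meta-level choices: compute any still-unknown arc, or stop
    # and execute the best plan under the current information.
    known = {arc for arc, _ in state.revealed}
    return [("compute", arc) for arc in sorted(all_arcs - known)] \
        + [("execute", None)]

# Example: from the empty-information state, every arc is a candidate.
s0 = MetaState(revealed=())
print(actions(s0, frozenset({("s", "a"), ("s", "b")})))
# [('compute', ('s', 'a')), ('compute', ('s', 'b')), ('execute', None)]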

Some problems I have worked on in the past include determining the optimal stopping time for the 2-opt algorithm as applied to a 50-city Traveling Salesman Problem. In addition, I have worked on optimal stopping for an iterative version of the A* algorithm.

Recently my work has been focused on determining where to focus computation. The domain is a small graph path planning problem. The idea is, given a graph with stochastic arc costs, to reach the goal with minimal cost, where cost is the cost of traveling along the arcs plus the time spent computing. Each instance of the graph is generated at random, and it is the agent's job to properly determine how to reach the goal. The agent has the ability to compute the true cost of an arc at a fixed computational cost. The information that the agent receives about the arc costs will help it decide which arc to look at next. At some point the agent must decide to execute its current plan in order to move to the goal.
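
Here is a minimal sketch of this setup in Python. The graph structure, the uniform cost distribution, and the method names are stand-ins chosen for illustration:

import random

class GraphEnv:
    # Toy instance of the problem: arc costs are drawn at random per
    # instance, and the agent may pay a fixed computational cost to
    # reveal any arc's true cost.
    def __init__(self, compute_cost=0.01, seed=None):
        rng = random.Random(seed)
        self.true_cost = {arc: rng.uniform(1.0, 10.0)
                          for arc in [("s", "a"), ("s", "b"),
                                      ("a", "g"), ("b", "g")]}
        self.compute_cost = compute_cost
        self.known = {}      # arc -> revealed cost
        self.spent = 0.0     # accumulated computation cost

    def compute(self, arc):
        # Reveal one arc's true cost at a fixed computational price.
        self.spent += self.compute_cost
        self.known[arc] = self.true_cost[arc]
        return self.known[arc]

    def execute(self, path):
        # Commit to a path; the final cost is travel plus computation.
        return sum(self.true_cost[arc] for arc in path) + self.spent

# Example episode: look at two arcs, then commit to the cheaper-looking route.
env = GraphEnv(seed=1)
if env.compute(("s", "a")) <= env.compute(("s", "b")):
    print(env.execute([("s", "a"), ("a", "g")]))
else:
    print(env.execute([("s", "b"), ("b", "g")]))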

For these simple problems I am able to generate two MDP models. The QL model is an approximate model that uses the "quality" of the current solution as the state of the algorithm. The quality is an estimate of how good the current path is in comparison to the expected optimal. The other model is the Exact model, where the state includes the exact costs of the individual arcs. This model's state space quickly explodes exponentially in the number of arcs, so it is not a feasible choice for even moderately sized problems.
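
To see how quickly the Exact model blows up, suppose each arc cost is discretized into k levels plus an "unknown" marker (this discretization is an assumption for illustration; the exact encoding may differ). The state space then grows as (k+1)^n for n arcs:

# Assumed discretization: k cost levels per arc plus an "unknown" marker.
k = 4
for n_arcs in (6, 9, 12):
    print(f"{n_arcs} arcs -> {(k + 1) ** n_arcs:,} states")
# 6 arcs -> 15,625 states
# 9 arcs -> 1,953,125 states
# 12 arcs -> 244,140,625 states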

Here are some results for these two models.

Tuesday, September 13, 2005

My First Blog

This is a test.