Monday, September 26, 2005

Research


My research involves learning about the tradeoffs of planning versus acting. There are many sources of uncertainty involved in learning the correct behavior for a "goal completion" problem. By goal completion I mean a problem for which a plan must be generated and acted out in order to accomplish a goal while optimizing some objective function. For instance, a robot may have to plan its way over terrain in order to reach some goal location in minimum time. In order for the robot to move, it must have at least a partial plan. There are several behavioral options for the robot. One option is to plan a complete and optimal path to the goal and then act upon that plan. Another is for it to generate a partial plan, execute part of it, and then replan along the way. There are a myriad of intermediate options.
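
To make that spectrum concrete, here is a toy sketch (not my actual experiment code) of the two extremes on a small grid world. The `horizon` parameter is something I am introducing purely for illustration: `horizon=None` plans the whole path once and executes it, while a small horizon executes a few steps and replans. The distinction only pays off when costs change or are uncertain, which this static grid does not model.

```python
import heapq

def shortest_path(grid, start, goal):
    """Dijkstra over a 2D grid of nonnegative cell-entry costs."""
    rows, cols = len(grid), len(grid[0])
    dist, prev = {start: 0}, {}
    heap = [(0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + grid[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)], prev[(nr, nc)] = nd, (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

def plan_and_act(grid, start, goal, horizon=None):
    """Interleave planning and acting: plan, execute `horizon` steps,
    then replan from wherever we end up. horizon=None is the
    plan-to-completion extreme."""
    pos, executed = start, [start]
    while pos != goal:
        path = shortest_path(grid, pos, goal)
        steps = path[1:] if horizon is None else path[1:1 + horizon]
        executed.extend(steps)
        pos = executed[-1]
    return executed
```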

One way to determine the optimal behavior is to model the problem of planning versus acting as a Markov Decision Process (MDP). I have performed several experiments on simple path planning problems to illustrate that there do exist intermediate behavioral options that do better than plan-all or plan-to-completion schemes.
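
As a sketch of what solving such a model looks like, here is plain tabular value iteration over an assumed encoding where each action is either another step of planning or committing to act. The transition table below is a made-up toy instance, not the MDP from my experiments.

```python
def value_iteration(P, gamma=0.95, tol=1e-6):
    """P: state -> {action: [(prob, next_state, reward), ...]}.
    States with no actions are terminal and keep value 0."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:
                continue
            best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                       for outs in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy instance: the state counts planning steps taken so far. PLAN
# costs 1 and refines the plan; ACT ends the episode with an execution
# cost that shrinks the more planning was done beforehand.
P = {
    0: {"PLAN": [(1.0, 1, -1.0)], "ACT": [(1.0, "done", -10.0)]},
    1: {"PLAN": [(1.0, 2, -1.0)], "ACT": [(1.0, "done", -6.0)]},
    2: {"ACT": [(1.0, "done", -4.0)]},
    "done": {},
}
print(value_iteration(P))  # here the optimum is to plan twice, then act
```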

Some problems I have worked on in the past include determining the optimal stopping time for the 2-opt algorithm as applied to a 50-city Traveling Salesman Problem. In addition, I have worked on optimal stopping for an iterative version of the A* algorithm.
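
For reference, here is roughly what the 2-opt loop looks like on a random 50-city instance. The `max_sweeps` cap below is just a stand-in stopping rule; deciding when to stop is exactly the part the optimal-stopping analysis would replace.

```python
import math, random

def tour_length(cities, tour):
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(cities, tour, max_sweeps=100):
    """Reverse tour segments while improvements exist, stopping after
    max_sweeps full passes (placeholder stopping criterion)."""
    n = len(tour)
    for _ in range(max_sweeps):
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Gain from swapping edges (a,b),(c,d) for (a,c),(b,d).
                delta = (math.dist(cities[a], cities[c])
                         + math.dist(cities[b], cities[d])
                         - math.dist(cities[a], cities[b])
                         - math.dist(cities[c], cities[d]))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
        if not improved:
            break
    return tour

cities = [(random.random(), random.random()) for _ in range(50)]
tour = two_opt(cities, list(range(50)))
print(round(tour_length(cities, tour), 3))
```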

Recently my work has focused on determining where to focus computation. The domain is a small graph path planning problem: given a graph with stochastic arc costs, reach the goal with minimal total cost, where total cost is the cost of traveling along the arcs plus the time spent computing. Each instance of the graph is generated at random, and it is the agent's job to determine how to reach the goal. The agent has the ability to compute the true cost of an arc at a fixed computational cost, and the information it receives about that arc's cost helps it decide which arc to look at next. At some point the agent must decide to execute its current plan in order to move to the goal.
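
A minimal sketch of that loop, on an assumed two-path instance. The probe-selection and stopping rules here (probe the widest prior on the current best path, commit when little is left to learn) are illustrative placeholders, not the policies I am actually studying:

```python
import random

PROBE_COST = 0.5  # fixed charge per arc-cost computation

# Two-path instance: priors are (mean, half-width) of a uniform cost.
PRIORS = {("S", "A"): (2.0, 1.5), ("A", "G"): (2.0, 1.5),
          ("S", "B"): (3.0, 0.5), ("B", "G"): (3.0, 0.5)}
PATHS = [(("S", "A"), ("A", "G")), (("S", "B"), ("B", "G"))]

def run_episode():
    true_cost = {e: random.uniform(m - w, m + w)
                 for e, (m, w) in PRIORS.items()}
    known, spent = {}, 0.0

    def est(path):  # revealed cost where probed, prior mean elsewhere
        return sum(known.get(e, PRIORS[e][0]) for e in path)

    while True:
        best = min(PATHS, key=est)
        unprobed = [e for e in best if e not in known]
        # Placeholder stopping rule: execute once the best path is fully
        # probed, or its remaining priors are too narrow to be worth paying for.
        if not unprobed or max(PRIORS[e][1] for e in unprobed) < PROBE_COST:
            return sum(true_cost[e] for e in best) + spent
        probe = max(unprobed, key=lambda e: PRIORS[e][1])
        known[probe] = true_cost[probe]
        spent += PROBE_COST

print(round(sum(run_episode() for _ in range(1000)) / 1000, 3))
```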

For these simple problems I am able to generate two MDP models. The QL model is an approximate model that uses the "quality" of the current solution as the state of the algorithm, where quality is an estimate of how good the current path is in comparison to the expected optimal. The other model is the Exact model, where the state includes the exact costs of the individual arcs. The Exact model's state space grows exponentially in the number of arcs, so it is not a feasible choice for even moderately sized problems.
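
To show the difference in state representation, here is a sketch under assumed details (the bin count, the exact quality definition, and both function names are mine, for illustration only):

```python
def ql_state(best_estimate, expected_optimal, n_bins=10):
    """QL model: collapse everything into a discretized solution
    quality; 1.0 means the current path matches the expected optimum."""
    quality = min(expected_optimal / best_estimate, 1.0)
    return round(quality * n_bins)

def exact_state(known, arcs):
    """Exact model: one component per arc, its revealed cost or None.
    With k possible costs per arc this gives (k+1)**len(arcs) states,
    which is why the model blows up on larger graphs."""
    return tuple(known.get(e) for e in arcs)
```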

Here are some results for these two models.
