Friday, October 21, 2005

Some Observations About Formulations

1. When the cost of computation is high, the MDP settles down to a deterministic greedy policy. However, the results show that, on average, the MDP policy performs slightly worse than the random policy when computational cost is high. This suggests that a policy with randomized actions might outperform a deterministic one when the optimal metaplanning decision is to make a beeline for the goal, and perhaps in intermediate cases as well. I should look into the literature on randomized action policies.
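To make that idea concrete, here is a minimal sketch of one way a randomized policy could be represented: a softmax over the values of the meta-actions, which collapses back to the deterministic greedy policy as the temperature goes to zero. The Q-values, the temperature parameter, and the two example meta-actions are illustrative assumptions, not part of the current solver.

    import math
    import random

    def softmax_policy(q_values, temperature=1.0):
        # Probability distribution over actions given their values.
        # As temperature -> 0 this approaches the deterministic greedy policy;
        # larger temperatures spread probability over near-optimal actions.
        exps = [math.exp(q / temperature) for q in q_values]
        total = sum(exps)
        return [e / total for e in exps]

    def sample_action(q_values, temperature=1.0):
        # Sample an action index from the softmax distribution.
        probs = softmax_policy(q_values, temperature)
        return random.choices(range(len(q_values)), weights=probs, k=1)[0]

    # Hypothetical meta-actions: 0 = compute another arc, 1 = commit to current best.
    print(softmax_policy([-3.0, -2.8], temperature=0.5))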

2. I have completed an implementation of the Generic Graph MDP solver. Under this formulation, my states are a 4-tuple consisting of {Arcs Examined, Optimal Quality Level Given Known Information, Current Best Action, Location}. It appears to be working correctly, albeit slowly for larger models. I tested it on the original Small Graph and obtained results similar to the Model with 2 Quality Levels in the state tuple. This suggests that having multiple quality levels together with a minimum arc specification (i.e. current best action) is not necessary. (In a previous post I commented on why multiple quality levels are necessary when there is no minimum arc specification.)
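For reference, a minimal sketch of how that state tuple might be encoded. The field names, types, and toy values below are illustrative assumptions, not the solver's actual data structures.

    from collections import namedtuple

    # One possible encoding of the 4-tuple state described above:
    #   arcs_examined : frozenset of arcs whose costs have been computed
    #   quality_level : optimal quality level given the known information
    #   best_action   : the current best action
    #   location      : the agent's current node
    State = namedtuple("State",
                       ["arcs_examined", "quality_level", "best_action", "location"])

    s = State(arcs_examined=frozenset({("A", "B")}),
              quality_level=2,
              best_action=("A", "B"),
              location="A")

    # Being hashable, the tuple can key the value table used by value iteration.
    V = {s: 0.0}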

3. As I see it now, one of the limiting factors for larger graphs is the need to track which arcs have been computed. This introduces an exponential term, 2^(number of arcs), into the complexity, and causes even more problems when generating transition probabilities. No doubt it will be necessary to do something similar to letting QL represent the state of the solution rather than the cost of individual arcs. Perhaps instead of tracking exactly which arcs have been computed, we can aggregate states by the depth and breadth of the space explored?
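A back-of-the-envelope sketch of the blow-up and of one possible aggregation. The arc count and the depth/breadth bounds are placeholders, and the aggregation itself is only a guess at this point.

    # Exact bookkeeping: one bit per arc, so the "Arcs Examined" component alone
    # contributes 2**n_arcs states.
    n_arcs = 30
    exact_states = 2 ** n_arcs                        # 1,073,741,824 subsets

    # Possible aggregation: collapse those subsets into
    # (number of arcs examined, max depth reached, frontier breadth).
    max_depth, max_breadth = 6, 5
    aggregated_states = (n_arcs + 1) * max_depth * max_breadth
    print(exact_states, aggregated_states)            # 1073741824 vs 930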

Location may also present a problem in the near future: each location requires generating its own set of transition probabilities, and this can easily get out of hand.
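Rough bookkeeping of why (all the counts below are placeholders, not measurements): a dense transition table needs |S| x |A| x |S| entries, and |S| itself grows with the number of locations, so the table grows roughly quadratically in the location count. Storing only reachable successors sparsely would keep it closer to |S| x |A| x (average successors per state).

    n_locations = 10
    states_per_location = 1000        # placeholder; depends on the other state fields
    n_actions = 5
    n_states = n_locations * states_per_location
    dense_entries = n_states * n_actions * n_states
    print(f"{dense_entries:.2e} dense transition entries")   # 5.00e+08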
