Interacting with Policies

A solution to a POMDP is a policy that maps beliefs or action-observation histories to actions. In POMDPs.jl, these are represented by Policy objects. See Solvers and Policies for more information about what a policy can represent in general.

One common task in evaluating POMDP solutions is examining the policies themselves. Since the internal representation of a policy is an implementation detail specific to each solver, it is best to interact with policies through the action and value interface functions. There are three relevant methods, demonstrated in the sketch after this list:

  • action(policy, s) returns the best action (or one of the best) for the given state or belief.
  • value(policy, s) returns the expected sum of future rewards if the policy is executed starting from the given state or belief.
  • value(policy, s, a) returns the "Q-value", that is, the expected sum of future rewards if action a is taken on the next step and the policy is executed thereafter.

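The sketch below shows these functions in use. It is only a sketch: it assumes the QMDP and POMDPModels packages are installed and uses the classic TigerPOMDP, but any solver that produces a Policy works the same way (note that not every policy type implements all three methods).

```julia
using POMDPs
using POMDPModels: TigerPOMDP   # example problem; assumes POMDPModels is installed
using QMDP: QMDPSolver          # example solver; assumes QMDP is installed

pomdp = TigerPOMDP()
policy = solve(QMDPSolver(), pomdp)   # solve returns a Policy object

# Build an initial belief using the policy's default updater.
b0 = initialize_belief(updater(policy), initialstate(pomdp))

a = action(policy, b0)     # best action (according to the policy) at b0
v = value(policy, b0)      # expected sum of future rewards from b0 under the policy
q = value(policy, b0, a)   # Q-value of taking a at b0, then following the policy
```
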
Note that the quantities returned by these functions are estimates produced by the policy's or solver's (usually approximate) computations; they may be far from the true optimal values if the solution is not exactly optimal.
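
For example, one quick way to examine how a (possibly suboptimal) policy ranks the actions at a belief is to compare the Q-values side by side. This sketch continues from the example above and again assumes the policy implements the three-argument value method:

```julia
# List each action with its Q-value at belief b0, best first.
qs = [(a, value(policy, b0, a)) for a in actions(pomdp)]
for (a, q) in sort(qs; by = last, rev = true)
    println(a, " => ", q)
end
```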