Policies
The `MOMDP` type currently only supports alpha vector policies. If other policy types are desired, please open an issue on the GitHub repository. Until the issue has been resolved, you can convert your MOMDP to a POMDP using the `POMDP_of_Discrete_MOMDP` type (or implement your own conversion).
MOMDPAlphaVectorPolicy
The `MOMDPAlphaVectorPolicy` type is similar to the `AlphaVectorPolicy` type from POMDPTools.jl. The main difference is how the alpha vectors are stored. `AlphaVectorPolicy` stores alpha vectors as a vector of alpha vectors, i.e. `Vector{Vector{Float64}}`.
A MOMDP value function $V(x, b_{\mathcal{Y}})$ can be represented as a collection of alpha vector sets $\{\Gamma_{y}(x) \mid x \in \mathcal{X}\}$, one set per visible state. Therefore, in `MOMDPAlphaVectorPolicy`, we store a vector of alpha vectors for each visible state, where each alpha vector has length $|\mathcal{Y}|$ (the number of hidden states). This results in a vector of vectors of alpha vectors, i.e. `Vector{Vector{Vector{Float64}}}`.
For the action map (`action_map`), we also have a vector of size $|\mathcal{X}|$ (the number of visible states), where each element is a vector containing the action associated with each alpha vector in the corresponding set.
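The nesting is easiest to see in a small sketch. Only `action_map` is a documented field name here; the alpha vector container name `alphas` below is an assumption used purely for illustration, not the package's internal layout.

```julia
# Illustrative layout of a MOMDP alpha vector policy (placeholder names, random data).
n_x, n_y = 3, 4                                     # |X| visible states, |Y| hidden states

# One set of alpha vectors per visible state; each alpha vector has length |Y|.
alphas = [[rand(n_y) for _ in 1:2] for _ in 1:n_x]  # Vector{Vector{Vector{Float64}}}

# One action per alpha vector, grouped by visible state.
action_map = [[:a1, :a2] for _ in 1:n_x]
```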
Value Function
With an alpha vector policy represented as `MOMDPAlphaVectorPolicy` (a collection of alpha vector sets), we can use the visible state to select the appropriate set and then maximize over the alpha vectors in that set: $V(x, b_{\mathcal{Y}}) = \max_{\alpha \in \Gamma_{y}(x)} \{\alpha \cdot b_{\mathcal{Y}}\}$. The function `value(p::MOMDPAlphaVectorPolicy, b, x)` is provided to evaluate a `MOMDPAlphaVectorPolicy` with a known visible state `x`.
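As a minimal, self-contained sketch of this maximization (here `alphas_x` is an illustrative stand-in for the set $\Gamma_{y}(x)$ of the known visible state):

```julia
using LinearAlgebra: dot

n_y = 4
alphas_x = [rand(n_y) for _ in 1:2]           # alpha vector set for the known visible state x
b_y = fill(1 / n_y, n_y)                      # belief over hidden states
v = maximum(dot(α, b_y) for α in alphas_x)    # V(x, b_Y)
```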
However, to maintain compatibility with the simulation tools that already exist in the POMDPs.jl ecosystem, we also provide the ability to execute a computed `MOMDPAlphaVectorPolicy` against a POMDP model (allowing for uncertainty over $\mathcal{X}$ as well). We first calculate $b_{\mathcal{X}}(x) = \sum_{y \in \mathcal{Y}} b(x,y)$ and then $V^\prime(b) = \sum_{x \in \mathcal{X}} b_{\mathcal{X}}(x) V(x, b_{\mathcal{Y} \mid x})$ where $b_{\mathcal{Y} \mid x}(y) = b(x,y) / b_{\mathcal{X}}(x)$. This value calculation is provided by `value(p::MOMDPAlphaVectorPolicy, b)`.
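A sketch of this calculation on a joint belief, using the illustrative nested layout from above (again, placeholder names rather than the package's internal code):

```julia
using LinearAlgebra: dot

# V'(b) = Σ_x b_X(x) * max_{α ∈ Γ_y(x)} α ⋅ b_{Y|x}
function momdp_value(alphas, b_joint)
    V = 0.0
    for x in axes(b_joint, 1)
        b_x = sum(b_joint[x, :])        # b_X(x): marginal probability of visible state x
        b_x > 0 || continue             # skip visible states with zero probability
        b_y = b_joint[x, :] ./ b_x      # b_{Y|x}: conditional belief over hidden states
        V += b_x * maximum(dot(α, b_y) for α in alphas[x])
    end
    return V
end

n_x, n_y = 3, 4
alphas = [[rand(n_y) for _ in 1:2] for _ in 1:n_x]   # one alpha vector set per visible state
b_joint = rand(n_x, n_y); b_joint ./= sum(b_joint)   # joint belief b(x, y)
momdp_value(alphas, b_joint)
```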
When the visible state is known, we also provide `actionvalues(p::MOMDPAlphaVectorPolicy, b, x)` to compute the action values (Q-values) for all actions given a belief `b` over hidden states and a known visible state `x`. This performs a one-step lookahead to compute the action values.
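For reference, such a lookahead takes the standard MOMDP Bellman form $Q(x, b_{\mathcal{Y}}, a) = \sum_{y \in \mathcal{Y}} b_{\mathcal{Y}}(y) R(x, y, a) + \gamma \sum_{x^\prime, o} \Pr(x^\prime, o \mid x, b_{\mathcal{Y}}, a) V(x^\prime, b_{\mathcal{Y}}^\prime)$, where $b_{\mathcal{Y}}^\prime$ is the updated belief over hidden states; consult the implementation for the exact factorization of the transition and observation terms.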
Action Function
The action function `action(p::MOMDPAlphaVectorPolicy, b)` implements a heuristic instead of a true one-step lookahead when there is uncertainty over $\mathcal{X}$. If executing the MOMDP policy as a POMDP and `x` is not known, then we recommend implementing a custom action function that performs a one-step lookahead using the `value` function.
As implemented, `action` finds the visible state `x` with the largest probability mass in `b`, forms a conditional distribution over `y` given that `x`, finds the alpha vector within the set associated with `x` that maximizes the value under that conditional distribution, and returns the action associated with that alpha vector.
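A sketch of this heuristic on an illustrative joint belief (variable names are placeholders; see the package source for the actual implementation):

```julia
using LinearAlgebra: dot

n_x, n_y = 3, 4
alphas = [[rand(n_y) for _ in 1:2] for _ in 1:n_x]   # alpha vector sets, one per visible state
action_map = [[:left, :right] for _ in 1:n_x]        # action per alpha vector
b_joint = rand(n_x, n_y); b_joint ./= sum(b_joint)   # joint belief b(x, y)

x_best = argmax([sum(b_joint[x, :]) for x in 1:n_x])    # visible state with most probability mass
b_y = b_joint[x_best, :] ./ sum(b_joint[x_best, :])     # conditional belief over hidden states
i_best = argmax([dot(α, b_y) for α in alphas[x_best]])  # index of the maximizing alpha vector
a = action_map[x_best][i_best]                          # action associated with that alpha vector
```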
We also provide a function `action(p::MOMDPAlphaVectorPolicy, b, x)` that assumes the visible state `x` is known and avoids the need to infer `x` from the belief distribution.