The dynamics of a (PO)MDP define how states, observations, and rewards are generated at each time step. One way to visualize the structure of a (PO)MDP is with a dynamic decision network (DDN) (see for example Decision Making under Uncertainty by Kochenderfer et al. or this webpage for more discussion of dynamic decision networks).
The POMDPs.jl DDN models are shown below:
(Figures: the standard MDP DDN and the standard POMDP DDN, shown side by side.)
In order to provide additional flexibility, these DDNs have additional edges into the `:r` (reward) node that are typically absent from the DDNs traditionally used in the (PO)MDP literature. Traditional (PO)MDP algorithms remain compatible with these DDNs because only $R(s,a)$, the expectation of $R(s, a, s', o)$ over all $s'$ and $o$, is needed to make optimal decisions.
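Concretely, the expected reward used by traditional algorithms can be recovered by marginalizing over $s'$ and $o$ (here $T$ denotes the transition distribution and $Z$ the observation distribution; this notation is assumed, not taken from the page above):

$$R(s, a) = \sum_{s'} T(s' \mid s, a) \sum_{o} Z(o \mid a, s')\, R(s, a, s', o)$$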
The task of defining the dynamics of a (PO)MDP consists of defining a model for each of the nodes in the DDN. Models for each node can either be implemented separately through the `transition`, `observation`, and `reward` functions, or together with the `gen` function.
- `transition(pomdp, s, a)` defines the state transition probability distribution for state `s` and action `a`. This defines an explicit model for the `:sp` node.
- `observation(pomdp, [s,] a, sp)` defines the observation distribution given that action `a` was taken and the state is now `sp` (the observation can optionally depend on `s`; see the docstring). This defines an explicit model for the `:o` node.
- `reward(pomdp, s, a[, sp[, o]])` defines the reward, which is a deterministic function of the state and action (and optionally of `sp` and `o`; see the docstring). This defines an explicit model for the `:r` node.
`transition` and `observation` should return distribution objects that implement part or all of the distribution interface. Some predefined distributions can be found in Distributions.jl or POMDPModelTools.jl, or custom types that represent distributions appropriate for the problem may be created.
There is no requirement that a problem defined using the explicit interface be discrete; it is straightforward to define continuous POMDPs with the explicit interface, provided that the distributions have some finite parameterization.
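As a sketch of the explicit interface described above, the following defines a small hypothetical POMDP for a machine that is either working or broken (the `MachinePOMDP` type and its numbers are illustrative assumptions, not part of POMDPs.jl; `SparseCat` and `Deterministic` are the predefined distributions from POMDPModelTools.jl mentioned above):

```julia
using POMDPs
using POMDPModelTools: SparseCat, Deterministic

# Hypothetical example problem: a machine that is either :working or :broken.
# Type parameters are the state, action, and observation types.
struct MachinePOMDP <: POMDP{Symbol, Symbol, Symbol} end

POMDPs.states(::MachinePOMDP) = [:working, :broken]
POMDPs.actions(::MachinePOMDP) = [:use, :repair]
POMDPs.observations(::MachinePOMDP) = [:ok, :faulty]
POMDPs.discount(::MachinePOMDP) = 0.95

# transition(pomdp, s, a): distribution over the next state (the :sp node)
function POMDPs.transition(::MachinePOMDP, s::Symbol, a::Symbol)
    if a == :repair
        return Deterministic(:working)       # repairing always fixes the machine
    elseif s == :working
        return SparseCat([:working, :broken], [0.9, 0.1])  # 10% chance of breaking
    else
        return Deterministic(:broken)        # a broken machine stays broken
    end
end

# observation(pomdp, a, sp): distribution over observations (the :o node)
function POMDPs.observation(::MachinePOMDP, a::Symbol, sp::Symbol)
    if sp == :working
        return SparseCat([:ok, :faulty], [0.8, 0.2])  # noisy sensor
    else
        return SparseCat([:ok, :faulty], [0.2, 0.8])
    end
end

# reward(pomdp, s, a): deterministic reward (the :r node)
POMDPs.reward(::MachinePOMDP, s::Symbol, a::Symbol) =
    a == :repair ? -5.0 : (s == :working ? 10.0 : -10.0)
```

Each method implements exactly one node of the DDN, so solvers that only need, say, the transition model can query it in isolation.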
If the state, observation, and reward are generated simultaneously, a new method of the `gen` function should be implemented to return the state, observation, and reward in a single `NamedTuple`.
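A minimal sketch of a combined `gen` method, using a hypothetical problem type (`NoisyMachine`, like the dynamics values below, is an illustrative assumption rather than part of POMDPs.jl):

```julia
using POMDPs
using Random

# Hypothetical generative model of a machine that is either :working or :broken.
struct NoisyMachine <: POMDP{Symbol, Symbol, Symbol} end

# gen samples the next state, observation, and reward in one call and
# returns them together in a NamedTuple with keys sp, o, and r.
function POMDPs.gen(::NoisyMachine, s::Symbol, a::Symbol, rng::AbstractRNG)
    # next state: repairing always fixes; otherwise 10% chance of breaking
    sp = a == :repair ? :working :
         (s == :working && rand(rng) < 0.1 ? :broken : s)
    # observation: the sensor reports the true status 80% of the time
    correct = rand(rng) < 0.8
    o = (sp == :working) == correct ? :ok : :faulty
    # reward: deterministic given s, a
    r = a == :repair ? -5.0 : (s == :working ? 10.0 : -10.0)
    return (sp=sp, o=o, r=r)
end
```

Because `gen` draws all three quantities in one pass, it is convenient when the dynamics come from a simulator that cannot easily expose separate distribution objects.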
An example of defining a problem using separate functions can be found at: https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Defining-a-POMDP-with-the-Explicit-Interface.ipynb
An example of defining a problem with a combined `gen` function can be found at: https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Defining-a-POMDP-with-the-Generative-Interface.ipynb