Defining (PO)MDP Dynamics

The dynamics of a (PO)MDP define how states, observations, and rewards are generated at each time step. One way to visualize the structure of (PO)MDP is with a dynamic decision network (DDN) (see for example Decision Making under Uncertainty by Kochenderfer et al. or this webpage for more discussion of dynamic decision networks).

The POMDPs.jl DDN models are shown below:

Standard MDP DDNStandard POMDP DDN

In order to provide additional flexibility, these DDNs have :s:o, :sp:r and :o:r edges that are typically absent from the DDNs traditionally used in the (PO)MDP literature. Traditional (PO)MDP algorithms are compatible with these DDNs because only $R(s,a)$, the expectation of $R(s, a, s', o)$ over all $s'$ and $o$ is needed to make optimal decisions.

The task of defining the dynamics of a (PO)MDP consists of defining a model for each of the nodes in the DDN. Models for each node can either be implemented separately through the transition, observation, and reward functions, or together with the gen function.

Separate explicit or generative definition

  • transition(pomdp, s, a) defines the state transition probability distribution for state s and action a. This defines an explicit model for the :sp DDN node.
  • observation(pomdp, [s,] a, sp) defines the observation distribution given that action a was taken and the state is now sp (The observation can optionally depend on s - see docstring). This defines an explicit model for the :o DDN node.
  • reward(pomdp, s, a[, sp[, o]]) defines the reward, which is a deterministic function of the state and action (and optionally sp and o - see docstring). This defines an explicit model for the :r DDN node.

transition and observation should return distribution objects that implement part or all of the distribution interface. Some predefined distributions can be found in Distributions.jl or POMDPModelTools.jl, or custom types that represent distributions appropriate for the problem may be created.


There is no requirement that a problem defined using the explicit interface be discrete; it is straightforward to define continuous POMDPs with the explicit interface, provided that the distributions have some finite parameterization.

Combined generative definition

If the state, observation, and reward are generated simultaneously, a new method of the gen function should be implemented to return the state, observation and reward in a single NamedTuple.


An example of defining a problem using separate functions can be found at:

An example of defining a problem with a combined gen function can be found at: