Dynamic Decision Networks
Part of the conceptual definition of a POMDP or MDP is a dynamic decision network (DDN) that defines which random variables are dependent on each other. Usually, problem writers will not have to interact directly with the DDN, but it is a helpful concept for understanding, and it can be customized for special problem types.
The standard POMDPs.jl DDN models are shown below:
[Figures: Standard MDP DDN; Standard POMDP DDN]
In order to provide additional flexibility, these DDNs have `:s`→`:o`, `:sp`→`:r`, and `:o`→`:r` edges that are typically absent from the DDNs traditionally used in the (PO)MDP literature. Traditional (PO)MDP algorithms are compatible with these DDNs because only $R(s,a)$, the expectation of $R(s, a, s', o)$ over all $s'$ and $o$, is needed to make optimal decisions.
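For discrete spaces, this expectation can be written out explicitly. The symbols $T(s' \mid s, a)$ for the transition distribution and $Z(o \mid s, a, s')$ for the observation distribution are introduced here for illustration; $Z$ is conditioned on $s$ because of the extra `:s`→`:o` edge:

$$
R(s, a) = \mathbb{E}\left[ R(s, a, s', o) \mid s, a \right] = \sum_{s'} T(s' \mid s, a) \sum_{o} Z(o \mid s, a, s') \, R(s, a, s', o)
$$

For continuous state or observation spaces the sums become integrals, and for an MDP, which has no `:o` node, the inner sum over $o$ is dropped.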
DDN structure representation
In POMDPs.jl, each DDN node corresponds to a `Symbol`. Often a `p` character (mnemonic: "prime") is appended to denote a new value for the next timestep; e.g., `:sp` represents $s'$, the state at the next step.
A `DDNStructure` object contains the names of all the nodes, the edges between the nodes, and an object for each node that defines its implementation.
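To make the idea of named nodes and edges concrete, here is a minimal sketch in plain Julia that stores a DDN as a hypothetical `ToyDDN` type mapping each node `Symbol` to its parent nodes, and then computes an order in which the nodes could be sampled (every parent before its children). `ToyDDN` and `sampling_order` are invented for illustration and are not part of POMDPs.jl; the parent lists are read off the signatures used in this section, such as `transition(m, s, a)` and $R(s, a, s', o)$.

```julia
# Hypothetical stand-in for a DDN structure (NOT the POMDPs.jl DDNStructure):
# node names are Symbols, and edges are stored as node => parent nodes.
struct ToyDDN
    parents::Dict{Symbol, Vector{Symbol}}
end

# Depth-first topological sort: returns the nodes so that every parent
# appears before its children, i.e. an order in which values can be sampled.
function sampling_order(ddn::ToyDDN)
    order = Symbol[]
    visited = Set{Symbol}()
    function visit(node)
        node in visited && return
        push!(visited, node)
        for p in get(ddn.parents, node, Symbol[])
            visit(p)
        end
        push!(order, node)
    end
    # iterate in sorted order so the result is deterministic
    for node in sort(collect(keys(ddn.parents)))
        visit(node)
    end
    return order
end

# Edges of a POMDP DDN including the extra :s→:o, :sp→:r and :o→:r edges.
pomdp_ddn = ToyDDN(Dict(
    :s  => Symbol[],
    :a  => Symbol[],
    :sp => [:s, :a],
    :o  => [:s, :a, :sp],
    :r  => [:s, :a, :sp, :o],
))

println(sampling_order(pomdp_ddn))  # [:a, :s, :sp, :o, :r]
```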
Currently, there are four types of nodes:
- `DistributionDDNNode` to define nodes with stochastic output.
- `FunctionDDNNode` to define a node that is a deterministic function of other nodes.
- `ConstantDDNNode` for a constant.
- `GenericDDNNode` for a node that has no implementation other than `gen` (see Defining behavior for nodes below).
This set is not expected to cover every possible behavior, so new node types are likely to be added in the future (and they should be requested when concrete needs arise).
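The following sketch shows, in plain Julia, how four node kinds like those listed above might each produce a value from their parents' values. The types `ToyDistributionNode`, `ToyFunctionNode`, `ToyConstantNode`, `ToyGenericNode` and the function `sample_node` are hypothetical stand-ins, not the POMDPs.jl node types, and a "distribution" is simplified to a vector of equally likely outcomes so that no packages are needed.

```julia
using Random

# Hypothetical toy node kinds loosely mirroring the four listed above.
abstract type ToyNode end

# Stochastic output: `dist` maps parent values to a collection of equally
# likely outcomes (a stand-in for a real distribution object).
struct ToyDistributionNode{F} <: ToyNode
    dist::F
end

# Deterministic function of the parent values.
struct ToyFunctionNode{F} <: ToyNode
    f::F
end

# A constant that ignores the parent values.
struct ToyConstantNode{T} <: ToyNode
    value::T
end

# No implementation beyond a user-supplied generative function.
struct ToyGenericNode{G} <: ToyNode
    gen::G
end

sample_node(n::ToyDistributionNode, parents, rng) = rand(rng, n.dist(parents...))
sample_node(n::ToyFunctionNode, parents, rng) = n.f(parents...)
sample_node(n::ToyConstantNode, parents, rng) = n.value
sample_node(n::ToyGenericNode, parents, rng) = n.gen(parents..., rng)

rng = MersenneTwister(1)
s, a = 2, 1
noisy_sp   = ToyDistributionNode((s, a) -> [s + a - 1, s + a, s + a + 1])
det_sp     = ToyFunctionNode((s, a) -> s + a)
const_node = ToyConstantNode(0.95)
gen_sp     = ToyGenericNode((s, a, rng) -> s + a + rand(rng, -1:1))

println(sample_node(det_sp, (s, a), rng))      # always 3
println(sample_node(const_node, (s, a), rng))  # always 0.95
println(sample_node(noisy_sp, (s, a), rng))    # one of 2, 3, 4
println(sample_node(gen_sp, (s, a), rng))      # one of 2, 3, 4
```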
Defining behavior for nodes
For any node in the DDN, the function `gen(::DDNNode{:nodename}, m, parent_values..., rng)` will be called to sample a value (see the docstring for more information). This method can always be implemented to provide a generative definition for a node.
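The dispatch mechanism behind this signature can be sketched without POMDPs.jl: because the node name is a type parameter, Julia's multiple dispatch selects a separate method for each node. `NodeTag`, `TagModel`, and `toygen` are hypothetical stand-ins for `DDNNode`, a problem type, and `gen`, respectively.

```julia
using Random

# Hypothetical stand-ins, not the POMDPs.jl types: the node name is a type
# parameter, so multiple dispatch picks a separate method for each node.
struct NodeTag{name} end
struct TagModel end

# Generative definitions following the (node, model, parent values..., rng)
# argument pattern described above.
toygen(::NodeTag{:sp}, m::TagModel, s, a, rng) = s + a + randn(rng)  # noisy transition
toygen(::NodeTag{:r}, m::TagModel, s, a, sp, rng) = -abs(sp)         # deterministic reward

rng = MersenneTwister(42)
s, a = 0.0, 1.0
sp = toygen(NodeTag{:sp}(), TagModel(), s, a, rng)
r = toygen(NodeTag{:r}(), TagModel(), s, a, sp, rng)
println((sp = sp, r = r))
```

Calling `toygen` with a tag that has no method, such as `NodeTag{:o}()`, raises a `MethodError`, which is one way a framework built this way could detect that a node lacks a generative definition.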
Some nodes can alternatively have an explicit implementation. For example, a `DistributionDDNNode` contains a function that is called with the (PO)MDP model and the values sampled from the parent nodes to return a distribution. The state transition node, `:sp`, is a particular case of this: if `gen(::DDNNode{:sp}, m, s, a, rng)` is not defined by the problem writer, `rand(rng, transition(m, s, a))` will be called to generate values for `:sp`.
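The fallback from a generative to an explicit definition can be sketched with ordinary Julia dispatch: a default method for an abstract model type samples from the explicit distribution, and a model that provides its own, more specific generative method overrides it. All names here (`AbstractToyModel`, `toy_gen`, `toy_transition`, `ExplicitModel`, `GenerativeModel`) are hypothetical, node names are passed as `Val` types, and a "distribution" is just a vector of equally likely states; this is not the POMDPs.jl implementation.

```julia
using Random

# Hypothetical sketch of the fallback described above (invented names).
abstract type AbstractToyModel end

# Default generative definition of :sp: sample from the explicit transition.
# A "distribution" here is a vector of equally likely next states.
toy_gen(::Val{:sp}, m::AbstractToyModel, s, a, rng) = rand(rng, toy_transition(m, s, a))

# Model defined explicitly: it only provides a transition distribution.
struct ExplicitModel <: AbstractToyModel end
toy_transition(m::ExplicitModel, s, a) = [s + a - 1, s + a + 1]

# Model defined generatively: this more specific method overrides the fallback,
# so no toy_transition method is ever needed for it.
struct GenerativeModel <: AbstractToyModel end
toy_gen(::Val{:sp}, m::GenerativeModel, s, a, rng) = s + a

rng = MersenneTwister(7)
println(toy_gen(Val(:sp), ExplicitModel(), 3, 1, rng))    # 3 or 5, via toy_transition
println(toy_gen(Val(:sp), GenerativeModel(), 3, 1, rng))  # always 4
```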
Mixing generative and explicit node definitions for a POMDP
POMDP models will often contain a mixture of generative and explicit definitions, and this is an encouraged paradigm. For example,
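```julia
using POMDPs, Distributions
struct MyPOMDP <: POMDP{Float64, Float64, Float64} end
# Generative definition of the state transition node :sp
POMDPs.gen(::DDNNode{:sp}, m::MyPOMDP, s, a, rng) = s + a
# Explicit definition of the observation node :o (returns a distribution)
POMDPs.observation(m::MyPOMDP, s, a, sp) = Normal(sp)
```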
would be a suitable definition for a POMDP that will be solved with particle filtering methods, where an explicit observation definition is needed but a generative state transition definition is sufficient.
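Why this mixture suits particle filtering can be sketched with one step of a bootstrap particle filter, written in plain Julia with no packages. `gen_sp` and `obs_pdf` are hypothetical hand-written stand-ins for the transition and observation definitions of `MyPOMDP` (the `Normal(sp)` density, with unit standard deviation, is coded by hand), and `pf_update` is an illustrative sketch, not a solver from the POMDPs.jl ecosystem. Propagating particles only needs samples of $s'$, while weighting them needs the observation density to be evaluated, which a purely generative definition cannot provide.

```julia
using Random

# Hypothetical stand-ins for MyPOMDP's definitions:
# generative transition s' = s + a, explicit observation density o ~ Normal(s', 1).
gen_sp(s, a, rng) = s + a
obs_pdf(o, sp) = exp(-(o - sp)^2 / 2) / sqrt(2π)

# One bootstrap particle filter update: propagate each particle with the
# generative model, weight it with the explicit observation density, resample.
function pf_update(particles, a, o, rng)
    propagated = [gen_sp(s, a, rng) for s in particles]
    weights = [obs_pdf(o, sp) for sp in propagated]
    weights ./= sum(weights)
    # multinomial resampling by inverting the cumulative weights
    cdf = cumsum(weights)
    return [propagated[min(searchsortedfirst(cdf, rand(rng)), end)] for _ in particles]
end

rng = MersenneTwister(0)
particles = randn(rng, 1000)  # initial belief: samples from Normal(0, 1)
new_particles = pf_update(particles, 1.0, 1.2, rng)
println(sum(new_particles) / length(new_particles))  # close to 1.1, the exact posterior mean
```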
It is usually best to avoid providing both a generative and an explicit definition of the same node, because doing so makes it easy to introduce inconsistencies between them.
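As a hypothetical illustration of how such an inconsistency can arise, the sketch below defines the same `:sp` node twice for a toy model: once as an explicit set of equally likely next states (`explicit_sp`) and once generatively (`generative_sp`), where the generative version has drifted out of sync. The helper `check_consistency` is an invented Monte Carlo check that compares sample means; it only detects mismatches in the mean and is not a POMDPs.jl feature.

```julia
using Random, Statistics

# Hypothetical toy model that defines the :sp node twice (against the advice
# above). The explicit "distribution" is a vector of equally likely next states.
explicit_sp(s, a) = [s + a - 1, s + a + 1]               # mean: s + a
# The generative version was edited separately and now disagrees (a bug).
generative_sp(s, a, rng) = s + 2a + rand(rng, [-1, 1])   # mean: s + 2a

# Monte Carlo check: compare the mean of generative samples with the mean of
# the explicit distribution. Only differences in the mean are detected.
function check_consistency(s, a, rng; n = 10_000, tol = 0.1)
    gen_mean = mean(generative_sp(s, a, rng) for _ in 1:n)
    explicit_mean = mean(explicit_sp(s, a))
    return abs(gen_mean - explicit_mean) <= tol
end

rng = MersenneTwister(3)
println(check_consistency(0.0, 1.0, rng))  # false: the means differ by a = 1.0
println(check_consistency(0.0, 0.0, rng))  # true: with a = 0 both means equal s
```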
Customizing the DDN
The DDN structure for a particular (PO)MDP problem type is defined with the `DDNStructure` trait, which should return a `DDNStructure` object (or something else that implements the appropriate methods). See the docstring for an example.
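The trait pattern itself can be sketched in plain Julia: a function of the problem type returns the structure, and a more specific method overrides the default for a specialized problem type. Everything in this sketch is hypothetical, including `ToyPOMDP`, `toy_ddn_structure`, the dictionary representation of a structure, and the extra cost node `:c` used to mimic a constrained POMDP; it is not the POMDPs.jl `DDNStructure` API.

```julia
# Hypothetical sketch of a structure trait (not the POMDPs.jl definitions).
# A structure is represented as a node => parents dictionary.
abstract type ToyPOMDP end

standard_pomdp_ddn() = Dict(:sp => [:s, :a], :o => [:s, :a, :sp], :r => [:s, :a, :sp, :o])

# Default: every ToyPOMDP subtype gets the standard structure.
toy_ddn_structure(::Type{<:ToyPOMDP}) = standard_pomdp_ddn()

# A problem type that keeps the standard structure.
struct PlainProblem <: ToyPOMDP end

# A specialized problem type (e.g. a constrained POMDP) that adds a cost node.
struct ConstrainedProblem <: ToyPOMDP end
function toy_ddn_structure(::Type{ConstrainedProblem})
    ddn = standard_pomdp_ddn()
    ddn[:c] = [:s, :a, :sp]  # hypothetical cost node with assumed parents
    return ddn
end

println(sort(collect(keys(toy_ddn_structure(PlainProblem)))))        # [:o, :r, :sp]
println(sort(collect(keys(toy_ddn_structure(ConstrainedProblem)))))  # [:c, :o, :r, :sp]
```

Dispatching on the type rather than an instance lets the structure be queried before any problem object exists; the more specific `Type{ConstrainedProblem}` method wins over the `Type{<:ToyPOMDP}` default.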
If a specialized DDN structure is to be compatible with standard POMDP solvers, it should have the standard `:sp`, `:r`, and `:o` nodes.
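A compatibility check of this kind is easy to sketch. The helper below is hypothetical and works on a plain dictionary mapping each node `Symbol` to its parents (a representation assumed for illustration), not on the POMDPs.jl `DDNStructure` type; it only checks that the required node names are present, not that their parents or implementations match what a solver expects.

```julia
# Hypothetical helper: does a node => parents dictionary keep the standard
# nodes that standard POMDP solvers rely on?
has_standard_nodes(ddn::AbstractDict, required = (:sp, :r, :o)) =
    all(n -> haskey(ddn, n), required)

# Extra nodes are fine as long as the standard ones remain.
constrained = Dict(:sp => [:s, :a], :o => [:s, :a, :sp],
                   :r => [:s, :a, :sp, :o], :c => [:s, :a, :sp])
# Renaming :sp breaks compatibility even though the dynamics are unchanged.
renamed = Dict(:next_state => [:s, :a], :o => [:s, :a, :next_state],
               :r => [:s, :a, :next_state, :o])

println(has_standard_nodes(constrained))  # true
println(has_standard_nodes(renamed))      # false
```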
Currently (as of September 2019), no solver has special behavior based on the DDN structure, but packages are expected to define new DDN structures for specialized cases such as constrained POMDPs, mixed observability MDPs, or factored POMDPs. If you are considering creating a solver that relies on a specific DDN structure, please contact the developers so we can coordinate.