Generative (PO)MDP Interface

Quick Start

A generative model for most (PO)MDPs can be completely defined with one function:

function POMDPs.gen(m::YourPOMDPType, s, a, rng)
    # do dynamics/transition calculations here
    return (sp= #=new state=#, r= #=reward=#, o= #=observation=#)
end

(o is not needed for MDPs.)
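As a concrete sketch of this pattern, here is a toy POMDP on the integer line (the SimpleLightDark type name and its dynamics are invented for illustration, not part of POMDPs.jl):

```julia
using POMDPs, Random

# Hypothetical example type: the agent moves on the integer line and
# receives a noisy observation of its position.
struct SimpleLightDark <: POMDP{Int, Int, Float64} end

function POMDPs.gen(m::SimpleLightDark, s, a, rng)
    sp = s + a           # deterministic transition
    r = -abs(sp)         # reward for being near the origin
    o = sp + randn(rng)  # noisy observation of the new state
    return (sp=sp, r=r, o=o)
end
```

All randomness goes through the rng argument, so simulations using this model are exactly repeatable.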

Interface Description

The generative interface consists of two components:

  • The gen function or the @gen macro returns samples (e.g. states, observations and rewards) from a generative POMDP model.
  • initialstate returns a sampled initial state.

The generative interface is typically used when it is easier to return sampled states and observations rather than explicit distributions as in the Explicit interface. This type of model is often referred to as a "black-box" model.

In some special cases (e.g. reinforcement learning with RLInterface.jl), an initial observation is needed before any actions are taken. In this case, the initialobs function will be used.
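A hedged sketch of implementing both functions for a hypothetical problem type (the MyPOMDP name and the uniform initial distribution are assumptions for illustration):

```julia
using POMDPs, Random

struct MyPOMDP <: POMDP{Int, Int, Float64} end  # hypothetical problem type

# Sample an initial state, here uniformly from -2:2
POMDPs.initialstate(m::MyPOMDP, rng::AbstractRNG) = rand(rng, -2:2)

# Sample an initial observation of initial state s (only needed in special
# cases, e.g. reinforcement learning with RLInterface.jl)
POMDPs.initialobs(m::MyPOMDP, s, rng::AbstractRNG) = s + randn(rng)
```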

The gen function and @gen macro

In most cases, solvers and simulators should use the @gen macro, while problem writers should implement new methods of the gen function. For example,

sp, o, r = @gen(:sp,:o,:r)(m, s, a, rng)

calls the generative model for POMDP m at state s and action a, and stores the next state, observation, and reward in variables sp, o, and r. rng is a random number generator.

The gen function has three versions differentiated by the type of the first argument.

  • gen(m::Union{POMDP, MDP}, s, a, rng) provides a way to implement a generative model for an entire (PO)MDP in a single function. It should return values for a subset of the DDN nodes as a NamedTuple.

    • This is typically the quickest and easiest way to implement a new POMDP model or wrap an existing simulator.
    • Example (defined by a problem writer): gen(m::MyPOMDP, s, a, rng) = (sp=s+a, r=s^2, o=s+a+randn(rng))
    • This version should never be called by a solver or simulator, since there is no guarantee of which values will be present in the returned object.
    • Values for DDN nodes not present in the returned NamedTuple will be generated in the normal way with gen(::DDNNode, ...) or an explicit representation.
  • gen(::DDNNode{nodename}, m, parent_values..., rng) defines the generative model for a single DDN node. Together, a group of these functions can define a problem.

    • Example (defined by a problem writer): gen(::DDNNode{:o}, m::MyPOMDP, s, a, sp, rng) = sp + randn(rng)
    • Solver writers should only call this version directly in the rare cases when they need access to values for a particular DDN node generated from specific values of its parent nodes.
  • gen(::DDNOut{nodenames}, m, s, a, rng) returns a value or tuple of values for a subset of nodes in the DDN. The arguments are values for the input nodes (currently :s and :a), treating the entire DDN as a single black box.

    • Example (called in a solver): sp, o, r = gen(DDNOut(:sp,:o,:r), m, s, a, rng)
    • This function is automatically synthesized by POMDPs.jl by combining gen(m, s, a, rng) and gen(::DDNNode, ...) or explicit model definitions for all DDN nodes.
    • This version should only be implemented directly by problem writers in very rare cases when they need precise control for efficiency.
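To show how these pieces fit together, here is a hedged sketch of a problem defined node by node, from which POMDPs.jl can synthesize the black-box gen(::DDNOut, ...) version automatically (the NodeWisePOMDP type and its dynamics are invented for illustration):

```julia
using POMDPs, Random

struct NodeWisePOMDP <: POMDP{Int, Int, Float64} end  # hypothetical

# Define each DDN node separately...
POMDPs.gen(::DDNNode{:sp}, m::NodeWisePOMDP, s, a, rng) = s + a
POMDPs.gen(::DDNNode{:o}, m::NodeWisePOMDP, s, a, sp, rng) = sp + randn(rng)
POMDPs.reward(m::NodeWisePOMDP, s, a, sp) = -abs(sp)

# ...then a solver can treat the whole DDN as a black box:
# sp, o, r = gen(DDNOut(:sp, :o, :r), NodeWisePOMDP(), 1, 1, rng)
```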

In all versions, m is a (PO)MDP model, and rng is a random number generator.

Examples

An example of defining a problem with the generative interface can be found in the POMDPExamples package.

Random number generators

The rng argument to functions in the generative interface is a random number generator such as Random.GLOBAL_RNG or a MersenneTwister. It should be used to generate all random numbers within the function (e.g. use rand(rng) instead of rand()). This will ensure that all simulations are exactly repeatable. See the Julia documentation on random numbers for more information about these objects.
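The repeatability this buys can be demonstrated with a small self-contained sketch (the noisy_obs function is invented for illustration):

```julia
using Random

# Passing an explicit rng makes results exactly repeatable
noisy_obs(sp, rng::AbstractRNG) = sp + randn(rng)

a = noisy_obs(1.0, MersenneTwister(42))
b = noisy_obs(1.0, MersenneTwister(42))
a == b  # same seed, same stream: identical samples
```

Had noisy_obs called randn() with no rng, the two calls would draw from the global stream and generally disagree.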

Performance considerations

In general, calling gen(::DDNOut, ...) when gen(::POMDP, ...) is implemented does not introduce much overhead. In fact, in some cases, the compiler will even optimize out calculations for extra DDN nodes. For example:

struct M <: MDP{Int, Int} end

POMDPs.gen(::M, s, a, rng) = (sp=s+a, r=s^2)

@code_warntype gen(DDNOut(:sp), M(), 1, 1, Random.GLOBAL_RNG)

will yield

Body::Int64
1 ─ %1 = (Base.add_int)(s, a)::Int64
│        nothing
└──      return %1

indicating that the compiler will only perform the addition to find the next state and skip the s^2 calculation for the reward.

Unfortunately, if random numbers are used in gen, the compiler will not be able to optimize out the change in the rng's state, so it may be beneficial to directly implement versions of gen(::DDNNode, ...). For example

POMDPs.gen(::DDNNode{:sp}, ::M, s, a, rng) = s+a
POMDPs.reward(::M, s, a) = abs(s)
POMDPs.gen(::DDNNode{:o}, ::M, s, a, sp, rng) = sp+randn(rng)

might be more efficient than

function POMDPs.gen(::M, s, a, rng)
    sp = s + a
    return (sp=sp, r=abs(s), o=sp+randn(rng))
end

in the context of particle filtering.

As always, though, one should resist the urge towards premature optimization; careful profiling to see what is actually slow is much more effective than speculation.