# Generative (PO)MDP Interface
## Quick Start
A generative model for most (PO)MDPs can be completely defined with one function:
```julia
function POMDPs.gen(m::YourPOMDPType, s, a, rng)
    # do dynamics/transition calculations here
    return (sp= #=new state=#, r= #=reward=#, o= #=observation=#)
end
```
(`o` is not needed for MDPs.)
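As a concrete illustration of the template above, here is a hypothetical one-dimensional problem (the type name `MyPOMDP` and its dynamics are invented for this sketch, not part of the interface):

```julia
using POMDPs
using Random

# Hypothetical 1-D problem: integer states and actions, Float64 observations.
struct MyPOMDP <: POMDP{Int, Int, Float64} end

function POMDPs.gen(m::MyPOMDP, s, a, rng)
    sp = s + a            # deterministic transition
    r = -abs(sp)          # reward for staying near the origin
    o = sp + randn(rng)   # noisy observation of the new state
    return (sp=sp, r=r, o=o)
end
```

With this definition, a solver or simulator can draw samples with, e.g., `sp, o, r = @gen(:sp,:o,:r)(MyPOMDP(), 0, 1, Random.GLOBAL_RNG)`.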
## Interface Description
The generative interface consists of two components:

- The `gen` function or the `@gen` macro returns samples (e.g. states, observations, and rewards) from a generative POMDP model.
- `initialstate` returns a sampled initial state.
The generative interface is typically used when it is easier to return sampled states and observations rather than explicit distributions as in the Explicit interface. This type of model is often referred to as a "black-box" model.
In some special cases (e.g. reinforcement learning with RLInterface.jl), an initial observation is needed before any actions are taken. In this case, the `initialobs` function will be used.
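A minimal sketch of how these two functions might be implemented, assuming a hypothetical problem type `MyPOMDP` with integer states (the type and the specific distributions are invented for illustration):

```julia
using POMDPs
using Random

struct MyPOMDP <: POMDP{Int, Int, Float64} end  # hypothetical problem type

# Sample an initial state: start uniformly at random near the origin.
POMDPs.initialstate(m::MyPOMDP, rng::AbstractRNG) = rand(rng, -2:2)

# Sample an initial observation of initial state s; only needed in special
# cases such as reinforcement learning with RLInterface.jl.
POMDPs.initialobs(m::MyPOMDP, s, rng::AbstractRNG) = s + randn(rng)
```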
## The `gen` function and `@gen` macro
In most cases, solvers and simulators should use the `@gen` macro, while problem writers should implement new methods of the `gen` function. For example,

```julia
sp, o, r = @gen(:sp,:o,:r)(m, s, a, rng)
```
calls the generative model for POMDP `m` at state `s` and action `a`, and stores the next state, observation, and reward in the variables `sp`, `o`, and `r`. `rng` is a random number generator.
The `gen` function has three versions, differentiated by the type of the first argument:
- `gen(m::Union{POMDP, MDP}, s, a, rng)` provides a way to implement a generative model for an entire (PO)MDP in a single function. It should return values for a subset of the DDN nodes as a `NamedTuple`.
    - This is typically the quickest and easiest way to implement a new POMDP model or wrap an existing simulator.
    - Example (defined by a problem writer): `gen(m::MyPOMDP, s, a, rng) = (sp=s+a, r=s^2, o=s+a+randn(rng))`
    - This version should never be called by a solver or simulator, since there is no guarantee of which values will be present in the returned object.
    - Values for DDN nodes not present in the returned `NamedTuple` will be generated in the normal way with `gen(::DDNNode, ...)` or an explicit representation.
- `gen(::DDNNode{nodename}, m, parent_values..., rng)` defines the generative model for a single DDN node. Together, a group of these functions can define a problem.
    - Example (defined by a problem writer): `gen(::DDNNode{:o}, m::MyPOMDP, s, a, sp, rng) = sp + randn(rng)`
    - Solver writers should only call this version directly in very rare cases when they need access to values for a particular node of the DDN generated by specific values of its parent nodes.
- `gen(::DDNOut{nodenames}, m, s, a, rng)` returns a value or tuple of values for a subset of nodes in the DDN, treating the entire DDN as a single black box. The arguments are values for the input nodes (currently `:s` and `:a`).
    - Example (called in a solver): `sp, o, r = gen(DDNOut(:sp,:o,:r), m, s, a, rng)`
    - This function is automatically synthesized by POMDPs.jl by combining `gen(m, s, a, rng)` and `gen(::DDNNode, ...)` or explicit model definitions for all DDN nodes.
    - This version should only be implemented directly by problem writers in very rare cases when they need precise control for efficiency.
In all versions, `m` is a (PO)MDP model, and `rng` is a random number generator.
## Examples
An example of defining a problem with the generative interface can be found in the POMDPExamples package.
## Random number generators
The `rng` argument to functions in the generative interface is a random number generator such as `Random.GLOBAL_RNG` or another `MersenneTwister`. It should be used to generate all random numbers within the function (e.g. use `rand(rng)` instead of `rand()`). This will ensure that all simulations are exactly repeatable. See the Julia documentation on random numbers for more information about these objects.
## Performance considerations
In general, calling `gen(::DDNOut, ...)` when `gen(::POMDP, ...)` is implemented does not introduce much overhead. In fact, in some cases, the compiler will even optimize out the calculations for nodes that were not requested. For example:
```julia
struct M <: MDP{Int, Int} end
POMDPs.gen(::M, s, a, rng) = (sp=s+a, r=s^2)
@code_warntype gen(DDNOut(:sp), M(), 1, 1, Random.GLOBAL_RNG)
```
will yield
```
Body::Int64
1 ─ %1 = (Base.add_int)(s, a)::Int64
│        nothing
└──      return %1
```
indicating that the compiler will only perform the addition to find the next state and skip the `s^2` calculation for the reward.
Unfortunately, if random numbers are used in `gen`, the compiler will not be able to optimize out the change in the rng's state, so it may be beneficial to directly implement versions of `gen(::DDNNode, ...)`. For example,
```julia
POMDPs.gen(::DDNNode{:sp}, ::M, s, a, rng) = s+a
POMDPs.reward(::M, s, a) = abs(s)
POMDPs.gen(::DDNNode{:o}, ::M, s, a, sp, rng) = sp+randn(rng)
```
might be more efficient than
```julia
function POMDPs.gen(::M, s, a, rng)
    sp = s + a
    return (sp=sp, r=abs(s), o=sp+randn(rng))
end
```
in the context of particle filtering.
As always, though, one should resist the urge towards premature optimization; careful profiling to see what is actually slow is much more effective than speculation.