Generative POMDP Interface

Description

The generative interface contains a small collection of functions that makes implementing and solving problems with generative models easier. These functions return states and observations instead of distributions as in the Explicit interface.

The functions are:

generate_s(pomdp, s, a, rng) -> sp
generate_o(pomdp, s, a, sp, rng) -> o
generate_sr(pomdp, s, a, rng) -> (s, r)
generate_so(pomdp, s, a, rng) -> (s, o)
generate_or(pomdp, s, a, sp, rng) -> (o, r)
generate_sor(pomdp, s, a, rng) -> (s, o, r)
initial_state(pomdp, rng) -> s

Each generate_ function is a single step simulator that returns a new state, observation, reward, or a combination given the current state and action (and sp in some cases). rng is a random number generator such as Base.GLOBAL_RNG or another MersenneTwister that is passed as an argument and should be used to generate all random numbers within the function to ensure that all simulations are exactly repeatable.

The functions that do not deal with observations may be defined for MDPs as well as POMDPs.

A problem writer will generally only have to implement one or two of these functions for all solvers to work (see below).

Example

The following example shows an implementation of the Crying Baby problem [1]. A definition of this problem using the explicit interface is given in the POMDPModels package.

importall POMDPs

# state: true=hungry, action: true=feed, obs: true=crying

type BabyPOMDP <: POMDP{Bool, Bool, Bool}
    r_feed::Float64
    r_hungry::Float64
    p_become_hungry::Float64
    p_cry_when_hungry::Float64
    p_cry_when_not_hungry::Float64
    discount::Float64
end
BabyPOMDP() = BabyPOMDP(-5., -10., 0.1, 0.8, 0.1, 0.9)

discount(p::BabyPOMDP) = p.discount

function generate_s(p::BabyPOMDP, s::Bool, a::Bool, rng::AbstractRNG)
    if s # hungry
        return true
    else # not hungry
        return rand(rng) < p.p_become_hungry ? true : false
    end
end

function generate_o(p::BabyPOMDP, s::Bool, a::Bool, sp::Bool, rng::AbstractRNG)
    if sp # hungry
        return rand(rng) < p.p_cry_when_hungry ? true : false
    else # not hungry
        return rand(rng) < p.p_cry_when_not_hungry ? true : false
    end
end

# r_hungry
reward(p::BabyPOMDP, s::Bool, a::Bool) = (s ? p.r_hungry : 0.0) + (a ? p.r_feed : 0.0)

initial_state_distribution(p::BabyPOMDP) = [false] # note rand(rng, [false]) = false, so this is encoding that the baby always starts out full

This can be solved with the POMCP solver.

using POMCP
using POMDPToolbox

pomdp = BabyPOMDP()
solver = POMCPSolver()
planner = solve(solver, pomdp)

hist = simulate(HistoryRecorder(max_steps=10), pomdp, planner);
println("reward: $(discounted_reward(hist))")

Which function(s) should I implement for my problem / use in my solver?

Problem Writers

Generally, a problem implementer need only implement the simplest one or two of these functions, and the rest are automatically synthesized at runtime.

If there is a convenient way for the problem to generate a combination of states, observations, and rewards simultaneously (for example, if there is a simulator written in another programming language that generates these from the same function, or if it is computationally convenient to generate sp and o simultaneously), then the problem writer may wish to directly implement one of the combination generate_ functions, e.g. generate_sor() directly.

Use the following logic to determine which functions to implement:

If you are implementing the problem from scratch in Julia, implement generate_s and generate_o.
Otherwise, if your external simulator returns x, where x is one of sr, so, or, or sor, implement generate_x. (you may also have to implement generate_s separately for use in particle filters).

Note: if an explicit definition is already implemented, you do not need to implement any functions from the generative interface - POMDPs.jl will automatically generate implementations of them for you at runtime (see generative_impl.jl).

Solver and Simulator Writers

Solver writers should use the single function that generates everything that they need and nothing they don't. For example, if the solver needs access to the state, observation, and reward at every timestep, they should use generate_sor() rather than generate_s() and generate_or(), and if the solver needs access to the state and reward, they should use generate_sr() rather than generate_sor(). This will ensure the widest interoperability between solvers and problems.

In other words, if you need access to x where x is s, o, sr, so, or, or sor at a certain point in your code, use generate_x.

[1] Decision Making Under Uncertainty: Theory and Application by Mykel J. Kochenderfer, MIT Press, 2015