Discrete Explicit Interface
The Discrete Explicit Interface is designed to match the standard definition of a POMDP in the literature as closely as possible. The standard definition uses the tuple (S,A,O,T,Z,R,γ) for a POMDP and (S,A,T,R,γ) for an MDP, where
- S, A, and O are the state, action, and observation spaces,
- T and Z are the transition and observation probability distribution functions (pdfs),
- R is the reward function, and
- γ is the discount factor.
The `DiscreteExplicitPOMDP` and `DiscreteExplicitMDP` types are provided for POMDPs and MDPs with discrete spaces and explicitly defined distributions. They should offer moderately good performance on small to medium-sized problems. Instructions for defining the initial distribution and terminal states can be found in the constructor docstrings below.
Example
The classic tiger POMDP [Kaelbling et al. 98] can be defined as follows:
```julia
using QuickPOMDPs

S = [:left, :right]           # S, A, and O may contain any objects
A = [:left, :right, :listen]  # including user-defined types
O = [:left, :right]
γ = 0.95

function T(s, a, sp)
    if a == :listen
        return s == sp
    else # a door is opened
        return 0.5 # reset
    end
end

function Z(a, sp, o)
    if a == :listen
        if o == sp
            return 0.85
        else
            return 0.15
        end
    else
        return 0.5
    end
end

function R(s, a)
    if a == :listen
        return -1.0
    elseif s == a # the tiger was found
        return -100.0
    else # the tiger was avoided
        return 10.0
    end
end

m = DiscreteExplicitPOMDP(S, A, O, T, Z, R, γ)
```
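The resulting `m` is a standard POMDPs.jl model, so it can be passed to any compatible solver. As a minimal sketch, assuming the QMDP.jl solver package is installed (any other POMDPs.jl solver could be substituted):

```julia
using POMDPs, QMDP

solver = QMDPSolver()      # an offline approximate solver from QMDP.jl
policy = solve(solver, m)  # compute a policy for the tiger POMDP
```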
Constructor Documentation
QuickPOMDPs.DiscreteExplicitMDP — Type

    DiscreteExplicitMDP(S,A,T,R,γ,[p₀],[terminals=Set()])

Create an MDP defined by the tuple (S,A,T,R,γ).

Arguments

Required

- `S`, `A`: State and action spaces (typically `Vector`s)
- `T::Function`: Transition probability distribution function; $T(s,a,s')$ is the probability of transitioning to state $s'$ from state $s$ after taking action $a$.
- `R::Function`: Reward function; $R(s,a)$ is the reward for taking action $a$ in state $s$.
- `γ::Float64`: Discount factor.

Optional

- `p₀=Uniform(S)`: Initial state distribution (see `POMDPModelTools.Deterministic` and `POMDPModelTools.SparseCat` for other options).

Keyword

- `terminals=Set()`: Set of terminal states. Once a terminal state is reached, no more actions can be taken or reward received.
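For illustration, here is a hypothetical three-state chain MDP (invented for this sketch, not part of the package) showing how the optional `p₀` and `terminals` arguments described above might be passed:

```julia
using QuickPOMDPs
using POMDPModelTools: Deterministic

S = [1, 2, 3]                  # three states in a chain
A = [:stay, :advance]
T(s, a, sp) = a == :advance ? Float64(sp == min(s + 1, 3)) : Float64(sp == s)
R(s, a) = s == 3 ? 0.0 : -1.0  # -1 reward until the end of the chain is reached
γ = 0.9

# start deterministically in state 1; state 3 is terminal
m = DiscreteExplicitMDP(S, A, T, R, γ, Deterministic(1), terminals=Set([3]))
```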
QuickPOMDPs.DiscreteExplicitPOMDP — Type

    DiscreteExplicitPOMDP(S,A,O,T,Z,R,γ,[b₀],[terminals=Set()])

Create a POMDP defined by the tuple (S,A,O,T,Z,R,γ).

Arguments

Required

- `S`, `A`, `O`: State, action, and observation spaces (typically `Vector`s)
- `T::Function`: Transition probability distribution function; $T(s,a,s')$ is the probability of transitioning to state $s'$ from state $s$ after taking action $a$.
- `Z::Function`: Observation probability distribution function; $Z(a, s', o)$ is the probability of receiving observation $o$ when state $s'$ is reached after action $a$.
- `R::Function`: Reward function; $R(s,a)$ is the reward for taking action $a$ in state $s$.
- `γ::Float64`: Discount factor.

Optional

- `b₀=Uniform(S)`: Initial belief/state distribution (see `POMDPModelTools.Deterministic` and `POMDPModelTools.SparseCat` for other options).

Keyword

- `terminals=Set()`: Set of terminal states. Once a terminal state is reached, no more actions can be taken or reward received.
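As a sketch, the tiger POMDP from the example above could be given a non-uniform initial belief through the optional `b₀` argument (this assumes the `S`, `A`, `O`, `T`, `Z`, `R`, and `γ` definitions from the example are in scope):

```julia
using POMDPModelTools: SparseCat

b₀ = SparseCat([:left, :right], [0.7, 0.3])  # 70% prior that the tiger is behind the left door
m2 = DiscreteExplicitPOMDP(S, A, O, T, Z, R, γ, b₀)
```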
Usage from Python
The Discrete Explicit interface can be used from Python via the pyjulia package. See examples/tiger.py for an example.