Discrete Explicit Interface
The Discrete Explicit Interface is designed to match the standard definition of a POMDP in the literature as closely as possible. The standard definition uses the tuple (S,A,O,T,Z,R,γ) for a POMDP and (S,A,T,R,γ) for an MDP, where
- S, A, and O are the state, action, and observation spaces,
- T and Z are the transition and observation probability distribution functions (pdfs),
- R is the reward function, and
- γ is the discount factor.
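In this notation (matching the constructor documentation below), $T(s, a, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ is the probability of transitioning from state $s$ to $s'$ under action $a$, and $Z(a, s', o) = \Pr(o_{t+1} = o \mid a_t = a, s_{t+1} = s')$ is the probability of receiving observation $o$ when state $s'$ is reached after action $a$.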
 
The DiscreteExplicitPOMDP and DiscreteExplicitMDP types are provided for POMDPs and MDPs with discrete spaces and explicitly defined distributions. They should offer moderately good performance on small to medium-sized problems. Instructions for defining the initial distribution and terminal states can be found in the docstrings.
Example
The classic tiger POMDP [Kaelbling et al. 98] can be defined as follows:
using QuickPOMDPs
S = [:left, :right]           # S, A, and O may contain any objects
A = [:left, :right, :listen]  # including user-defined types
O = [:left, :right]
γ = 0.95
function T(s, a, sp)
    if a == :listen
        return s == sp
    else # a door is opened and the problem resets
        return 0.5 # the tiger is equally likely to be behind either door
    end
end
function Z(a, sp, o)
    if a == :listen
        if o == sp
            return 0.85
        else
            return 0.15
        end
    else
        return 0.5
    end
end
function R(s, a)
    if a == :listen  
        return -1.0
    elseif s == a # the door with the tiger behind it was opened
        return -100.0
    else # the other door was opened and the tiger was avoided
        return 10.0
    end
end
m = DiscreteExplicitPOMDP(S,A,O,T,Z,R,γ)
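Once constructed, m is an ordinary POMDPs.jl model: it can be inspected with the standard POMDPs.jl query functions and handed to any compatible solver. The following is a minimal sketch, assuming the POMDPs and QMDP packages are installed (QMDP is just one example solver):

using POMDPs
using QMDP   # example solver; any POMDPs.jl solver that handles discrete POMDPs works

# Query the model through the POMDPs.jl interface
d = transition(m, :left, :listen)   # distribution over next states
pdf(d, :left)                       # 1.0: listening does not move the tiger
reward(m, :left, :left)             # -100.0: the tiger's door was opened
discount(m)                         # 0.95

# Compute a policy
policy = solve(QMDPSolver(), m)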
Constructor Documentation
QuickPOMDPs.DiscreteExplicitMDP — Type
DiscreteExplicitMDP(S,A,T,R,γ,[p₀],[terminals=Set()])
Create an MDP defined by the tuple (S,A,T,R,γ).
Arguments
Required
- S,A: State and action spaces (typically Vectors)
- T::Function: Transition probability distribution function; $T(s,a,s')$ is the probability of transitioning to state $s'$ from state $s$ after taking action $a$.
- R::Function: Reward function; $R(s,a)$ is the reward for taking action $a$ in state $s$.
- γ::Float64: Discount factor.
Optional
- p₀=Uniform(S): Initial state distribution (See POMDPModelTools.Deterministic and POMDPModelTools.SparseCat for other options).
Keyword
- terminals=Set(): Set of terminal states. Once a terminal state is reached, no more actions can be taken or reward received.
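For illustration, here is a minimal sketch of the MDP constructor applied to a made-up two-state repair problem; the states, dynamics, and rewards are hypothetical, and the Deterministic initial distribution from POMDPModelTools is optional:

using QuickPOMDPs
using POMDPModelTools: Deterministic

S = [:healthy, :broken]
A = [:wait, :repair]

function T(s, a, sp)
    if a == :repair
        return sp == :healthy                # repairing always succeeds
    elseif s == :broken
        return sp == :broken                 # a broken machine stays broken
    else
        return sp == :healthy ? 0.9 : 0.1    # a healthy machine breaks 10% of the time
    end
end

R(s, a) = (s == :healthy ? 1.0 : 0.0) - (a == :repair ? 0.5 : 0.0)

m = DiscreteExplicitMDP(S, A, T, R, 0.9, Deterministic(:healthy))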
QuickPOMDPs.DiscreteExplicitPOMDP — Type
DiscreteExplicitPOMDP(S,A,O,T,Z,R,γ,[b₀],[terminals=Set()])
Create a POMDP defined by the tuple (S,A,O,T,Z,R,γ).
Arguments
Required
- S,A,O: State, action, and observation spaces (typically Vectors)
- T::Function: Transition probability distribution function; $T(s,a,s')$ is the probability of transitioning to state $s'$ from state $s$ after taking action $a$.
- Z::Function: Observation probability distribution function; $Z(a, s', o)$ is the probability of receiving observation $o$ when state $s'$ is reached after action $a$.
- R::Function: Reward function; $R(s,a)$ is the reward for taking action $a$ in state $s$.
- γ::Float64: Discount factor.
Optional
- b₀=Uniform(S): Initial belief/state distribution (See POMDPModelTools.Deterministic and POMDPModelTools.SparseCat for other options).
Keyword
- terminals=Set(): Set of terminal states. Once a terminal state is reached, no more actions can be taken or reward received.
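As a sketch of the optional arguments, the tiger example above could be given an explicit (hypothetical, non-uniform) initial belief using SparseCat from POMDPModelTools; the terminals keyword is passed the same way, but the tiger problem has no terminal states, so it is omitted here:

using POMDPModelTools: SparseCat

b₀ = SparseCat([:left, :right], [0.7, 0.3])   # hypothetical prior: tiger more likely on the left
m = DiscreteExplicitPOMDP(S, A, O, T, Z, R, γ, b₀)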
Usage from Python
The Discrete Explicit Interface can be used from Python via pyjulia. See examples/tiger.py for an example.