Discrete Explicit Interface
The Discrete Explicit Interface is designed to match the standard definition of a POMDP in the literature as closely as possible. The standard definition uses the tuple (S,A,O,T,Z,R,γ) for a POMDP and (S,A,T,R,γ) for an MDP, where
- S, A, and O are the state, action, and observation spaces,
- T and Z are the transition and observation probability distribution functions (pdfs),
- R is the reward function, and
- γ is the discount factor.
The `DiscreteExplicitPOMDP` and `DiscreteExplicitMDP` types are provided for POMDPs and MDPs with discrete spaces and explicitly defined distributions. They should offer moderately good performance on small to medium-sized problems. Instructions for defining the initial distribution and terminal states can be found in the constructor docstrings below.
Example
The classic tiger POMDP [Kaelbling et al. 98] can be defined as follows:
```julia
using QuickPOMDPs

S = [:left, :right]           # S, A, and O may contain any objects
A = [:left, :right, :listen]  # including user-defined types
O = [:left, :right]
γ = 0.95

function T(s, a, sp)
    if a == :listen
        return s == sp
    else # a door is opened
        return 0.5 # reset
    end
end

function Z(a, sp, o)
    if a == :listen
        if o == sp
            return 0.85
        else
            return 0.15
        end
    else
        return 0.5
    end
end

function R(s, a)
    if a == :listen
        return -1.0
    elseif s == a # the tiger was found
        return -100.0
    else # the tiger was avoided
        return 10.0
    end
end

m = DiscreteExplicitPOMDP(S, A, O, T, Z, R, γ)
```
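The resulting `m` is a standard POMDPs.jl model, so it can be passed to any compatible solver. As a minimal sketch, assuming the QMDP.jl solver package is installed (any other POMDPs.jl solver could be substituted):

```julia
using POMDPs, QMDP

solver = QMDPSolver()      # an offline approximate solver from QMDP.jl
policy = solve(solver, m)  # compute a policy for the tiger POMDP
```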
Constructor Documentation
QuickPOMDPs.DiscreteExplicitMDP — Type

    DiscreteExplicitMDP(S,A,T,R,γ,[p₀],[terminals=Set()])

Create an MDP defined by the tuple (S,A,T,R,γ).

Arguments

Required

- `S`, `A`: State and action spaces (typically `Vector`s)
- `T::Function`: Transition probability distribution function; $T(s,a,s')$ is the probability of transitioning to state $s'$ from state $s$ after taking action $a$.
- `R::Function`: Reward function; $R(s,a)$ is the reward for taking action $a$ in state $s$.
- `γ::Float64`: Discount factor.

Optional

- `p₀=Uniform(S)`: Initial state distribution (see `POMDPModelTools.Deterministic` and `POMDPModelTools.SparseCat` for other options).

Keyword

- `terminals=Set()`: Set of terminal states. Once a terminal state is reached, no more actions can be taken or reward received.
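For illustration, here is a hypothetical three-state chain MDP (invented for this sketch, not part of the package) showing how the optional `p₀` and `terminals` arguments described above might be passed:

```julia
using QuickPOMDPs
using POMDPModelTools: Deterministic

S = [1, 2, 3]                  # three states in a chain
A = [:stay, :advance]
T(s, a, sp) = a == :advance ? Float64(sp == min(s + 1, 3)) : Float64(sp == s)
R(s, a) = s == 3 ? 0.0 : -1.0  # -1 reward until the end of the chain is reached
γ = 0.9

# start deterministically in state 1; state 3 is terminal
m = DiscreteExplicitMDP(S, A, T, R, γ, Deterministic(1), terminals=Set([3]))
```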
QuickPOMDPs.DiscreteExplicitPOMDP — Type

    DiscreteExplicitPOMDP(S,A,O,T,Z,R,γ,[b₀],[terminals=Set()])

Create a POMDP defined by the tuple (S,A,O,T,Z,R,γ).

Arguments

Required

- `S`, `A`, `O`: State, action, and observation spaces (typically `Vector`s)
- `T::Function`: Transition probability distribution function; $T(s,a,s')$ is the probability of transitioning to state $s'$ from state $s$ after taking action $a$.
- `Z::Function`: Observation probability distribution function; $Z(a, s', o)$ is the probability of receiving observation $o$ when state $s'$ is reached after action $a$.
- `R::Function`: Reward function; $R(s,a)$ is the reward for taking action $a$ in state $s$.
- `γ::Float64`: Discount factor.

Optional

- `b₀=Uniform(S)`: Initial belief/state distribution (see `POMDPModelTools.Deterministic` and `POMDPModelTools.SparseCat` for other options).

Keyword

- `terminals=Set()`: Set of terminal states. Once a terminal state is reached, no more actions can be taken or reward received.
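As a sketch, the tiger POMDP from the example above could be given a non-uniform initial belief through the optional `b₀` argument (this assumes the `S`, `A`, `O`, `T`, `Z`, `R`, and `γ` definitions from the example are in scope):

```julia
using POMDPModelTools: SparseCat

b₀ = SparseCat([:left, :right], [0.7, 0.3])  # 70% prior that the tiger is behind the left door
m2 = DiscreteExplicitPOMDP(S, A, O, T, Z, R, γ, b₀)
```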
Usage from Python
The Discrete Explicit interface can be used from Python via the pyjulia package. See examples/tiger.py for an example.