API Documentation
Documentation for the POMDPs.jl user interface. You can get help for any type or function in the module by typing ? in the Julia REPL followed by the name of the type or function. For example:
julia> using POMDPs
julia> ?
help?> reward
search: reward
reward{S,A,O}(pomdp::POMDP{S,A,O}, state::S, action::A, statep::S)
Returns the immediate reward for the s-a-s triple
reward{S,A,O}(pomdp::POMDP{S,A,O}, state::S, action::A)
Returns the immediate reward for the s-a pair
Index
POMDPs.ConstantDDNNode
POMDPs.DDNNode
POMDPs.DDNOut
POMDPs.DDNStructure
POMDPs.DistributionDDNNode
POMDPs.FunctionDDNNode
POMDPs.GenericDDNNode
POMDPs.MDP
POMDPs.POMDP
POMDPs.Policy
POMDPs.Simulator
POMDPs.Solver
POMDPs.Updater
Base.rand
Distributions.pdf
Distributions.support
POMDPs.action
POMDPs.actionindex
POMDPs.actions
POMDPs.actiontype
POMDPs.add_registry
POMDPs.available
POMDPs.check_requirements
POMDPs.convert_a
POMDPs.convert_o
POMDPs.convert_s
POMDPs.currentobs
POMDPs.dimensions
POMDPs.discount
POMDPs.gen
POMDPs.get_requirements
POMDPs.history
POMDPs.implemented
POMDPs.initialize_belief
POMDPs.initialobs
POMDPs.initialstate
POMDPs.initialstate_distribution
POMDPs.isterminal
POMDPs.observation
POMDPs.observations
POMDPs.obsindex
POMDPs.obstype
POMDPs.requirements_info
POMDPs.reward
POMDPs.show_requirements
POMDPs.simulate
POMDPs.solve
POMDPs.stateindex
POMDPs.states
POMDPs.statetype
POMDPs.transition
POMDPs.update
POMDPs.updater
POMDPs.value
Statistics.mean
StatsBase.mode
POMDPs.@POMDP_require
POMDPs.@POMDP_requirements
POMDPs.@gen
POMDPs.@get_requirements
POMDPs.@req
POMDPs.@requirements_info
POMDPs.@show_requirements
POMDPs.@subreq
POMDPs.@warn_requirements
Types
POMDPs.POMDP
— Type
POMDP{S,A,O}
Abstract base type for a partially observable Markov decision process.
S: state type
A: action type
O: observation type
POMDPs.MDP
— Type
MDP{S,A}
Abstract base type for a fully observable Markov decision process.
S: state type
A: action type
POMDPs.Solver
— Type
Base type for an MDP/POMDP solver.
POMDPs.Policy
— Type
Base type for a policy (a map from every possible belief, or more abstract policy state, to an optimal or suboptimal action).
POMDPs.Updater
— Type
Abstract type for an object that defines how the belief should be updated.
A belief is a general construct that represents the knowledge an agent has about the state of the system. This can be a probability distribution, an action-observation history, or a more general representation.
Model Functions
Explicit
These functions return distributions.
POMDPs.transition
— Function
transition(problem::POMDP, state, action)
transition(problem::MDP, state, action)
Return the transition distribution from the current state-action pair.
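For example, a minimal sketch of an explicit transition model for a hypothetical one-dimensional MDP (GridWalkMDP and its dynamics are invented for illustration), using SparseCat from POMDPModelTools:
using POMDPs
using POMDPModelTools # for SparseCat
struct GridWalkMDP <: MDP{Int, Int} end # hypothetical problem type
# with probability 0.8 the move succeeds; otherwise the state is unchanged
POMDPs.transition(m::GridWalkMDP, s::Int, a::Int) = SparseCat([s + a, s], [0.8, 0.2])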
POMDPs.observation
— Function
observation(problem::POMDP, statep)
observation(problem::POMDP, action, statep)
observation(problem::POMDP, state, action, statep)
Return the observation distribution. You need only define the method with the fewest arguments needed to determine the observation distribution.
Example
using POMDPs
using POMDPModelTools # for SparseCat
struct MyPOMDP <: POMDP{Int, Int, Int} end
POMDPs.observation(p::MyPOMDP, sp::Int) = SparseCat([sp-1, sp, sp+1], [0.1, 0.8, 0.1])
POMDPs.initialstate_distribution
— Function
initialstate_distribution(pomdp::POMDP)
initialstate_distribution(mdp::MDP)
Return a distribution of the initial state of the pomdp or mdp.
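Continuing the MyPOMDP example above, a minimal sketch (the choice of starting state is an assumption for illustration) could use Deterministic from POMDPModelTools:
using POMDPModelTools # for Deterministic
# the hypothetical problem always starts in state 1
POMDPs.initialstate_distribution(m::MyPOMDP) = Deterministic(1)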
POMDPs.reward
— Function
reward(m::POMDP, s, a)
reward(m::MDP, s, a)
Return the immediate reward for the s-a pair.
reward(m::POMDP, s, a, sp)
reward(m::MDP, s, a, sp)
Return the immediate reward for the s-a-s' triple.
reward(m::POMDP, s, a, sp, o)
Return the immediate reward for the s-a-s'-o quad.
For some problems it is easier to express reward(m, s, a, sp) or reward(m, s, a, sp, o) than reward(m, s, a), but some solvers, e.g. SARSOP, can only use reward(m, s, a). Both can be implemented for a problem, but when reward(m, s, a) is implemented, it should be consistent with reward(m, s, a, sp[, o]); that is, it should be the expected value over all destination states and observations.
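For instance, the following sketch (CoinStepMDP and its dynamics are invented for illustration) implements reward(m, s, a, sp) along with a consistent reward(m, s, a) that is the expectation over destination states:
using POMDPs
using POMDPModelTools # for SparseCat
struct CoinStepMDP <: MDP{Int, Int} end # hypothetical problem type
# the next state is s+1 or s-1 with equal probability, regardless of the action
POMDPs.transition(m::CoinStepMDP, s::Int, a::Int) = SparseCat([s + 1, s - 1], [0.5, 0.5])
# reward depends on the destination state
POMDPs.reward(m::CoinStepMDP, s::Int, a::Int, sp::Int) = float(sp)
# consistent two-argument version: the expected reward over destination states
POMDPs.reward(m::CoinStepMDP, s::Int, a::Int) = 0.5 * (s + 1) + 0.5 * (s - 1)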
Generative
These functions should return states, observations, and/or rewards.
gen in POMDPs.jl v0.8 corresponds to the generate_ functions in previous versions.
POMDPs.@gen
— Macro
@gen(X)(m, s, a, rng)
Call the generative model for a (PO)MDP m. Sample values from several nodes in the dynamic decision network. X is one or more symbols indicating which nodes to output.
Solvers and simulators should usually call this rather than the gen function. Problem writers should implement methods of the gen function.
Arguments
m: an MDP or POMDP model
s: the current state
a: the action
rng: a random number generator (typically a MersenneTwister)
Return
If X is a symbol, return a value sampled from the corresponding node. If X is several symbols, return a Tuple of values sampled from the specified nodes.
Examples
Let m be an MDP or POMDP, s be a state of m, a be an action of m, and rng be an AbstractRNG.
@gen(:sp, :r)(m, s, a, rng) returns a Tuple containing the next state and reward.
@gen(:sp, :o, :r)(m, s, a, rng) returns a Tuple containing the next state, observation, and reward.
@gen(:sp)(m, s, a, rng) returns the next state.
POMDPs.gen
— Function
gen(...)
Sample from generative model of a POMDP or MDP.
In most cases solver and simulator writers should use the @gen
macro. Problem writers may wish to implement one or more new methods of the function for their problem.
There are three versions of the function:
- The most convenient version to implement is gen(m::Union{MDP,POMDP}, s, a, rng::AbstractRNG), which returns a NamedTuple.
- Defining behavior for and sampling from individual nodes of the dynamic decision network can be accomplished using the version with a DDNNode argument.
- A version with a DDNOut argument is provided by the compiler to sample multiple nodes at once.
See below for detailed documentation for each type.
gen(m::Union{MDP,POMDP}, s, a, rng::AbstractRNG)
Convenience function for implementing the entire MDP/POMDP generative model in one function by returning a NamedTuple.
The NamedTuple version of gen is the most convenient for problem writers to implement. However, it should never be used directly by solvers or simulators. Instead, solvers and simulators should use the version with a DDNOut first argument.
Arguments
m: an MDP or POMDP model
s: the current state
a: the action
rng: a random number generator (typically a MersenneTwister)
Return
The function should return a NamedTuple. Typically, this NamedTuple will be (sp=<next state>, r=<reward>) for an MDP or (sp=<next state>, o=<observation>, r=<reward>) for a POMDP.
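A minimal sketch of such an implementation for a hypothetical POMDP with continuous states and observations (NoisyWalkPOMDP and its dynamics are invented for illustration):
using POMDPs
using Random
struct NoisyWalkPOMDP <: POMDP{Float64, Int, Float64} end # hypothetical problem type
function POMDPs.gen(m::NoisyWalkPOMDP, s, a, rng)
    sp = s + a + randn(rng) # noisy transition
    o = sp + randn(rng)     # noisy observation of the new state
    r = -abs(sp)            # reward for staying near the origin
    return (sp=sp, o=o, r=r)
end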
gen(v::DDNNode{name}, m::Union{MDP,POMDP}, depargs..., rng::AbstractRNG)
Sample a value from a node in the dynamic decision network.
These functions will be used within gen(::DDNOut, ...) to sample values for all outputs and their dependencies. They may be implemented directly by a problem-writer if they wish to implement a generative model for a particular node in the dynamic decision network, and may be called in solvers to sample a value for a particular node.
Arguments
v::DDNNode{name}: which DDN node the function should sample from.
depargs: values for all the dependent nodes. Dependencies are determined by deps(DDNStructure(m), name).
rng: a random number generator (typically a MersenneTwister)
Return
A sampled value from the specified node.
Examples
Let m be a POMDP, s and sp be states of m, a be an action of m, and rng be an AbstractRNG.
gen(DDNNode(:sp), m, s, a, rng) returns the next state.
gen(DDNNode(:o), m, s, a, sp, rng) returns the observation given the previous state, action, and new state.
gen(t::DDNOut{X}, m::Union{MDP,POMDP}, s, a, rng::AbstractRNG) where X
Sample values from several nodes in the dynamic decision network. X is a symbol or tuple of symbols indicating which nodes to output.
An implementation of this method is automatically provided by POMDPs.jl. Solvers and simulators should use this version. Problem writers may implement it directly in special cases (see the POMDPs.jl documentation for more information).
Arguments
t::DDNOut: which DDN nodes the function should sample from.
m: an MDP or POMDP model
s: the current state
a: the action
rng: a random number generator (typically a MersenneTwister)
Return
If the DDNOut parameter, X, is a symbol, return a value sampled from the corresponding node. If X is a tuple of symbols, return a Tuple of values sampled from the specified nodes.
Examples
Let m be an MDP or POMDP, s be a state of m, a be an action of m, and rng be an AbstractRNG.
gen(DDNOut(:sp, :r), m, s, a, rng) returns a Tuple containing the next state and reward.
gen(DDNOut(:sp, :o, :r), m, s, a, rng) returns a Tuple containing the next state, observation, and reward.
gen(DDNOut(:sp), m, s, a, rng) returns the next state.
POMDPs.initialstate
— Function
initialstate(m::Union{POMDP,MDP}, rng::AbstractRNG)
Return a sampled initial state for the problem m.
Usually the initial state is sampled from an initial state distribution. The random number generator rng should be used to draw this sample (e.g. use rand(rng) instead of rand()).
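A minimal sketch, continuing the hypothetical NoisyWalkPOMDP defined earlier (the standard-normal initial state is an assumption for illustration):
using POMDPs
using Random
# sample the initial state using the provided rng rather than the global one
POMDPs.initialstate(m::NoisyWalkPOMDP, rng::AbstractRNG) = randn(rng)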
POMDPs.initialobs
— Function
initialobs(m::POMDP, s, rng::AbstractRNG)
Return a sampled initial observation for the problem m and state s.
This function is only used in cases where the policy expects an initial observation rather than an initial belief, e.g. in a reinforcement learning setting. It is not used in a standard POMDP simulation.
By default, it will fall back to observation(m, s). The random number generator rng should be used to draw this sample (e.g. use rand(rng) instead of rand()).
Common
POMDPs.states
— Function
states(problem::POMDP)
states(problem::MDP)
Returns the complete state space of a POMDP or MDP.
POMDPs.actions
— Function
actions(m::Union{MDP,POMDP})
Returns the entire action space of a (PO)MDP.
actions(m::Union{MDP,POMDP}, s)
Return the actions that can be taken from state s.
actions(m::POMDP, b)
Return the actions that can be taken from belief b.
To implement an observation-dependent action space, use currentobs(b) to get the observation associated with belief b within the implementation of actions(m, b).
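For example, a sketch of a state-dependent action space for a hypothetical bounded random walk (BoundedWalkMDP and its bounds are invented for illustration):
using POMDPs
struct BoundedWalkMDP <: MDP{Int, Int} end # hypothetical problem with states 1 through 10
POMDPs.actions(m::BoundedWalkMDP) = [-1, 0, 1]
# restrict the available moves at the boundaries of the state space
function POMDPs.actions(m::BoundedWalkMDP, s::Int)
    if s <= 1
        return [0, 1]
    elseif s >= 10
        return [-1, 0]
    else
        return [-1, 0, 1]
    end
end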
POMDPs.observations
— Function
observations(problem::POMDP)
Return the entire observation space.
POMDPs.isterminal
— Function
isterminal(m::Union{MDP,POMDP}, s)
Check if state s is terminal.
If a state is terminal, no actions will be taken in it and no additional rewards will be accumulated. Thus, the value at such a state is, by definition, zero.
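A minimal sketch, continuing the hypothetical BoundedWalkMDP above (the terminal condition is an assumption for illustration):
# the walk ends once the upper boundary is reached
POMDPs.isterminal(m::BoundedWalkMDP, s::Int) = s >= 10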
POMDPs.discount
— Function
discount(problem::POMDP)
discount(problem::MDP)
Return the discount factor for the problem.
POMDPs.stateindex
— Function
stateindex(problem::POMDP, s)
stateindex(problem::MDP, s)
Return the integer index of state s. Used for discrete models only.
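For a small discrete model, a sketch might look like this (TwoStateMDP and its state names are invented for illustration):
using POMDPs
struct TwoStateMDP <: MDP{Symbol, Symbol} end # hypothetical discrete model
POMDPs.states(m::TwoStateMDP) = [:healthy, :sick]
# indices must be consistent with the ordering of states(m)
POMDPs.stateindex(m::TwoStateMDP, s::Symbol) = s == :healthy ? 1 : 2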
POMDPs.actionindex
— Function
actionindex(problem::POMDP, a)
actionindex(problem::MDP, a)
Return the integer index of action a. Used for discrete models only.
POMDPs.obsindex
— Function
obsindex(problem::POMDP, o)
Return the integer index of observation o. Used for discrete models only.
POMDPs.convert_s
— Function
convert_s(::Type{V}, s, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_s(::Type{S}, vec::V, problem::Union{MDP,POMDP}) where {S,V<:AbstractArray}
Convert a state to vectorized form or vice versa.
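A sketch for a hypothetical model whose states are (x, y) tuples (GridMDP is invented for illustration; solvers typically request an AbstractArray type such as Vector{Float64}):
using POMDPs
struct GridMDP <: MDP{Tuple{Int,Int}, Int} end # hypothetical problem type
# state -> vector
POMDPs.convert_s(::Type{Vector{Float64}}, s::Tuple{Int,Int}, m::GridMDP) = Float64[s[1], s[2]]
# vector -> state
POMDPs.convert_s(::Type{Tuple{Int,Int}}, v::AbstractVector, m::GridMDP) = (Int(v[1]), Int(v[2]))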
POMDPs.convert_a
— Function
convert_a(::Type{V}, a, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_a(::Type{A}, vec::V, problem::Union{MDP,POMDP}) where {A,V<:AbstractArray}
Convert an action to vectorized form or vice versa.
POMDPs.convert_o
— Function
convert_o(::Type{V}, o, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_o(::Type{O}, vec::V, problem::Union{MDP,POMDP}) where {O,V<:AbstractArray}
Convert an observation to vectorized form or vice versa.
Distribution/Space Functions
Base.rand
— Function
rand(rng::AbstractRNG, d::Any)
Return a random element from distribution or space d.
If d is a state or transition distribution, the sample will be a state; if d is an action distribution, the sample will be an action; if d is an observation distribution, the sample will be an observation.
Distributions.pdf
— Function
pdf(d::Any, x::Any)
Evaluate the probability density of distribution d at sample x.
StatsBase.mode
— Function
mode(d::Any)
Return the most likely value in a distribution d.
Statistics.mean
— Function
mean(d::Any)
Return the mean of a distribution d.
POMDPs.dimensions
— Function
dimensions(s::Any)
Returns the number of dimensions in space s.
Distributions.support
— Function
support(d::Any)
Return an iterable object containing the possible values that can be sampled from distribution d. Values with zero probability may be skipped.
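As an illustration of this distribution interface, here is a sketch of a hypothetical two-outcome distribution type (in practice a ready-made distribution such as SparseCat from POMDPModelTools is usually sufficient; extending Distributions, StatsBase, and Statistics directly assumes those packages are available):
using Random
import Distributions, StatsBase, Statistics
struct Coin # hypothetical Bernoulli-like distribution over the outcomes 0 and 1
    p::Float64
end
Base.rand(rng::AbstractRNG, d::Coin) = rand(rng) < d.p ? 1 : 0
Distributions.pdf(d::Coin, x) = x == 1 ? d.p : (x == 0 ? 1.0 - d.p : 0.0)
Distributions.support(d::Coin) = (0, 1)
StatsBase.mode(d::Coin) = d.p >= 0.5 ? 1 : 0
Statistics.mean(d::Coin) = d.p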
Dynamic decision networks
POMDPs.DDNStructure
— Type
DDNStructure(::Type{M}) where M <: Union{MDP, POMDP}
Trait of an MDP/POMDP type for describing the structure of the dynamic Bayesian network.
Example
struct MyMDP <: MDP{Int, Int} end
POMDPs.gen(::MyMDP, s, a, rng) = (sp=s+a+rand(rng, [1,2,3]), r=s^2)
# make a new node, delta_s, that is deterministically equal to sp - s
function POMDPs.DDNStructure(::Type{MyMDP})
ddn = mdp_ddn()
return add_node(ddn, :delta_s, FunctionDDNNode((m,s,sp)->sp-s), (:s, :sp))
end
gen(DDNOut(:delta_s), MyMDP(), 1, 1, Random.GLOBAL_RNG)
POMDPs.DDNNode
— Type
DDNNode(x::Symbol)
DDNNode{x::Symbol}()
Reference to a named node in the POMDP or MDP dynamic decision network (DDN).
Note that gen(::DDNNode, m, depargs..., rng) always takes an argument for each dependency whereas gen(::DDNOut, m, s, a, rng) only takes s and a arguments (the inputs to the entire DDN).
DDNNode is a "value type". See the documentation of Val for more conceptual details about value types.
POMDPs.DDNOut
— Type
DDNOut(x::Symbol)
DDNOut{x::Symbol}()
DDNOut(::Symbol, ::Symbol,...)
DDNOut{x::NTuple{N, Symbol}}()
Reference to one or more named nodes in the POMDP or MDP dynamic decision network (DDN).
Note that gen(::DDNOut, m, s, a, rng) always takes s and a arguments (the inputs to the entire DDN) while gen(::DDNNode, m, depargs..., rng) takes a variable number of arguments (one for each dependency).
DDNOut is a "value type". See the documentation of Val for more conceptual details about value types.
POMDPs.DistributionDDNNode
— Type
DDN node defined by a function that maps the model and values from the parent nodes to a distribution.
Example
DistributionDDNNode((m, s, a)->POMDPModelTools.Deterministic(s+a))
POMDPs.FunctionDDNNode
— Type
DDN node defined by a function that deterministically maps the model and values from the parent nodes to a new value.
Example
FunctionDDNNode((m, s, a)->s+a)
POMDPs.ConstantDDNNode
— Type
DDN node that always takes a deterministic constant value.
POMDPs.GenericDDNNode
— Type
DDN node that can only have a generative model; gen(::DDNNode{:x}, ...) must be implemented for a node of this type.
Belief Functions
POMDPs.update
— Function
update(updater::Updater, belief_old, action, observation)
Return a new instance of an updated belief given belief_old and the latest action and observation.
POMDPs.initialize_belief
— Function
initialize_belief(updater::Updater, state_distribution::Any)
initialize_belief(updater::Updater, belief::Any)
Returns a belief that can be updated using updater that has a similar distribution to state_distribution or belief.
The conversion may be lossy. This function is also idempotent, i.e. there is a default implementation that passes the belief through when it is already the correct type: initialize_belief(updater::Updater, belief) = belief
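A minimal sketch of an updater whose belief is simply the most recent observation (PreviousObsUpdater is a hypothetical type; packages such as BeliefUpdaters provide full-featured belief updaters):
using POMDPs
struct PreviousObsUpdater <: Updater end # hypothetical updater type
# the updated "belief" is just the latest observation
POMDPs.update(up::PreviousObsUpdater, b_old, a, o) = o
# keep the initial state distribution itself as this updater's initial belief
POMDPs.initialize_belief(up::PreviousObsUpdater, d) = d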
POMDPs.history
— Function
history(b)
Return the action-observation history associated with belief b.
The history should be an AbstractVector, Tuple, (or similar object that supports indexing with end) full of NamedTuples with keys :a and :o, i.e. history(b)[end][:a] should be the last action taken leading up to b, and history(b)[end][:o] should be the last observation received.
It is acceptable to return only part of the history if that is all that is available, but it should always end with the current observation. For example, it would be acceptable to return a structure containing only the last three observations in a length 3 Vector{NamedTuple{(:o,),Tuple{O}}}.
POMDPs.currentobs
— Function
currentobs(b)
Return the latest observation associated with belief b.
If a solver or updater implements history(b) for a belief type, currentobs has a default implementation.
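For example, a sketch of a belief type that stores its history directly (HistoryBelief is invented for illustration):
using POMDPs
# hypothetical belief type that records the full action-observation history
struct HistoryBelief{A,O}
    hist::Vector{NamedTuple{(:a, :o), Tuple{A, O}}}
end
POMDPs.history(b::HistoryBelief) = b.hist
# with history defined, currentobs(b) can fall back to the last recorded observation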
Policy and Solver Functions
POMDPs.solve
— Function
solve(solver::Solver, problem::POMDP)
Solves the POMDP using the method associated with solver, and returns a policy.
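A minimal sketch of a solver implementation (RandomSolver and RandomPolicy are hypothetical types; the returned policy simply picks a random action):
using POMDPs
struct RandomSolver <: Solver end # hypothetical solver type
struct RandomPolicy{M} <: Policy # hypothetical policy returned by the solver
    m::M
end
# "solving" here just wraps the problem in a policy object
POMDPs.solve(sol::RandomSolver, m::Union{MDP,POMDP}) = RandomPolicy(m)
POMDPs.action(p::RandomPolicy, x) = rand(actions(p.m))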
POMDPs.updater
— Function
updater(policy::Policy)
Returns a default Updater appropriate for a belief type that policy p can use.
POMDPs.action
— Function
action(policy::Policy, x)
Returns the action that the policy deems best for the current state or belief, x.
x is a generalized information state - it can be a state in an MDP, a distribution in a POMDP, or another specialized policy-dependent representation of the information needed to choose an action.
POMDPs.value
— Function
value(p::Policy, s)
value(p::Policy, s, a)
Returns the utility value from policy p given the state (or belief), or state-action (or belief-action) pair.
The state-action version is commonly referred to as the Q-value.
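A minimal sketch (FixedActionPolicy and its constant value estimate are assumptions for illustration):
using POMDPs
struct FixedActionPolicy{A} <: Policy # hypothetical policy that always returns the same action
    a::A
end
POMDPs.action(p::FixedActionPolicy, x) = p.a # ignore the state or belief x
POMDPs.value(p::FixedActionPolicy, s) = 0.0 # placeholder state-value estimate
POMDPs.value(p::FixedActionPolicy, s, a) = 0.0 # placeholder Q-value estimate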
Simulator
POMDPs.Simulator
— Type
Base type for an object defining how simulations should be carried out.
POMDPs.simulate
— Function
simulate(sim::Simulator, m::POMDP, p::Policy, u::Updater=updater(p), b0=initialstate_distribution(m), s0=initialstate(m, rng))
simulate(sim::Simulator, m::MDP, p::Policy, s0=initialstate(m, rng))
Run a simulation using the specified policy.
The return type is flexible and depends on the simulator. Simulations should adhere to the Simulation Standard.
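A sketch of a simple custom simulator for MDPs that accumulates discounted reward (SimpleSim is a hypothetical type; packages such as POMDPSimulators provide ready-made simulators):
using POMDPs
using Random
struct SimpleSim <: Simulator # hypothetical simulator type
    rng::AbstractRNG
    max_steps::Int
end
function POMDPs.simulate(sim::SimpleSim, m::MDP, p::Policy, s0=initialstate(m, sim.rng))
    s = s0
    r_total = 0.0
    disc = 1.0
    for _ in 1:sim.max_steps
        isterminal(m, s) && break
        a = action(p, s)
        s, r = @gen(:sp, :r)(m, s, a, sim.rng)
        r_total += disc * r
        disc *= discount(m)
    end
    return r_total
end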
Other
The following functions are not part of the API for specifying and solving POMDPs, but are included in the package.
Type Inference
POMDPs.statetype
— Function
statetype(t::Type)
statetype(p::Union{POMDP,MDP})
Return the state type for a problem type (the S in POMDP{S,A,O}).
struct A <: POMDP{Int, Bool, Bool} end
statetype(A) # returns Int
POMDPs.actiontype
— Function
actiontype(t::Type)
actiontype(p::Union{POMDP,MDP})
Return the action type for a problem type (the A in POMDP{S,A,O}).
struct A <: POMDP{Bool, Int, Bool} end
actiontype(A) # returns Int
POMDPs.obstype
— Function
obstype(t::Type)
Return the observation type for a problem type (the O in POMDP{S,A,O}).
struct A <: POMDP{Bool, Bool, Int} end
obstype(A) # returns Int
Requirements Specification
POMDPs.check_requirements
— Function
check_requirements(r::AbstractRequirementSet)
Check whether the methods in r have implementations with implemented(). Return true if all methods have implementations.
POMDPs.show_requirements
— Function
show_requirements(r::AbstractRequirementSet)
Check whether the methods in r have implementations with implemented() and print out a formatted list showing which are missing. Return true if all methods have implementations.
POMDPs.get_requirements
— Function
get_requirements(f::Function, args::Tuple)
Return a RequirementSet for the function f and arguments args.
POMDPs.requirements_info
— Function
requirements_info(s::Solver, p::Union{POMDP,MDP}, ...)
Print information about the requirements for solver s.
POMDPs.@POMDP_require
— Macro
@POMDP_require solve(s::CoolSolver, p::POMDP) begin
PType = typeof(p)
@req states(::PType)
@req actions(::PType)
@req transition(::PType, ::S, ::A)
s = first(states(p))
a = first(actions(p))
t_dist = transition(p, s, a)
@req rand(::AbstractRNG, ::typeof(t_dist))
end
Create a get_requirements implementation for the function signature and the requirements block.
POMDPs.@POMDP_requirements
— Macro
reqs = @POMDP_requirements CoolSolver begin
PType = typeof(p)
@req states(::PType)
@req actions(::PType)
@req transition(::PType, ::S, ::A)
s = first(states(p))
a = first(actions(p))
t_dist = transition(p, s, a)
@req rand(::AbstractRNG, ::typeof(t_dist))
end
Create a RequirementSet object.
POMDPs.@requirements_info
— Macro
@requirements_info ASolver() [YourPOMDP()]
Print information about the requirements for a solver.
POMDPs.@get_requirements
— Macro
@get_requirements f(arg1, arg2)
Call get_requirements(f, (arg1,arg2)).
POMDPs.@show_requirements
— Macro
@show_requirements solve(solver, problem)
Print a list of requirements for a function call.
POMDPs.@warn_requirements
— Macro
@warn_requirements solve(solver, problem)
Print a warning if there are unmet requirements.
POMDPs.@req
— Macro
@req f( ::T1, ::T2)
Convert a f( ::T1, ::T2) expression to a (f, Tuple{T1,T2})::Req for pushing to a RequirementSet.
If in a @POMDP_requirements
or @POMDP_require
block, marks the requirement for including in the set of requirements.
POMDPs.@subreq
— Macro
@subreq f(arg1, arg2)
In a @POMDP_requirements or @POMDP_require block, include the requirements for f(arg1, arg2) as a child argument set.
POMDPs.implemented
— Function
implemented(function, Tuple{Arg1Type, Arg2Type})
Check whether there is an implementation available that will return a suitable value.
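For example (TinyMDP is a hypothetical problem type):
using POMDPs
struct TinyMDP <: MDP{Int, Int} end # hypothetical problem type
POMDPs.discount(m::TinyMDP) = 0.95
implemented(discount, Tuple{TinyMDP}) # true: a method was just defined
implemented(transition, Tuple{TinyMDP, Int, Int}) # false: no transition method yet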
Utility Tools
POMDPs.add_registry
— Function
add_registry()
Adds the JuliaPOMDP registry.
POMDPs.available
— Function
available()
Prints all the available packages in the JuliaPOMDP registry.