API Documentation

Docstrings for POMDPs.jl interface members can be accessed through Julia's built-in documentation system or in the list below.

Types

POMDPs.POMDP (Type)
POMDP{S,A,O}

Abstract base type for a partially observable Markov decision process.

S: state type
A: action type
O: observation type
source
POMDPs.MDP (Type)
MDP{S,A}

Abstract base type for a fully observable Markov decision process.

S: state type
A: action type
source
POMDPs.Policy (Type)

Base type for a policy (a map from every possible belief, or more abstract policy state, to an optimal or suboptimal action)

source
POMDPs.Updater (Type)

Abstract type for an object that defines how the belief should be updated.

A belief is a general construct that represents the knowledge an agent has about the state of the system. This can be a probability distribution, an action-observation history, or a more general representation.

source

Model Functions

Dynamics

POMDPs.transition (Function)
transition(m::POMDP, state, action)
transition(m::MDP, state, action)

Return the transition distribution from the current state-action pair.

If it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a generative model.
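
Example

A minimal sketch (the MDP types, dynamics, and probabilities below are made up for illustration): a discrete transition model written with POMDPModelTools.SparseCat, and a generative one written with ImplicitDistribution.

using POMDPs
using POMDPModelTools # for SparseCat and ImplicitDistribution

struct GridWalkMDP <: MDP{Int, Int} end            # hypothetical discrete MDP
struct RandomWalkMDP <: MDP{Float64, Float64} end  # hypothetical continuous MDP

# Explicit distribution: move by a with probability 0.8, stay with probability 0.2.
POMDPs.transition(m::GridWalkMDP, s::Int, a::Int) = SparseCat([s + a, s], [0.8, 0.2])

# Generative model: only sampling is defined, no explicit density.
POMDPs.transition(m::RandomWalkMDP, s, a) = ImplicitDistribution(s, a) do s, a, rng
    s + a + randn(rng)
end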

source
POMDPs.observation (Function)
observation(m::POMDP, statep)
observation(m::POMDP, action, statep)
observation(m::POMDP, state, action, statep)

Return the observation distribution. You need only define the method with the fewest arguments needed to determine the observation distribution.

If it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a generative model.

Example

using POMDPs
using POMDPModelTools # for SparseCat

struct MyPOMDP <: POMDP{Int, Int, Int} end

# Qualify the method with POMDPs. so it extends the interface function
# instead of defining a new local observation.
POMDPs.observation(p::MyPOMDP, sp::Int) = SparseCat([sp-1, sp, sp+1], [0.1, 0.8, 0.1])
source
POMDPs.reward (Function)
reward(m::POMDP, s, a)
reward(m::MDP, s, a)

Return the immediate reward for the s-a pair.

reward(m::POMDP, s, a, sp)
reward(m::MDP, s, a, sp)

Return the immediate reward for the s-a-s' triple.

reward(m::POMDP, s, a, sp, o)

Return the immediate reward for the s-a-s'-o quad.

For some problems it is easier to express reward(m, s, a, sp) or reward(m, s, a, sp, o) than reward(m, s, a), but some solvers, e.g. SARSOP, can only use reward(m, s, a). Both forms can be implemented for a problem, but when reward(m, s, a) is implemented, it should be consistent with reward(m, s, a, sp[, o]); that is, it should be the expected value of the reward over all destination states and observations.
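
Example

A sketch of this consistency requirement for a hypothetical discrete MDP (the type and numbers are illustrative):

using POMDPs
using POMDPModelTools # for SparseCat

struct CliffMDP <: MDP{Int, Int} end # hypothetical model

POMDPs.transition(m::CliffMDP, s, a) = SparseCat([s + a, s], [0.8, 0.2])
POMDPs.reward(m::CliffMDP, s, a, sp) = sp == 0 ? -10.0 : -1.0

# A consistent reward(m, s, a) is the expectation of reward(m, s, a, sp)
# over the transition distribution.
function POMDPs.reward(m::CliffMDP, s, a)
    td = transition(m, s, a)
    return sum(pdf(td, sp) * reward(m, s, a, sp) for sp in support(td))
end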

source
POMDPs.gen (Function)
gen(m::Union{MDP,POMDP}, s, a, rng::AbstractRNG)

Function for implementing the entire MDP/POMDP generative model by returning a NamedTuple.

gen should only be implemented in the case where two or more of the next state, observation, and reward need to be generated at the same time. If the state transition model can be separated from the reward and observation models, you should implement transition with an ImplicitDistribution instead of gen.

Solver and simulator writers should use the @gen macro to call a generative model.

Arguments

  • m: an MDP or POMDP model
  • s: the current state
  • a: the action
  • rng: a random number generator (Typically a MersenneTwister)

Return

The function should return a NamedTuple with a subset of the following entries:

MDP

  • sp: the next state
  • r: the reward for the step
  • info: extra debugging information, typically in an associative container like a NamedTuple

POMDP

  • sp: the next state
  • o: the observation
  • r: the reward for the step
  • info: extra debugging information, typically in an associative container like a NamedTuple

Some elements can be left out. For instance, if o is left out of the return value, the problem writer can implement observation separately and POMDPs.jl will automatically use it when needed.

Example

using POMDPs

struct LQRMDP <: MDP{Float64, Float64} end

POMDPs.gen(m::LQRMDP, s, a, rng) = (sp = s + a + randn(rng), r = -s^2 - a^2)
source
POMDPs.@gen (Macro)
@gen(X)(m, s, a)
@gen(X)(m, s, a, rng::AbstractRNG)

Call the generative model for a (PO)MDP m and sample values from several nodes in the dynamic decision network. X is one or more symbols indicating which nodes to output.

Solvers and simulators should call this rather than the gen function. Problem writers should implement a method of the transition or gen function instead of altering @gen.

Arguments

  • m: an MDP or POMDP model
  • s: the current state
  • a: the action
  • rng (optional): a random number generator (Typically a MersenneTwister)

Return

If X is a single symbol, return a value sampled from the corresponding node. If X is several symbols, return a Tuple of values sampled from the specified nodes.

Examples

Let m be an MDP or POMDP, s be a state of m, a be an action of m, and rng be an AbstractRNG.

  • @gen(:sp, :r)(m, s, a) returns a Tuple containing the next state and reward.
  • @gen(:sp, :o, :r)(m, s, a, rng) returns a Tuple containing the next state, observation, and reward.
  • @gen(:sp)(m, s, a, rng) returns the next state.
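
A sketch of how a simulator writer might use this pattern in a simple MDP rollout (the function name and step cap are illustrative, not part of the API):

using POMDPs
using Random

# Roll out policy in MDP m from state s0 and return the discounted reward.
function rollout(m::MDP, policy::Policy, s0; rng::AbstractRNG=Random.default_rng(), max_steps=100)
    r_total, disc, s = 0.0, 1.0, s0
    for _ in 1:max_steps
        isterminal(m, s) && break
        a = action(policy, s)
        sp, r = @gen(:sp, :r)(m, s, a, rng) # sample the next state and reward
        r_total += disc * r
        disc *= discount(m)
        s = sp
    end
    return r_total
end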
source

Static Properties

POMDPs.states (Function)
states(problem::POMDP)
states(problem::MDP)

Return the complete state space of a POMDP or MDP.

source
POMDPs.actions (Function)
actions(m::Union{MDP,POMDP})

Returns the entire action space of a (PO)MDP.


actions(m::Union{MDP,POMDP}, s)

Return the actions that can be taken from state s.


actions(m::POMDP, b)

Return the actions that can be taken from belief b.

To implement an observation-dependent action space, use currentobs(b) to get the observation associated with belief b within the implementation of actions(m, b).
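
Example

A sketch of an observation-dependent action space (the POMDP type and the rule are hypothetical):

using POMDPs

struct DoorPOMDP <: POMDP{Int, Symbol, Symbol} end # hypothetical model

POMDPs.actions(m::DoorPOMDP) = (:listen, :open_left, :open_right)

# Only allow opening a door after something has been heard.
function POMDPs.actions(m::DoorPOMDP, b)
    o = currentobs(b)
    return o == :silence ? (:listen,) : actions(m)
end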

source
POMDPs.isterminal (Function)
isterminal(m::Union{MDP,POMDP}, s)

Check if state s is terminal.

If a state is terminal, no actions will be taken in it and no additional rewards will be accumulated. Thus, the value function at such a state is, by definition, zero.
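
Example

A minimal sketch for a hypothetical MDP whose episodes end when a counter reaches zero:

using POMDPs

struct CountdownMDP <: MDP{Int, Int} end # hypothetical model

POMDPs.isterminal(m::CountdownMDP, s::Int) = s <= 0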

source
POMDPs.discount (Function)
discount(m::POMDP)
discount(m::MDP)

Return the discount factor for the problem.

source
POMDPs.initialstate (Function)
initialstate(m::Union{POMDP,MDP})

Return a distribution of initial states for (PO)MDP m.

If it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a model for sampling.
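
Example

A sketch using distributions from POMDPModelTools (the model type is hypothetical):

using POMDPs
using POMDPModelTools # for Deterministic and Uniform

struct CountdownMDP <: MDP{Int, Int} end # hypothetical model

# Always start at 10:
POMDPs.initialstate(m::CountdownMDP) = Deterministic(10)

# or, alternatively, start uniformly in 5:10:
# POMDPs.initialstate(m::CountdownMDP) = Uniform(5:10)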

source
POMDPs.initialobs (Function)
initialobs(m::POMDP, s)

Return a distribution of initial observations for POMDP m and state s.

If it is difficult to define the probability density or mass function explicitly, consider using POMDPModelTools.ImplicitDistribution to define a model for sampling.

This function is only used in cases where the policy expects an initial observation rather than an initial belief, e.g. in a reinforcement learning setting. It is not used in a standard POMDP simulation.

source
POMDPs.stateindex (Function)
stateindex(problem::POMDP, s)
stateindex(problem::MDP, s)

Return the integer index of state s. Used for discrete models only.
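
Example

A sketch for a hypothetical model whose states are the integers 1 through 10, so each state can serve as its own index:

using POMDPs

struct TenStateMDP <: MDP{Int, Int} end # hypothetical model

POMDPs.states(m::TenStateMDP) = 1:10
POMDPs.stateindex(m::TenStateMDP, s::Int) = s

For composite states, a common approach is to precompute a Dict from states to indices or to use LinearIndices over the state components.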

source
POMDPs.actionindex (Function)
actionindex(problem::POMDP, a)
actionindex(problem::MDP, a)

Return the integer index of action a. Used for discrete models only.

source
POMDPs.obsindex (Function)
obsindex(problem::POMDP, o)

Return the integer index of observation o. Used for discrete models only.

source
POMDPs.convert_s (Function)
convert_s(::Type{V}, s, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_s(::Type{S}, vec::V, problem::Union{MDP,POMDP}) where {S,V<:AbstractArray}

Convert a state to vectorized form or vice versa.
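
Example

A sketch for a hypothetical model with a two-dimensional state stored as a tuple:

using POMDPs

struct PointMDP <: MDP{Tuple{Float64, Float64}, Int} end # hypothetical model

POMDPs.convert_s(::Type{V}, s::Tuple{Float64, Float64}, m::PointMDP) where {V<:AbstractArray} =
    convert(V, [s[1], s[2]])
POMDPs.convert_s(::Type{Tuple{Float64, Float64}}, v::AbstractArray, m::PointMDP) =
    (v[1], v[2])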

source
POMDPs.convert_a (Function)
convert_a(::Type{V}, a, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_a(::Type{A}, vec::V, problem::Union{MDP,POMDP}) where {A,V<:AbstractArray}

Convert an action to vectorized form or vice versa.

source
POMDPs.convert_o (Function)
convert_o(::Type{V}, o, problem::Union{MDP,POMDP}) where V<:AbstractArray
convert_o(::Type{O}, vec::V, problem::Union{MDP,POMDP}) where {O,V<:AbstractArray}

Convert an observation to vectorized form or vice versa.

source

Type Inference

POMDPs.statetype (Function)
statetype(t::Type)
statetype(p::Union{POMDP,MDP})

Return the state type for a problem type (the S in POMDP{S,A,O}).

struct A <: POMDP{Int, Bool, Bool} end

statetype(A) # returns Int
source
POMDPs.actiontype (Function)
actiontype(t::Type)
actiontype(p::Union{POMDP,MDP})

Return the action type for a problem type (the A in POMDP{S,A,O}).

struct A <: POMDP{Bool, Int, Bool} end

actiontype(A) # returns Int
source
POMDPs.obstype (Function)
obstype(t::Type)

Return the observation type for a problem type (the O in POMDP{S,A,O}).

struct A <: POMDP{Bool, Bool, Int} end

obstype(A) # returns Int
source

Distributions and Spaces

Base.rand (Function)
rand(rng::AbstractRNG, d::Any)

Return a random element from distribution or space d.

If d is a state or transition distribution, the sample will be a state; if d is an action distribution, the sample will be an action; and if d is an observation distribution, the sample will be an observation.

source
Distributions.pdf (Function)
pdf(d::Any, x::Any)

Evaluate the probability density of distribution d at sample x.

source
Distributions.support (Function)
support(d::Any)

Return an iterable object containing the possible values that can be sampled from distribution d. Values with zero probability may be skipped.
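
As a sketch, a custom two-point distribution might implement the rand, pdf, and support interface as follows (the type is hypothetical; pdf and support are assumed to be the Distributions functions re-exported by POMDPs):

using POMDPs
using Random

struct TwoPoint{T} # hypothetical distribution over two values
    a::T
    b::T
    p_a::Float64   # probability of a; b has probability 1 - p_a
end

Base.rand(rng::AbstractRNG, d::TwoPoint) = rand(rng) < d.p_a ? d.a : d.b
POMDPs.pdf(d::TwoPoint, x) = x == d.a ? d.p_a : (x == d.b ? 1.0 - d.p_a : 0.0)
POMDPs.support(d::TwoPoint) = (d.a, d.b)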

source

Belief Functions

POMDPs.update (Function)
update(updater::Updater, belief_old, action, observation)

Return a new instance of an updated belief given belief_old and the latest action and observation.
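
Example

A minimal sketch: an updater whose belief is simply the most recent observation (the type name is illustrative):

using POMDPs

struct PreviousObservationUpdater <: Updater end # hypothetical updater

# The updated belief is just the latest observation; the old belief and
# action are ignored.
POMDPs.update(up::PreviousObservationUpdater, b_old, a, o) = o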

source
POMDPs.initialize_belief (Function)
initialize_belief(updater::Updater,
                     state_distribution::Any)
initialize_belief(updater::Updater, belief::Any)

Return a belief that can be updated using updater and that has a distribution similar to state_distribution or belief.

The conversion may be lossy. This function is also idempotent, i.e. there is a default implementation that passes the belief through when it is already the correct type: initialize_belief(updater::Updater, belief) = belief

source
POMDPs.history (Function)
history(b)

Return the action-observation history associated with belief b.

The history should be an AbstractVector, Tuple, (or similar object that supports indexing with end) full of NamedTuples with keys :a and :o, i.e. history(b)[end][:a] should be the last action taken leading up to b, and history(b)[end][:o] should be the last observation received.

It is acceptable to return only part of the history if that is all that is available, but it should always end with the current observation. For example, it would be acceptable to return only the last three observations in a length-3 Vector{NamedTuple{(:o,), Tuple{O}}}.
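
For example, a belief type that stores its full history could implement this as follows (a sketch; the type is hypothetical):

using POMDPs

# Hypothetical belief type that records the full action-observation history.
struct HistoryBelief{A, O}
    hist::Vector{NamedTuple{(:a, :o), Tuple{A, O}}}
end

POMDPs.history(b::HistoryBelief) = b.hist

# With history defined, currentobs(b) works via its default implementation,
# returning the :o entry of the last history element.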

source
POMDPs.currentobs (Function)
currentobs(b)

Return the latest observation associated with belief b.

If a solver or updater implements history(b) for a belief type, currentobs has a default implementation.

source

Policy and Solver Functions

POMDPs.solve (Function)
solve(solver::Solver, problem::POMDP)

Solve the POMDP using the method associated with solver, and return a policy.

source
POMDPs.updater (Function)
updater(policy::Policy)

Return a default Updater appropriate for the belief type that policy can use.

source
POMDPs.action (Function)
action(policy::Policy, x)

Return the action that the policy deems best for the current state or belief, x.

x is a generalized information state: it can be a state in an MDP, a belief distribution in a POMDP, or another specialized policy-dependent representation of the information needed to choose an action.

source
POMDPs.value (Function)
value(p::Policy, s)
value(p::Policy, s, a)

Returns the utility value from policy p given the state (or belief), or state-action (or belief-action) pair.

The state-action version is commonly referred to as the Q-value.
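
Example

A sketch of a simple tabular policy implementing both value and action (the type and storage scheme are made up for illustration):

using POMDPs

# Hypothetical greedy policy backed by a Q-value table keyed by (state, action).
struct TabularQPolicy{S, A} <: Policy
    q::Dict{Tuple{S, A}, Float64}
    acts::Vector{A}
end

POMDPs.value(p::TabularQPolicy, s, a) = get(p.q, (s, a), 0.0)
POMDPs.value(p::TabularQPolicy, s) = maximum(value(p, s, a) for a in p.acts)

function POMDPs.action(p::TabularQPolicy, s)
    vals = [value(p, s, a) for a in p.acts]
    return p.acts[argmax(vals)]
end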

source

Simulator

POMDPs.simulate (Function)
simulate(sim::Simulator, m::POMDP, p::Policy, u::Updater=updater(p), b0=initialstate(m), s0=rand(b0))
simulate(sim::Simulator, m::MDP, p::Policy, s0=rand(initialstate(m)))

Run a simulation using the specified policy.

The return type is flexible and depends on the simulator. Simulations should adhere to the Simulation Standard.
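
Example

A usage sketch, assuming the companion packages POMDPModels (for the TigerPOMDP example problem), POMDPPolicies (for RandomPolicy), and POMDPSimulators (for RolloutSimulator) are available:

using POMDPs
using POMDPModels
using POMDPPolicies
using POMDPSimulators

m = TigerPOMDP()
policy = RandomPolicy(m)
sim = RolloutSimulator(max_steps=20)

r = simulate(sim, m, policy) # discounted reward of one simulated episode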

source