Model Tools

POMDPTools contains assorted tools that are not part of the core POMDPs.jl interface for working with (PO)MDP Models.

Interface Extensions

POMDPTools contains several interface extensions that provide shortcuts and standardized ways of dealing with extra data.

Programmers should use these functions whenever possible in case optimized implementations are available, but all of the functions have default implementations based on the core POMDPs.jl interface. Thus, if the core interface is implemented, all of these functions will also be available.

Weighted Iteration

Many solution techniques, for example value iteration, require iteration through the support of a distribution and evaluating the probability mass for each value. In some cases, looking up the probability mass is expensive, so it is more efficient to iterate through value => probability pairs. weighted_iterator provides a standard interface for this.

POMDPTools.POMDPDistributions.weighted_iterator — Function

weighted_iterator(d)

Return an iterator through pairs of the values and probabilities in distribution d.

This is designed to speed up value iteration. Distributions are encouraged to provide a custom optimized implementation if possible.

Example

julia> d = BoolDistribution(0.7)
BoolDistribution(0.7)

julia> collect(weighted_iterator(d))
2-element Array{Pair{Bool,Float64},1}:
  true => 0.7
 false => 0.3

Observation Weight

Sometimes, e.g. in particle filtering, the relative likelihood of an observation is required in addition to a generative model, and it is often tedious to implement a custom observation distribution type. For this case, the shortcut function obs_weight is provided.

POMDPTools.ModelTools.obs_weight — Function

obs_weight(pomdp, s, a, sp, o)

Return a weight proportional to the likelihood of receiving observation o from state sp (and a and s if they are present).

This is a useful shortcut for particle filtering so that the observation distribution does not have to be represented.

Ordered Spaces

It is often useful to have a list of states, actions, or observations ordered consistently with the respective index function from POMDPs.jl. Since the POMDPs.jl interface does not demand that spaces be ordered consistently with index, the states, actions, and observations functions are not sufficient. Thus POMDPModelTools provides ordered_actions, ordered_states, and ordered_observations to provide this capability.

POMDPTools.ModelTools.ordered_actions — Function

ordered_actions(mdp)

Return an AbstractVector of actions ordered according to actionindex(mdp, a).

ordered_actions(mdp) will always return an AbstractVector{A} v containing all of the actions in actions(mdp) in the order such that actionindex(mdp, v[i]) == i. You may wish to override this for your problem for efficiency.

POMDPTools.ModelTools.ordered_states — Function

ordered_states(mdp)

Return an AbstractVector of states ordered according to stateindex(mdp, a).

ordered_states(mdp) will always return a AbstractVector{A} v containing all of the states in states(mdp) in the order such that stateindex(mdp, v[i]) == i. You may wish to override this for your problem for efficiency.

POMDPTools.ModelTools.ordered_observations — Function

ordered_observations(pomdp)

Return an AbstractVector of observations ordered according to obsindex(pomdp, a).

ordered_observations(mdp) will always return a AbstractVector{A} v containing all of the observations in observations(pomdp) in the order such that obsindex(pomdp, v[i]) == i. You may wish to override this for your problem for efficiency.

Info Interface

It is often the case that useful information besides the belief, state, action, etc is generated by a function in POMDPs.jl. This information can be useful for debugging or understanding the behavior of a solver, updater, or problem. The info interface provides a standard way for problems, policies, solvers or updaters to output this information. The recording simulators from POMDPTools automatically record this information.

To specify info from policies, solvers, or updaters, implement the following functions:

POMDPTools.ModelTools.action_info — Function

a, ai = action_info(policy, x)

Return a tuple containing the action determined by policy 'p' at state or belief 'x' and information (usually a NamedTuple, Dict or nothing) from the calculation of that action.

By default, returns nothing as info.

POMDPTools.ModelTools.solve_info — Function

policy, si = solve_info(solver, problem)

Return a tuple containing the policy determined by a solver and information (usually a NamedTuple, Dict or nothing) from the calculation of that policy.

By default, returns nothing as info.

POMDPTools.ModelTools.update_info — Function

bp, i = update_info(updater, b, a, o)

Return a tuple containing the new belief and information (usually a NamedTuple, Dict or nothing) from the belief update.

By default, returns nothing as info.

Model Transformations

POMDPTools contains several tools for transforming problems into other classes so that they can be used by different solvers.

Linear Algebra Representations

For some algorithms, such as value iteration, it is convenient to use vectors that contain the reward for every state, and matrices that contain the transition probabilities. These can be constructed with the following functions:

POMDPTools.ModelTools.transition_matrices — Function

transition_matrices(p::SparseTabularProblem)

Accessor function for the transition model of a sparse tabular problem. It returns a list of sparse matrices for each action of the problem.

transition_matrices(m::Union{MDP,POMDP})
transition_matrices(m; sparse=true)

Construct transition matrices for (PO)MDP m.

The returned object is an associative object (usually a Dict), where the keys are actions. Each value in this object is an AbstractMatrix where the row corresponds to the state index of s and the column corresponds to the state index of s'. The entry in the matrix is the probability of transitioning from state s to state s'.

POMDPTools.ModelTools.reward_vectors — Function

reward_vectors(m::Union{MDP, POMDP})

Construct reward vectors for (PO)MDP m.

The returned object is an associative object (usually a Dict), where the keys are actions. Each value in this object is an AbstractVector where the index corresponds to the state index of s and the entry is the reward for that state.

Sparse Tabular MDPs and POMDPs

The SparseTabularMDP and SparseTabularPOMDP represents discrete problems defined using the explicit interface. The transition and observation models are represented using sparse matrices. Solver writers can leverage these data structures to write efficient vectorized code. A problem writer can define its problem using the explicit interface and it can be automatically converted to a sparse tabular representation by calling the constructors SparseTabularMDP(::MDP) or SparseTabularPOMDP(::POMDP). See the following docs to know more about the matrix representation and how to access the fields of the SparseTabular objects:

POMDPTools.ModelTools.SparseTabularMDP — Type

SparseTabularMDP

An MDP object where states and actions are integers and the transition is represented by a list of sparse matrices. This data structure can be useful to exploit in vectorized algorithm (e.g. see SparseValueIterationSolver). The recommended way to access the transition and reward matrices is through the provided accessor functions: transition_matrix and reward_vector.

Fields

T::Vector{SparseMatrixCSC{Float64, Int64}} The transition model is represented as a vector of sparse matrices (one for each action). T[a][s, sp] the probability of transition from s to sp taking action a.
R::Array{Float64, 2} The reward is represented as a matrix where the rows are states and the columns actions: R[s, a] is the reward of taking action a in sate s.
initial_probs::SparseVector{Float64, Int64} Specifies the initial state distribution
terminal_states::Set{Int64} Stores the terminal states
discount::Float64 The discount factor

Constructors

SparseTabularMDP(mdp::MDP) : One can provide the matrices to the default constructor or one can construct a SparseTabularMDP from any discrete state MDP defined using the explicit interface.

Note that constructing the transition and reward matrices requires to iterate over all the states and can take a while. To learn more information about how to define an MDP with the explicit interface please visit https://juliapomdp.github.io/POMDPs.jl/latest/explicit/ .

SparseTabularMDP(smdp::SparseTabularMDP; transition, reward, discount) : This constructor returns a new sparse MDP that is a copy of the original smdp except for the field specified by the keyword arguments.

POMDPTools.ModelTools.SparseTabularPOMDP — Type

SparseTabularPOMDP

A POMDP object where states and actions are integers and the transition and observation distributions are represented by lists of sparse matrices. This data structure can be useful to exploit in vectorized algorithms to gain performance (e.g. see SparseValueIterationSolver). The recommended way to access the transition, reward, and observation matrices is through the provided accessor functions: transition_matrix, reward_vector, observation_matrix.

Fields

T::Vector{SparseMatrixCSC{Float64, Int64}} The transition model is represented as a vector of sparse matrices (one for each action). T[a][s, sp] the probability of transition from s to sp taking action a.
R::Array{Float64, 2} The reward is represented as a matrix where the rows are states and the columns actions: R[s, a] is the reward of taking action a in sate s.
O::Vector{SparseMatrixCSC{Float64, Int64}} The observation model is represented as a vector of sparse matrices (one for each action). O[a][sp, o] is the probability of observing o from state sp after having taken action a.
initial_probs::SparseVector{Float64, Int64} Specifies the initial state distribution
terminal_states::Set{Int64} Stores the terminal states
discount::Float64 The discount factor

Constructors

SparseTabularPOMDP(pomdp::POMDP) : One can provide the matrices to the default constructor or one can construct a SparseTabularPOMDP from any discrete state MDP defined using the explicit interface.

Note that constructing the transition and reward matrices requires to iterate over all the states and can take a while. To learn more information about how to define an MDP with the explicit interface please visit https://juliapomdp.github.io/POMDPs.jl/latest/explicit/ .

SparseTabularPOMDP(spomdp::SparseTabularMDP; transition, reward, observation, discount) : This constructor returns a new sparse POMDP that is a copy of the original smdp except for the field specified by the keyword arguments.

POMDPTools.ModelTools.transition_matrix — Function

transition_matrix(p::SparseTabularProblem, a)

Accessor function for the transition model of a sparse tabular problem. It returns a sparse matrix containing the transition probabilities when taking action a: T[s, sp] = Pr(sp | s, a).

POMDPTools.ModelTools.reward_vector — Function

reward_vector(p::SparseTabularProblem, a)

Accessor function for the reward function of a sparse tabular problem. It returns a vector containing the reward for all the states when taking action a: R(s, a). The length of the return vector is equal to the number of states.

POMDPTools.ModelTools.observation_matrix — Function

observation_matrix(p::SparseTabularPOMDP, a::Int64)

Accessor function for the observation model of a sparse tabular POMDP. It returns a sparse matrix containing the observation probabilities when having taken action a: O[sp, o] = Pr(o | sp, a).

POMDPTools.ModelTools.reward_matrix — Function

reward_matrix(p::SparseTabularProblem)

Accessor function for the reward matrix R[s, a] of a sparse tabular problem.

POMDPTools.ModelTools.observation_matrices — Function

observation_matrices(p::SparseTabularPOMDP)

Accessor function for the observation model of a sparse tabular POMDP. It returns a list of sparse matrices for each action of the problem.

Fully Observable POMDP

POMDPTools.ModelTools.FullyObservablePOMDP — Type

FullyObservablePOMDP(mdp)

Turn MDP mdp into a POMDP where the observations are the states of the MDP.

Generative Belief MDP

Every POMDP is an MDP on the belief space GenerativeBeliefMDP creates a generative model for that MDP.

Warning

The reward generated by the GenerativeBeliefMDP is the reward for a single state sampled from the belief; it is not the expected reward for that belief transition (though, in expectation, they are equivalent of course). Implementing the model with the expected reward requires a custom implementation because belief updaters do not typically deal with reward.

POMDPTools.ModelTools.GenerativeBeliefMDP — Type

GenerativeBeliefMDP(pomdp, updater)

Create a generative model of the belief MDP corresponding to POMDP pomdp with belief updates performed by updater.

Example

using POMDPs
using POMDPModels
using POMDPTools

pomdp = BabyPOMDP()
updater = DiscreteUpdater(pomdp)

belief_mdp = GenerativeBeliefMDP(pomdp, updater)
@show statetype(belief_mdp) # POMDPModels.BoolDistribution

for (a, r, sp) in stepthrough(belief_mdp, RandomPolicy(belief_mdp), "a,r,sp", max_steps=5)
    @show a, r, sp
end

# output
statetype(belief_mdp) = DiscreteBelief{POMDPModels.BabyPOMDP, Bool}Bool}
(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))
(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))
(a, r, sp) = (true, -5.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [1.0, 0.0]))
(a, r, sp) = (false, 0.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [0.9759036144578314, 0.02409638554216867]))
(a, r, sp) = (false, 0.0, DiscreteBelief{POMDPModels.BabyPOMDP, Bool}(POMDPModels.BabyPOMDP(-5.0, -10.0, 0.1, 0.8, 0.1, 0.9), Bool[0, 1], [0.9701315984030756, 0.029868401596924433]))

Underlying MDP

POMDPTools.ModelTools.UnderlyingMDP — Type

UnderlyingMDP(m::POMDP)

Transform POMDP m into an MDP where the states are fully observed.

UnderlyingMDP(m::MDP)

Return m

State Action Reward Model

POMDPTools.ModelTools.StateActionReward — Type

StateActionReward(m::Union{MDP,POMDP})

Robustly create a reward function that depends only on the state and action.

If reward(m, s, a) is implemented, that will be used, otherwise the mean of reward(m, s, a, sp) for MDPs or reward(m, s, a, sp, o) for POMDPs will be used.

Example

using POMDPs
using POMDPModels
using POMDPTools

m = BabyPOMDP()

rm = StateActionReward(m)

rm(true, true)

# output

-15.0

Utility Types

Terminal State

TerminalState and its singleton instance terminalstate are available to use for a terminal state in concert with another state type. It has the appropriate type promotion logic to make its use with other types friendly, similar to nothing and missing.

Note

NOTE: This is NOT a replacement for the standard POMDPs.jl isterminal function, though isterminal is implemented for the type. It is merely a convenient type to use for terminal states.

Warning

WARNING: Early tests (August 2018) suggest that the Julia 1.0 compiler will not be able to efficiently implement union splitting in cases as complex as POMDPs, so using a Union for the state type of a problem can currently have a large overhead.

POMDPTools.ModelTools.TerminalState — Type

TerminalState

A type with no fields whose singleton instance terminalstate is used to represent a terminal state with no additional information.

This type has the appropriate promotion logic implemented to function like Missing when added to arrays, etc.

Note that terminal states NEED NOT be of type TerminalState. You can define any state to be terminal by implementing the appropriate isterminal method. Solvers and simulators SHOULD NOT check for this type, but should instead check using isterminal.

POMDPTools.ModelTools.terminalstate — Constant

terminalstate

The singleton instance of type TerminalState representing a terminal state.