Defining a Belief Updater
In this section we list the requirements for defining a belief updater. For a description of what a belief updater is, see Concepts and Architecture - Beliefs and Updaters. Typically a belief updater will have an associated belief type, and may be closely tied to a particular policy/planner.
Defining a Belief Type
A belief object should contain all of the information needed for the next belief update and for the policy to make a decision. The belief type could be a pre-defined type such as a distribution from Distributions.jl, `DiscreteBelief` or `SparseCat` from the POMDPTools package, or it could be a custom type.
Often, but not always, the belief will represent a probability distribution. In this case, the functions in the distribution interface should be implemented if possible. Implementing these functions will make the belief usable with many of the policies and planners in the POMDPs.jl ecosystem, and will make it easy for others to convert between beliefs and to interpret what a belief means.
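For example, a custom belief type over a small discrete state space might simply store states paired with their probabilities. The sketch below is illustrative only: the `WeightedBelief` type and its field names are assumptions, not part of any package, and it assumes the `states` vector contains no duplicates.

```julia
import POMDPs
import Random

# Hypothetical custom belief type: states paired with their probabilities (illustration only)
struct WeightedBelief{S}
    states::Vector{S}
    probs::Vector{Float64}
end

# Distribution interface functions make the belief usable by generic policies and tools
function POMDPs.pdf(b::WeightedBelief, s)
    i = findfirst(isequal(s), b.states)
    return i === nothing ? 0.0 : b.probs[i]
end
POMDPs.support(b::WeightedBelief) = b.states
POMDPs.mode(b::WeightedBelief) = b.states[argmax(b.probs)]

# Sampling a state from the belief by inverse-CDF sampling
function Random.rand(rng::Random.AbstractRNG, b::WeightedBelief)
    t = rand(rng)
    c = 0.0
    for (s, p) in zip(b.states, b.probs)
        c += p
        t <= c && return s
    end
    return b.states[end]
end
```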
Histories associated with a belief
If a complete or partial record of the action-observation history leading up to a belief is available, it is often helpful to give access to this by implementing the `history` or `currentobs` functions (see the docstrings for more details). This is especially useful if a problem-writer wants to implement a belief- or observation-dependent action space. Belief type implementers need only implement `history`, and `currentobs` will automatically be provided, though sometimes it is more convenient to implement `currentobs` directly.
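As a minimal illustration, a belief type that carries its history as a vector of named tuples (a hypothetical `HistoryRecordingBelief`; the field names are assumptions for this sketch) could expose it like this:

```julia
import POMDPs

# Hypothetical belief type that records the action-observation history
# alongside the current state distribution (field names are illustrative)
struct HistoryRecordingBelief{D}
    dist::D                   # current state distribution
    hist::Vector{NamedTuple}  # e.g. [(a = a1, o = o1), (a = a2, o = o2), ...]
end

# Exposing the history; a default currentobs based on the history is then typically available
POMDPs.history(b::HistoryRecordingBelief) = b.hist

# Or, if more convenient, currentobs can be implemented directly
POMDPs.currentobs(b::HistoryRecordingBelief) = b.hist[end].o
```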
Defining an Updater
To create an updater, one should define a subtype of the `Updater` abstract type and implement two methods, one to create the initial belief from the problem's initial state distribution and one to perform a belief update:
- `initialize_belief(updater, d)` creates a belief from state distribution `d` appropriate to use with the updater. To extract information from `d`, use the functions from the distribution interface.
- `update(updater, b, a, o)` returns an updated belief given belief `b`, action `a`, and observation `o`. One can usually expect `b` to be the same type returned by `initialize_belief` because a careful user will always call `initialize_belief` before `update`, but it would also be reasonable to implement `update` for `b` of a different type if it is desirable to handle multiple belief types.
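For comparison with the trivial history updater below, here is a minimal sketch of these two methods for an exact Bayesian filter over a small discrete state space. The `DiscreteBayesUpdater` name is hypothetical, the initial distribution and incoming beliefs are assumed to support `pdf`, and corner cases (such as an all-zero updated belief) are glossed over; POMDPTools provides a full-featured `DiscreteUpdater` along these lines.

```julia
import POMDPs
using POMDPs: transition, observation, pdf
using POMDPTools: SparseCat, ordered_states

# Hypothetical exact Bayes-filter updater for POMDPs with a small discrete state space
struct DiscreteBayesUpdater{M<:POMDPs.POMDP} <: POMDPs.Updater
    pomdp::M
end

# Represent the initial state distribution as a categorical distribution over all states
function POMDPs.initialize_belief(up::DiscreteBayesUpdater, d)
    ss = ordered_states(up.pomdp)
    return SparseCat(ss, [pdf(d, s) for s in ss])
end

# b'(s') ∝ O(o | a, s') * Σ_s T(s' | s, a) * b(s)
function POMDPs.update(up::DiscreteBayesUpdater, b, a, o)
    ss = ordered_states(up.pomdp)
    bp = zeros(length(ss))
    for (i, sp) in enumerate(ss)
        po = pdf(observation(up.pomdp, a, sp), o)
        po > 0.0 || continue
        for s in ss
            bp[i] += po * pdf(transition(up.pomdp, s, a), sp) * pdf(b, s)
        end
    end
    return SparseCat(ss, bp ./ sum(bp))  # normalize (assumes sum(bp) > 0)
end
```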
Example: History Updater
One trivial type of belief would be the action-observation history, a list containing the initial state distribution and every action taken and observation received. The history contains all of the information received up to the current time, but it is not usually very useful because most policies make decisions based on a state probability distribution. Here the belief type is simply the built-in `Vector{Any}`, so we need only create the updater and write `update` and `initialize_belief`. Normally, `update` would contain belief update probability calculations, but in this example, we simply append the action and observation to the history.
(Note that this example is designed for readability rather than efficiency.)
```julia
import POMDPs

struct HistoryUpdater <: POMDPs.Updater end

# The initial belief is just the initial state distribution wrapped in a vector
POMDPs.initialize_belief(up::HistoryUpdater, d) = Any[d]

# Updating appends the action and observation to a copy of the history
function POMDPs.update(up::HistoryUpdater, b, a, o)
    bp = copy(b)
    push!(bp, a)
    push!(bp, o)
    return bp
end
```
At each step, the history starts with the original distribution, then contains all the actions and observations received up to that point. The example below shows this for the crying baby problem (observations are true/false for crying and actions are true/false for feeding).
```julia
using POMDPTools
using POMDPModels
using Random

pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp, rng=MersenneTwister(1))
up = HistoryUpdater()

# Within stepthrough, initialize_belief is called on the initial state distribution
# of the pomdp, then update is called at each step.
for b in stepthrough(pomdp, policy, up, "b", rng=MersenneTwister(2), max_steps=5)
    @show b
end

# output

b = Any[POMDPModels.BoolDistribution(0.0)]
b = Any[POMDPModels.BoolDistribution(0.0), false, false]
b = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false]
b = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false]
b = Any[POMDPModels.BoolDistribution(0.0), false, false, false, false, true, false, true, false]
```