API
Exported Types
MOMDPs.MOMDP — Type
MOMDP{X, Y, A, O} <: POMDP{Tuple{X,Y},A,O}
Abstract base type for a mixed observable Markov decision process.
- X: visible state type
- Y: hidden state type
- A: action type
- O: observation type
Notation follows Ong, Sylvie C. W., et al. "POMDPs for robotic tasks with mixed observability." Robotics: Science and Systems, Vol. 5, No. 4, 2009.
MOMDPs.POMDP_of_Discrete_MOMDP — Type
POMDP_of_Discrete_MOMDP{X,Y,A,O} <: POMDP{Tuple{X,Y},A,O}
This type is used to represent a discrete MOMDP as a standard POMDP. The POMDPs.jl interface functions for this type are implemented in terms of the functions defined for the MOMDP type.
The only difference in the spaces is the observation space. If the original observation space is $\mathcal{O}$ and the visible state space is $\mathcal{X}$, then the observation space of the POMDP is $\mathcal{X} \times \mathcal{O}$.
MOMDPs.MOMDPAlphaVectorPolicy — Type
MOMDPAlphaVectorPolicy(momdp::MOMDP, alpha_vecs, action_map, vis_state_map)
Construct a policy from alpha vectors for a Mixed Observability Markov Decision Process (MOMDP).
Arguments
- momdp::MOMDP: The MOMDP problem instance for which the policy is constructed.
- alpha_vecs: An abstract vector of alpha vectors, where each alpha vector is a vector of floats representing the value function for a particular belief state.
- action_map: A vector of actions corresponding to each alpha vector. Each action is associated with the alpha vector that prescribes it.
- vis_state_map: A vector mapping visible states to their corresponding alpha vectors, used to determine which alpha vector applies to a given state.
Fields
- momdp::MOMDP: The MOMDP problem instance, necessary for mapping states to locations in the alpha vectors.
- n_states_x::Int: The number of visible states in the MOMDP.
- n_states_y::Int: The number of hidden states in the MOMDP.
- alphas::Vector{Vector{Vector{Float64}}}: A vector (of size $|\mathcal{X}|$) of vectors (one per alpha vector for that visible state) of alpha vectors (each of length $|\mathcal{Y}|$). This structure holds the alpha vectors for each visible state.
- action_map::Vector{Vector{A}}: A vector (of size $|\mathcal{X}|$) of vectors of actions corresponding to the alpha vectors. Each action is associated with a specific alpha vector for a visible state.
This structure represents a policy that uses alpha vectors to determine the best action to take given a belief state in a MOMDP.
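For illustration, a minimal sketch of how this nested layout is accessed (policy, x, and by are hypothetical: a solved policy, a visible state, and a length-$|\mathcal{Y}|$ belief vector over the hidden states):
using LinearAlgebra: dot
x_idx = stateindex_x(policy.momdp, x)                 # index of the observed visible state
alphas_x = policy.alphas[x_idx]                       # alpha vectors stored for this visible state
k = argmax([dot(alpha, by) for alpha in alphas_x])    # alpha vector with the largest value for by
best_action = policy.action_map[x_idx][k]             # action prescribed by that alpha vector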
MOMDPs.MOMDPDiscreteUpdater — Type
MOMDPDiscreteUpdater
An updater type for maintaining and updating discrete beliefs over hidden states in a Mixed Observability Markov Decision Process (MOMDP).
Constructor
MOMDPDiscreteUpdater(momdp::MOMDP)
Create a discrete belief updater for the given MOMDP.
Fields
- momdp <: MOMDP: The MOMDP problem instance for which beliefs will be updated
Description
In a MOMDP, the state space is factored into visible states x (fully observable) and hidden states y (partially observable). This updater maintains beliefs only over the hidden states, since the visible states are assumed to be directly observed.
The updater implements the discrete Bayesian filter for belief updates, assuming:
- Finite, discrete hidden state spaces
- Known visible state transitions (x → x')
- Probabilistic hidden state transitions that may depend on visible states
- Observations that depend on both visible and hidden states
Usage
momdp = YourMOMDPProblem()
updater = MOMDPDiscreteUpdater(momdp)
# Initialize belief over hidden states (x is the known initial visible state)
initial_dist = initialstate_y(momdp, x)
# Update belief after taking action and receiving observation
new_belief = update(updater, current_belief, action, observation, x, xp)
See Also
- uniform_belief_y: Create uniform beliefs over hidden states
- initialize_belief: Initialize beliefs from distributions
- update: Perform belief updates using the MOMDP filter
Exported Functions
MOMDPs.transition_x — Function
transition_x(m::MOMDP{X,Y,A,O}, state::Tuple{X,Y}, action)
Return the transition distribution over the next visible state given the current state and action.
T_x(s, a, x′) = p(x′ | s, a) where s = (x,y)
MOMDPs.transition_y — Function
transition_y(m::MOMDP{X,Y,A,O}, state::Tuple{X,Y}, action, statep_visible)
Return the transition distribution over the next hidden state given the current state, action, and next visible state.
T_y(s, a, x′, y′) = p(y′ | s, a, x′) where s = (x,y)
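As a hedged example of supplying these dynamics, a minimal sketch for a hypothetical discrete problem (MyMOMDP, its state spaces, and the specific dynamics below are assumptions for illustration, not part of the package):
using MOMDPs, POMDPs
using POMDPTools: Deterministic, SparseCat
# Hypothetical problem: Int visible states, Bool hidden states, Symbol actions, Bool observations
struct MyMOMDP <: MOMDP{Int, Bool, Symbol, Bool} end
# Visible state moves deterministically with the action (assumed dynamics)
MOMDPs.transition_x(m::MyMOMDP, s::Tuple{Int,Bool}, a) =
    Deterministic(s[1] + (a == :right ? 1 : -1))
# Hidden state flips with small probability, independent of x′ here (assumed dynamics)
MOMDPs.transition_y(m::MyMOMDP, s::Tuple{Int,Bool}, a, xp) =
    SparseCat([s[2], !s[2]], [0.9, 0.1])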
MOMDPs.states_x — Function
states_x(problem::MOMDP)
Returns the complete visible state space of a MOMDP.
MOMDPs.states_y — Function
states_y(problem::MOMDP)
Returns the complete hidden state space of a MOMDP.
MOMDPs.stateindex_x — Function
stateindex_x(problem::MOMDP, s)
Return the integer index of the visible state x, where s is a tuple of the form (x, y). Used for discrete models only.
MOMDPs.stateindex_y — Function
stateindex_y(problem::MOMDP, s)
Return the integer index of the hidden state y, where s is a tuple of the form (x, y). Used for discrete models only.
MOMDPs.initialstate_x — Function
initialstate_x(problem::MOMDP)
Return the initial visible state distribution.
MOMDPs.initialstate_y — Function
initialstate_y(problem::MOMDP, x)
Return the initial hidden state distribution conditioned on the visible state x.
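Continuing the hypothetical MyMOMDP from the transition example above, a minimal sketch (the specific distributions are assumptions):
MOMDPs.initialstate_x(m::MyMOMDP) = Deterministic(0)                        # visible state starts at 0
MOMDPs.initialstate_y(m::MyMOMDP, x) = SparseCat([true, false], [0.5, 0.5]) # hidden state uniform given x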
MOMDPs.statetype_x — Function
statetype_x(t::Type)
statetype_x(p::MOMDP)
Return the visible state type for a MOMDP (the X in MOMDP{X,Y,A,O}).
MOMDPs.statetype_y — Function
statetype_y(t::Type)
statetype_y(p::MOMDP)
Return the hidden state type for a MOMDP (the Y in MOMDP{X,Y,A,O}).
MOMDPs.ordered_states_x — Function
ordered_states_x(momdp)
Return an AbstractVector of the visible states in a MOMDP ordered according to stateindex_x(momdp, s).
ordered_states_x(momdp) will always return an AbstractVector{X} v containing all of the visible states in states_x(momdp) in the order such that stateindex_x(momdp, v[i]) == i. You may wish to override this for your problem for efficiency.
MOMDPs.ordered_states_y — Function
ordered_states_y(momdp)
Return an AbstractVector of the hidden states in a MOMDP ordered according to stateindex_y(momdp, s).
ordered_states_y(momdp) will always return an AbstractVector{Y} v containing all of the hidden states in states_y(momdp) in the order such that stateindex_y(momdp, v[i]) == i. You may wish to override this for your problem for efficiency.
MOMDPs.is_y_prime_dependent_on_x_prime — Function
is_y_prime_dependent_on_x_prime(m::MOMDP)
Defines whether the next hidden state y′ depends on the next visible state x′, given the current visible state x, hidden state y, and action a.
Returns false if the conditional probability distribution satisfies p(y′ | x, y, a, x′) = p(y′ | x, y, a).
MOMDPs.is_x_prime_dependent_on_y — Function
is_x_prime_dependent_on_y(m::MOMDP)
Defines whether the next visible state x′ depends on the current hidden state y, given the current visible state x and action a.
Returns false if the conditional probability distribution satisfies p(x′ | x, y, a) = p(x′ | x, a).
MOMDPs.is_initial_distribution_independent — Function
is_initial_distribution_independent(m::MOMDP)
Defines whether the initial distributions of the visible state x and hidden state y are independent.
Returns true if the joint probability distribution satisfies p(x, y) = p(x)p(y), meaning x and y are independent in the initial distribution.
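For the hypothetical MyMOMDP above, these properties might be declared as follows (the chosen values are assumptions about its dynamics):
MOMDPs.is_y_prime_dependent_on_x_prime(m::MyMOMDP) = false   # p(y′ | x, y, a, x′) = p(y′ | x, y, a)
MOMDPs.is_x_prime_dependent_on_y(m::MyMOMDP) = false         # p(x′ | x, y, a) = p(x′ | x, a)
MOMDPs.is_initial_distribution_independent(m::MyMOMDP) = true # p(x, y) = p(x)p(y)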
MOMDPs.beliefvec_y — Function
beliefvec_y(m::MOMDP, n_states::Int, b)
Convert a belief distribution b over hidden states to a vector representation suitable for dot product operations with alpha vectors in a MOMDP.
Arguments
- m::MOMDP: The MOMDP problem instance
- n_states_y::Int: The number of hidden states in the MOMDP
- b: The belief distribution over hidden states (supports various belief types)
Returns
- A vector of length n_states_y where element i represents the probability of hidden state i
Supported Belief Types
- SparseCat: Converts sparse categorical distribution to dense vector
- AbstractVector: Returns the vector directly (with length assertion)
- DiscreteBelief: Extracts the underlying probability vector
- Deterministic: Creates a one-hot vector for the deterministic state
This function is used internally by alpha vector policies to convert belief distributions into the vector format required for computing dot products with alpha vectors.
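A minimal usage sketch (momdp is hypothetical and assumed to have three hidden states y1, y2, y3 with stateindex_y mapping them to indices 1, 2, 3):
using POMDPTools: SparseCat
b_sparse = SparseCat([y1, y3], [0.7, 0.3])
bvec = beliefvec_y(momdp, 3, b_sparse)   # dense vector [0.7, 0.0, 0.3] under the assumed indexing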
MOMDPs.uniform_belief_y — Function
uniform_belief_y(momdp)
uniform_belief_y(up::MOMDPDiscreteUpdater)
Return a uniform DiscreteBelief over all hidden states in the MOMDP.
Arguments
- momdp: A MOMDP problem instance, or
- up::MOMDPDiscreteUpdater: A MOMDP discrete belief updater
Returns
- A DiscreteBelief with equal probability 1/|Y| for each hidden state y ∈ Y
Description
This function creates a uniform prior belief over the hidden state space, which is often used as an initial belief when the true hidden state is unknown. In MOMDP problems, this represents maximum uncertainty about the hidden state while the visible state is assumed to be known.
The uniform belief is particularly useful for:
- Initializing belief when no prior information is available
- Baseline comparisons in experiments
- Worst-case analysis of belief-dependent policies
Example
momdp = YourMOMDPProblem()
initial_belief = uniform_belief_y(momdp)
# or
updater = MOMDPDiscreteUpdater(momdp)
initial_belief = uniform_belief_y(updater)
Extended Functions
POMDPs.jl and POMDPTools.jl
POMDPs.states — Method
POMDPs.states(p::MOMDP)
Helper function to return the full state space for discrete MOMDPs. The states are Tuple{X,Y} where X is the visible state and Y is the hidden state.
POMDPs.stateindex — Method
POMDPs.stateindex(p::MOMDP{X,Y,A,O}, s::Tuple{X,Y}) where {X,Y,A,O}
Helper function to return the index of the Tuple{X,Y} state for discrete MOMDPs.
POMDPs.transition — Method
POMDPs.transition(p::MOMDP{X,Y,A,O}, s::Tuple{X,Y}, a::A) where {X,Y,A,O}
Helper function to return the full transition distribution for discrete MOMDPs. The states are Tuple{X,Y}. It uses transition_x and transition_y to construct the distribution.
POMDPs.initialstate — Method
initialstate(p::MOMDP{X,Y,A,O}) where {X,Y,A,O}
Helper function to return the initial state distribution for discrete MOMDPs. The states are Tuple{X,Y}. It uses initialstate_x and initialstate_y to construct the distribution.
POMDPs.observations — Method
POMDPs.observations(p::POMDP_of_Discrete_MOMDP)
Returns the full observation space of a POMDP_of_Discrete_MOMDP. The observations are Tuple{X,O} where X is the visible state and O is the observation.
POMDPs.observation — Method
POMDPs.observation(p::POMDP_of_Discrete_MOMDP{X, Y, A, O}, a::A, s::Tuple{X, Y}) where {X, Y, A, O}
Returns the full observation distribution for a POMDP_of_Discrete_MOMDP. The observations are Tuple{X,O} where X is the visible state and O is the observation.
POMDPs.obsindex — Method
POMDPs.obsindex(p::POMDP_of_Discrete_MOMDP, o)
Returns the index of the Tuple{X,O} observation for a POMDP_of_Discrete_MOMDP.
POMDPs.value — Method
value(p::MOMDPAlphaVectorPolicy, b)
Calculate the value of belief state b for a MOMDP using alpha vectors.
This function computes the value by:
- Marginalizing the belief over the partially observable state to get $b(x)$ for each fully observable state $x$
- Computing the conditional belief $b(y \mid x) = b(x,y)/b(x)$ for each $x$
- Finding the maximum dot product between $b(y \mid x)$ and the alpha vectors for each $x$
- Summing $b(x) \cdot V(x, b(y \mid x))$ over all $x$ to get the total value
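In symbols, writing $\Gamma_x$ for the set of alpha vectors associated with visible state $x$, these steps compute $V(b) = \sum_{x} b(x) \max_{\alpha \in \Gamma_x} \sum_{y} \alpha(y)\, b(y \mid x)$.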
Arguments
- p::MOMDPAlphaVectorPolicy: The alpha vector policy
- b: The belief state over the joint state space (x,y)
Returns
- The value of the belief state
Notes
This is not the most efficient way to get a value if we are operating in a true MOMDP framework. However, this keeps the structure of the code similar to the POMDPs.jl framework.
POMDPs.value — Method
value(p::MOMDPAlphaVectorPolicy, b, x)
Calculate the value for a specific visible state x given a belief b over the joint state space.
Arguments
- p::MOMDPAlphaVectorPolicy: The alpha vector policy
- b: The belief distribution over the joint state space (x,y)
- x: The specific visible state to evaluate
Returns
- The value for the given visible state and belief
Description
This function computes the value by:
- Extracting the marginal belief over hidden states y for the given visible state x by evaluating pdf(b, (x,y)) for all y
- Finding the maximum dot product between this marginal belief and all alpha vectors associated with visible state x
This is more efficient than the general value(p, b) function when the visible state is known, as it only needs to consider alpha vectors for the specific x rather than marginalizing over all visible states.
POMDPs.action — Method
action(p::MOMDPAlphaVectorPolicy, b)
Return the action prescribed by the MOMDP alpha-vector policy p for the belief b over the joint state space (x, y).
Heuristic
- Find the visible state x with the largest probability mass in b.
- Form the conditional distribution over y given that x.
- Among the alpha-vectors in p.alphas[x], pick the one with the largest dot product with that conditional distribution.
- Return the action associated with that alpha-vector.
Notes
When $b$ is not a pure distribution over a single $x$: Typically in a MOMDP, we assume $x$ (the "visible" state) is fully observed at runtime, so the belief over $(x, y)$ will place essentially all its probability mass on a single $x$. In that case, the above steps are effectively picking the single $x$ that we actually observed.
If you are operating in a true MOMDP framework, you can implement a custom action function that takes in the visible state $x$ and the conditional distribution over $y$ given $x$. While this would result in the same action as the heuristic above, it will be more efficient. E.g.:
using LinearAlgebra: dot
# If x is known exactly at runtime and we have a distribution only over y:
function action(p::MOMDPAlphaVectorPolicy, x, by)
    x_idx = stateindex_x(p.momdp, x)
    # pick the alpha-vector among p.alphas[x_idx] that maximizes dot(alpha, by)
    k = argmax([dot(alpha, by) for alpha in p.alphas[x_idx]])
    return p.action_map[x_idx][k]
end
In case your solver or simulator still gives a multi-modal distribution over different $x$ (which can happen in generic POMDP frameworks), the code here picks the $x$ that has the largest total probability mass in the belief. While this heuristic might be sufficient for some problems, we recommend implementing a custom action function that performs a one-step lookahead using the value function.
POMDPs.action — Method
action(p::MOMDPAlphaVectorPolicy, b, x)
Return the action prescribed by the MOMDP alpha-vector policy p for the belief b over the hidden states, given the known visible state x.
Arguments
- p::MOMDPAlphaVectorPolicy: The alpha vector policy
- b: The belief distribution over hidden states y
- x: The known visible state
Returns
- The action for the given visible state and belief over hidden states
Description
This function assumes the visible state x is fully observed and known. It:
- Converts the belief b to a vector representation over hidden states
- Among all alpha vectors associated with visible state x, finds the one with the maximum dot product with the belief vector
- Returns the action associated with that optimal alpha vector
This is more efficient than the version without explicit x when the visible state is known at runtime, as it avoids the need to infer x from the belief distribution.
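A minimal usage sketch (policy and momdp are hypothetical, and the visible state x is assumed to be observed directly):
by = uniform_belief_y(momdp)   # belief over hidden states only
a = action(policy, by, x)      # action for the known visible state x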
POMDPTools.Policies.actionvalues — Method
actionvalues(p::MOMDPAlphaVectorPolicy, b, x)
Compute the action values (Q-values) for all actions given a belief b over hidden states and a known visible state x.
Arguments
- p::MOMDPAlphaVectorPolicy: The alpha vector policy
- b: The belief distribution over hidden states y
- x: The known visible state
Returns
- A vector of action values where the ith element is the Q-value for action i
Description
This function performs a one-step lookahead to compute action values by:
- For each action a and each hidden state y in the belief support:
  - Computing the immediate reward R(x,y,a)
  - For each possible next visible state x' and hidden state y':
    - For each possible observation o:
      - Updating the belief to get b'
      - Computing the value using the alpha vectors for x'
      - Accumulating the discounted expected future value
- Summing over all transitions weighted by their probabilities
The resulting Q-values can be used for action selection or policy evaluation when the visible state is known.
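A minimal sketch of using these Q-values for greedy action selection (policy, by, and x are hypothetical; ordered_actions from POMDPTools is assumed to match the Q-value indexing):
using POMDPTools: ordered_actions
qvals = actionvalues(policy, by, x)                     # Q-value for each action index
best_a = ordered_actions(policy.momdp)[argmax(qvals)]   # greedy action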
POMDPs.initialize_belief — Method
initialize_belief(bu::MOMDPDiscreteUpdater, dist::Any)
Initialize a discrete belief over hidden states from a given distribution.
Arguments
- bu::MOMDPDiscreteUpdater: The MOMDP discrete belief updater
- dist: A distribution over hidden states to initialize from (supports various distribution types)
Returns
- A DiscreteBelief over the hidden state space with probabilities initialized from dist
Description
This function creates a discrete belief representation over the hidden states y suitable for use with MOMDP belief update operations. The conversion process:
- Creates a zero-initialized probability vector over all hidden states in the MOMDP
- For each state y in the support of the input distribution, extracts the probability pdf(dist, y) and assigns it to the corresponding index in the belief vector
- Returns a DiscreteBelief object that can be used with the MOMDP updater
Supported Distribution Types
The function can handle various distribution types through the generic pdf interface:
- Discrete distributions (e.g., Categorical, DiscreteUniform)
- Custom distributions that implement pdf and support
- Sparse distributions with limited support
Usage Examples
updater = MOMDPDiscreteUpdater(momdp)
# From a uniform distribution
uniform_dist = DiscreteUniform(1, length(states_y(momdp)))
belief = initialize_belief(updater, uniform_dist)
# From a sparse categorical distribution
sparse_dist = SparseCat([state1, state3], [0.7, 0.3])
belief = initialize_belief(updater, sparse_dist)
Implementation Notes
- Uses stateindex_y to map hidden states to belief vector indices
- Assumes the distribution is over individual hidden states, not joint (x,y) states
- The resulting belief is properly normalized if the input distribution is normalized
POMDPs.update — Method
update(bu::MOMDPDiscreteUpdater, b::DiscreteBelief, a, o, x, xp)
Update a discrete belief over hidden states using the MOMDP belief update equation.
Arguments
- bu::MOMDPDiscreteUpdater: The MOMDP discrete belief updater
- b::DiscreteBelief: The current belief over hidden states y
- a: The action taken
- o: The observation received after taking action a
- x: The previous visible state
- xp: The current visible state
Returns
- A new DiscreteBelief representing the updated belief over hidden states
Description
This function implements the discrete Bayesian filter for MOMDPs, which updates beliefs over hidden states given knowledge of visible state transitions.
The function iterates through all current hidden states with non-zero probability, computes transition probabilities to next hidden states, weights by observation probabilities, and normalizes the result.
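For reference, the standard MOMDP belief update (in the notation of Ong et al., with $\eta$ a normalizing constant and $O$ the observation model) has the form $b'(y') = \eta \, O(o \mid x', y', a) \sum_{y \in \mathcal{Y}} T_y(y' \mid x, y, a, x') \, T_x(x' \mid x, y, a) \, b(y)$.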
Errors
Throws an error if the updated belief probabilities sum to zero, which indicates an impossible observation given the current belief and action.
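Putting the updater together with a policy, a hedged sketch of a rollout loop (momdp, policy, and the step_environment helper are hypothetical; the helper stands in for whatever simulator provides the next visible state and observation):
function rollout(momdp, policy; steps = 10)
    up = MOMDPDiscreteUpdater(momdp)
    x = rand(initialstate_x(momdp))                      # visible state is observed directly
    b = initialize_belief(up, initialstate_y(momdp, x))  # belief over hidden states given x
    for t in 1:steps
        a = action(policy, b, x)
        xp, o = step_environment(momdp, x, a)            # hypothetical environment step
        b = update(up, b, a, o, x, xp)
        x = xp
    end
    return b
end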
POMDPs.update — Method
update(bu::MOMDPDiscreteUpdater, b::Any, a, o, x, xp)
This is a convenience method that handles arbitrary belief types by first calling initialize_belief to convert them to a DiscreteBelief, then performing the standard discrete belief update.
Internal Functions
MOMDPs.alphapairs — Function
Return an iterator of alpha vector-action pairs in the policy, given a visible state.
MOMDPs.alphavectors — Function
Return the alpha vectors, given a visible state.