Samplers
Defining a Sampler
In this section, we outline the requirements and guidelines for defining a belief Sampler.
Interface
The Sampler interface only has one method: the functor. For example, if you wanted to implement your own Sampler, you could write something like this:
struct MySampler <: Sampler
    foo
    bar
end

# functor definition
function (s::MySampler)(pomdp::POMDP)
    # YOUR CODE HERE
    return sampled_beliefs
end
Implemented Samplers
CompressedBeliefMDPs provides the following generic belief samplers:
- an exploratory belief expansion sampler
- a Policy rollout sampler
- an ExplorationPolicy rollout sampler
Exploratory Belief Expansion
CompressedBeliefMDPs.BeliefExpansionSampler — Type

BeliefExpansionSampler

Fast extension of exploratory belief expansion (Algorithm 21.13 in Algorithms for Decision Making) that uses k-d trees.
Fields
- updater::Updater: The updater used to update beliefs.
- metric::NearestNeighbors.MinkowskiMetric: The metric used to measure distances between beliefs. It must be a Minkowski metric.
- n::Integer: The number of belief expansions to perform.
Constructors
BeliefExpansionSampler(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp),
metric::NearestNeighbors.MinkowskiMetric=Euclidean(), n::Integer=3)
Methods
(s::BeliefExpansionSampler)(pomdp::POMDP)
Creates an initial belief and performs exploratory belief expansion. Returns the unique belief states. Only works for POMDPs with discrete state, action, and observation spaces.
Example Usage
julia> pomdp = TigerPOMDP();
julia> sampler = BeliefExpansionSampler(pomdp; n=2);
julia> beliefs = sampler(pomdp)
Set{DiscreteBelief{TigerPOMDP, Bool}} with 4 elements:
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
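For intuition, here is a rough sketch of what one expansion pass looks like: for each sampled belief and each action, a successor belief is computed, and the candidate farthest from the existing set (under the chosen metric) is kept. The helper name expand_once! is hypothetical, the sketch assumes DiscreteBelief-style beliefs whose probability vector is stored in the field b, and the brute-force nearest-neighbor search stands in for the k-d tree lookup that BeliefExpansionSampler actually uses.

using POMDPs, POMDPTools, Distances, Random

# Illustrative sketch of one exploratory-expansion pass (not the package's implementation).
function expand_once!(pomdp::POMDP, up::Updater, beliefs::Set, metric; rng=Random.default_rng())
    for b in collect(beliefs)
        best_bp, best_d = nothing, 0.0
        for a in actions(pomdp)
            s = rand(rng, b)                               # sample a state from the belief
            _, o, _ = @gen(:sp, :o, :r)(pomdp, s, a, rng)  # sample an observation
            bp = update(up, b, a, o)                       # successor belief
            # distance from the candidate to its nearest belief already in the set
            # (assumes DiscreteBelief-style beliefs with probability vector field `b`)
            d = minimum(evaluate(metric, bp.b, bi.b) for bi in beliefs)
            if d > best_d
                best_bp, best_d = bp, d
            end
        end
        best_bp === nothing || push!(beliefs, best_bp)     # keep the farthest successor
    end
    return beliefs
end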
Policy Sampler
CompressedBeliefMDPs.PolicySampler — Type

PolicySampler

Samples belief states by rolling out a Policy.
Fields
- policy::Policy: The policy used for decision making.
- updater::Updater: The updater used for updating beliefs.
- n::Integer: The maximum number of simulated steps.
- rng::AbstractRNG: The random number generator used for sampling.
- verbose::Bool: Whether to use a progress bar while sampling.
Constructors
PolicySampler(pomdp::POMDP; policy::Policy=RandomPolicy(pomdp),
updater::Updater=DiscreteUpdater(pomdp), n::Integer=10,
rng::AbstractRNG=Random.GLOBAL_RNG)
Methods
(s::PolicySampler)(pomdp::POMDP)
Returns a vector of unique belief states.
Example
julia> pomdp = TigerPOMDP();
julia> sampler = PolicySampler(pomdp; n=3);
julia> sampler(pomdp)
2-element Vector{Any}:
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
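The default policy is a RandomPolicy, but any Policy can be rolled out. For example, assuming QMDP.jl is installed, a QMDP-solved policy could be supplied instead; the snippet below is purely illustrative.

using POMDPs, POMDPModels, QMDP
using CompressedBeliefMDPs

pomdp = TigerPOMDP()
qmdp_policy = solve(QMDPSolver(), pomdp)                   # any Policy works here
sampler = PolicySampler(pomdp; policy=qmdp_policy, n=100)
beliefs = sampler(pomdp)                                   # vector of unique beliefs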
ExplorationPolicy Sampler
CompressedBeliefMDPs.ExplorationPolicySampler — Type

ExplorationPolicySampler

Samples belief states by rolling out an ExplorationPolicy. Essentially identical to PolicySampler.
Fields
- explorer::ExplorationPolicy: The ExplorationPolicy used for decision making.
- on_policy::Policy: The fallback Policy used for decision making when not exploring.
- updater::Updater: The updater used for updating beliefs.
- n::Integer: The maximum number of simulated steps.
- rng::AbstractRNG: The random number generator used for sampling.
- verbose::Bool: Whether to use a progress bar while sampling.
Constructors
ExplorationPolicySampler(pomdp::POMDP; rng::AbstractRNG=Random.GLOBAL_RNG,
explorer::ExplorationPolicy=EpsGreedyPolicy(pomdp, 0.1; rng=rng), on_policy=RandomPolicy(pomdp),
updater::Updater=DiscreteUpdater(pomdp), n::Integer=10)
Methods
(s::ExplorationPolicySampler)(pomdp::POMDP)
Returns a vector of unique belief states.
Example Usage
julia> pomdp = TigerPOMDP();
julia> sampler = ExplorationPolicySampler(pomdp; n=30);
julia> sampler(pomdp)
3-element Vector{Any}:
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
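As with PolicySampler, the constructor keywords shown above can be overridden. For example, a more exploratory ε-greedy policy could be passed in; the 0.3 below is just an illustrative choice.

using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

pomdp = TigerPOMDP()
explorer = EpsGreedyPolicy(pomdp, 0.3)                     # explore 30% of the time
sampler = ExplorationPolicySampler(pomdp; explorer=explorer, n=100)
beliefs = sampler(pomdp)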