sim()
The sim function provides a convenient way to interact with a POMDP or MDP environment and return a history. The first argument is a function that is called at every time step; it takes a state (for an MDP) or an observation (for a POMDP) as its argument and returns an action. The second argument is a POMDP or MDP. It is intended to be used with Julia's do syntax as follows:
    pomdp = TigerPOMDP()
    history = sim(pomdp, max_steps=10) do obs
        println("Observation was $obs.")
        return TIGER_OPEN_LEFT
    end
This allows a flexible and general way to interact with a POMDP environment without creating new Policy types.
In the POMDP case, an updater can optionally be supplied as an additional positional argument if the policy function works with beliefs rather than directly with observations.
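For the MDP case, the policy function receives the state directly. The following is a minimal sketch, assuming the POMDPModels package is installed and that its SimpleGridWorld model accepts the action symbols :up, :down, :left, and :right. The accessors n_steps and undiscounted_reward are assumed to be available for the history returned by the default HistoryRecorder; check them against your installed version.

```julia
using POMDPs
using POMDPModels       # assumed to provide SimpleGridWorld
using POMDPSimulators   # provides sim and the history accessors

mdp = SimpleGridWorld()

# The do-block is the policy: it receives the current state `s`
# and must return an action.
history = sim(mdp, max_steps=10) do s
    println("State was $s.")
    return :right
end

# Inspect the recorded history (accessor names are assumptions, see above).
println("Steps taken: ", n_steps(history))
println("Total reward: ", undiscounted_reward(history))
```

The run may stop before 10 steps if the agent reaches a terminal state.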
More examples can be found in the POMDPExamples package.
POMDPSimulators.sim - Function
    sim(polfunc::Function, mdp::MDP; [<keyword arguments>])
    sim(polfunc::Function, pomdp::POMDP; [<keyword arguments>])
Alternative way of running a simulation with a function specifying how to calculate the action at each timestep.
Usage
    sim(mdp) do s
        # code that calculates action `a` based on `s` - this is the policy
        # you can also do other things like display something
        return a
    end
for an MDP, or
    sim(pomdp) do o
        # code that calculates `a` based on observation `o`
        # optionally you could save `o` in a global variable or do a belief update
        return a
    end
for a POMDP, or
    sim(pomdp, updater) do b
        # code that calculates `a` based on belief `b`
        # `b` is calculated by `updater`
        return a
    end
for a POMDP and a belief updater.
Keyword Arguments
All Versions
initialstate: the initial state for the simulation
simulator: keyword argument to specify any simulator to run the simulation. If nothing is specified for the simulator, a HistoryRecorder will be used as the simulator, with all keyword arguments forwarded to it, e.g.
    sim(mdp, max_steps=100, show_progress=true) do s
        # ...
    end
will limit the simulation to 100 steps.
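To illustrate the simulator keyword, here is a sketch that swaps the default HistoryRecorder for a RolloutSimulator. It assumes SimpleGridWorld from POMDPModels as a stand-in model and :up as a valid action for it. Since the chosen simulator determines the return value, this call is expected to yield a total discounted reward rather than a history. The step limit is passed to the RolloutSimulator constructor, because keyword forwarding is only described for the case where no simulator is specified.

```julia
using POMDPs
using POMDPModels       # assumed to provide SimpleGridWorld
using POMDPSimulators   # provides sim and RolloutSimulator

mdp = SimpleGridWorld()

# Pass a pre-configured simulator instead of relying on the default HistoryRecorder.
total = sim(mdp, simulator=RolloutSimulator(max_steps=100)) do s
    return :up
end

println("Discounted reward: $total")
```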
POMDP version
initialobs: this will control the initial observation given to the policy function. If this is not defined, rand(initialobs(m, s)) will be used if it is available. If it is not, missing will be used.
POMDP and updater version
initialbelief: initialize_belief(updater, initialbelief) is the first belief that will be given to the policy function.