Implemented Simulators
POMDPTools contains a collection of POMDPs.jl simulators.
Usage examples can be found in the Simulations Examples section.
If you are just getting started, the easiest way to begin is probably the stepthrough function. Otherwise, consult the Which Simulator Should I Use? guide below:
Which Simulator Should I Use?
The simulators in this package provide interaction with simulations of MDP and POMDP environments from a variety of perspectives. Use these questions to choose the best simulator to suit your needs.
I want to run fast rollout simulations and get the discounted reward.
Use the Rollout Simulator.
I want to evaluate performance with many parallel Monte Carlo simulations.
Use the Parallel Simulator.
I want to closely examine the histories of states, actions, etc. produced by simulations.
Use the History Recorder.
I want to step through each individual step of a simulation.
Use the stepthrough function.
I want to visualize a simulation.
Use the DisplaySimulator.
Also see the POMDPGifs package for creating gif animations.
I want to interact with an MDP or POMDP environment from the policy's perspective.
Use the sim function.
Stepping through
The stepthrough function exposes a simulation as an iterator so that the steps can be iterated through with for-loop syntax as follows:
using POMDPs, POMDPTools, POMDPModels

pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp)
for (s, a, o, r) in stepthrough(pomdp, policy, "s,a,o,r", max_steps=10)
println("in state $s")
println("took action $a")
println("received observation $o and reward $r")
end
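The same pattern works for MDPs, and a step can also be bound to a single NamedTuple instead of being unpacked. A minimal sketch (assuming SimpleGridWorld from POMDPModels):

using POMDPs, POMDPTools, POMDPModels

mdp = SimpleGridWorld()
for step in stepthrough(mdp, RandomPolicy(mdp), "s,a,r,sp", max_steps=5)
    # each step is a NamedTuple, so elements can be accessed by name
    println("moved from $(step.s) to $(step.sp) and received reward $(step.r)")
end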
POMDPTools.Simulators.stepthrough — Function

stepthrough(problem, policy, [spec])
stepthrough(problem, policy, [spec], [rng=rng], [max_steps=max_steps])
stepthrough(mdp::MDP, policy::Policy, [init_state], [spec]; [kwargs...])
stepthrough(pomdp::POMDP, policy::Policy, [up::Updater, [initial_belief, [initial_state]]], [spec]; [kwargs...])
Create a simulation iterator. This is intended to be used with for loop syntax to output the results of each step as the simulation is being run.
Example:
pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp)
for (s, a, o, r) in stepthrough(pomdp, policy, "s,a,o,r", max_steps=10)
println("in state $s")
println("took action $a")
println("received observation $o and reward $r")
end
The optional spec argument can be a string, tuple of symbols, or single symbol and follows the same pattern as eachstep called on a SimHistory object.

Under the hood, this function creates a StepSimulator with spec and returns a [PO]MDPSimIterator by calling simulate with all of the arguments except spec. All keyword arguments are passed to the StepSimulator constructor.

The StepSimulator contained in this file can provide the same functionality with the following syntax:
sim = StepSimulator("s,a,r,sp")
for (s,a,r,sp) in simulate(sim, problem, policy)
# do something
end
Rollouts
RolloutSimulator is the simplest MDP or POMDP simulator. When simulate is called, it simply simulates a single trajectory of the process and returns the discounted reward.
using POMDPs, POMDPTools, POMDPModels

rs = RolloutSimulator()
mdp = SimpleGridWorld()
policy = RandomPolicy(mdp)
r = simulate(rs, mdp, policy)
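Because a rollout returns only the discounted reward, expected performance is easily estimated by averaging many rollouts. A minimal sketch (assuming BabyPOMDP from POMDPModels and a random policy):

using POMDPs, POMDPTools, POMDPModels
using Statistics: mean

pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp)
rs = RolloutSimulator(max_steps=50)

# average the discounted reward over many independent rollouts
r_est = mean(simulate(rs, pomdp, policy) for _ in 1:1000)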
POMDPTools.Simulators.RolloutSimulator — Type

RolloutSimulator(rng, max_steps)
RolloutSimulator(; <keyword arguments>)

A fast simulator that just returns the reward.

The simulation will be terminated when either
- a terminal state is reached (as determined by isterminal()), or
- the discount factor is as small as eps, or
- max_steps have been executed.
Keyword arguments:
- rng::AbstractRNG (default: Random.default_rng()) - A random number generator to use.
- eps::Float64 (default: 0.0) - A small number; if γᵗ, where γ is the discount factor and t is the time step, becomes smaller than this, the simulation will be terminated.
- max_steps::Int (default: typemax(Int)) - The maximum number of steps to simulate.
Usage (optional arguments in brackets):
ro = RolloutSimulator()
r = simulate(ro, pomdp, policy, [updater [, init_belief [, init_state]]])
See also: HistoryRecorder, run_parallel
History Recorder
A HistoryRecorder runs a simulation and records the trajectory. It returns an AbstractVector of NamedTuples - see Histories for more info.
hr = HistoryRecorder(max_steps=100)
pomdp = TigerPOMDP()
policy = RandomPolicy(pomdp)
h = simulate(hr, pomdp, policy)
POMDPTools.Simulators.HistoryRecorder — Type

A simulator that records the history for later examination.

The simulation will be terminated when either
- a terminal state is reached (as determined by isterminal()), or
- the discount factor is as small as eps, or
- max_steps have been executed.

Keyword Arguments:
- rng: The random number generator for the simulation
- capture_exception::Bool: whether to capture an exception and store it in the history, or let it go uncaught, potentially killing the script
- show_progress::Bool: show a progress bar for the simulation
- eps
- max_steps
Usage (optional arguments in brackets):
hr = HistoryRecorder()
history = simulate(hr, pomdp, policy, [updater [, init_belief [, init_state]]])
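The returned history can then be examined directly. A brief sketch (assuming TigerPOMDP from POMDPModels):

using POMDPs, POMDPTools, POMDPModels

hr = HistoryRecorder(max_steps=100)
pomdp = TigerPOMDP()
h = simulate(hr, pomdp, RandomPolicy(pomdp))

h[1]        # a NamedTuple describing the first step
length(h)   # number of recorded steps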
sim()
The sim function provides a convenient way to interact with a POMDP or MDP environment and return a history. The first argument is a function that is called at every time step, takes a state (in the case of an MDP) or an observation (in the case of a POMDP) as its argument, and returns an action. The second argument is a POMDP or MDP. It is intended to be used with Julia's do syntax as follows:
pomdp = TigerPOMDP()
history = sim(pomdp, max_steps=10) do obs
println("Observation was $obs.")
return TIGER_OPEN_LEFT
end
This allows a flexible and general way to interact with a POMDP environment without creating new Policy types.
In the POMDP case, an updater can optionally be supplied as an additional positional argument if the policy function works with beliefs rather than directly with observations.
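A hedged sketch of this belief-based form (assuming TigerPOMDP from POMDPModels and the DiscreteUpdater from POMDPTools):

using POMDPs, POMDPTools, POMDPModels

pomdp = TigerPOMDP()
up = DiscreteUpdater(pomdp)
history = sim(pomdp, up, max_steps=10) do b
    # `b` is the belief maintained by `up` rather than a raw observation
    return TIGER_OPEN_LEFT
end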
POMDPTools.Simulators.sim — Function

sim(polfunc::Function, mdp::MDP; [<keyword arguments>])
sim(polfunc::Function, pomdp::POMDP; [<keyword arguments>])
Alternative way of running a simulation with a function specifying how to calculate the action at each timestep.
Usage
sim(mdp) do s
# code that calculates action `a` based on `s` - this is the policy
# you can also do other things like display something
return a
end
for an MDP or
sim(pomdp) do o
# code that calculates 'a' based on observation `o`
# optionally you could save 'o' in a global variable or do a belief update
return a
end
for a POMDP, or
sim(pomdp, updater) do b
# code that calculates 'a' based on belief `b`
# `b` is calculated by `updater`
return a
end
for a POMDP and a belief updater.
Keyword Arguments

All Versions
- initialstate: the initial state for the simulation
- simulator: keyword argument to specify any simulator to run the simulation. If nothing is specified for the simulator, a HistoryRecorder will be used as the simulator, with all keyword arguments forwarded to it, e.g.
  sim(mdp, max_steps=100, show_progress=true) do s
      # ...
  end
  will limit the simulation to 100 steps.

POMDP version
- initialobs: this will control the initial observation given to the policy function. If this is not defined, rand(initialobs(m, s)) will be used if it is available. If it is not, missing will be used.

POMDP and updater version
- initialbelief: initialize_belief(updater, initialbelief) is the first belief that will be given to the policy function.
Histories
The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.

POMDPTools.Simulators.SimHistory — Type

SimHistory

A (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP},...).

This is an AbstractVector of NamedTuples containing the states, actions, etc.
Examples
hist[1][:s] # returns the first state in the history
hist[:a] # returns all of the actions in the history
Examples
using POMDPs, POMDPTools, POMDPModels
hr = HistoryRecorder(max_steps=10)
hist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))
step = hist[1] # all information available about the first step
step[:s] # the first state
step[:a] # the first action
To see everything available in a step, use keys(first(hist)).

The entire history of each variable is available by using a Symbol instead of an index, i.e. hist[:s] will return a vector of the starting states for each step (note the difference between :s and :sp).
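A small sketch illustrating the difference, continuing the hist built above (within a single trajectory, each step's :sp equals the next step's :s):

s_vec = hist[:s]    # the state at the start of each step
sp_vec = hist[:sp]  # the state reached at the end of each step
@assert s_vec[2:end] == sp_vec[1:end-1]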
eachstep
The eachstep function may also be useful:

POMDPTools.Simulators.eachstep — Function

for t in eachstep(hist, [spec])
    ...
end

Iterate through the steps in SimHistory hist. spec is a tuple of symbols or string that controls what is returned for each step.
For example,
for (s, a, r, sp) in eachstep(h, "(s, a, r, sp)")
println("reward $r received when state $sp was reached after action $a was taken in state $s")
end
returns the start state, action, reward and destination state for each step of the simulation.
Alternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple
):
for step in eachstep(h, "(s, a, r, sp)")
println("reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)")
end
The possible valid elements in the iteration specification are
- Any node in the (PO)MDP Dynamic Decision network (by default :s, :a, :sp, :o, :r)
- b - the initial belief in the step (for POMDPs only)
- bp - the belief after being updated based on o (for POMDPs only)
- action_info - info from the policy decision (from action_info)
- update_info - info from the belief update (from update_info)
- t - the timestep index
Examples:
collect(eachstep(h, "a,o"))
will produce a vector of action-observation named tuples.
collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp"))
will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).
Notes
- The iteration specification can be specified as a tuple of symbols (e.g.
(:s, :a)
) instead of a string. - For type stability in performance-critical code, one should construct an iterator directly using
HistoryIterator{typeof(h), (:a,:r)}(h)
rather thaneachstep(h, "ar")
.
Other Functions
state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) will return vectors of the states, actions, observations, beliefs, and rewards; undiscounted_reward(h) and discounted_reward(h) will return the total rewards collected over the trajectory. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to hold an exception if the simulation failed to finish.

view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.
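A brief sketch of these utilities (assuming BabyPOMDP from POMDPModels and the FunctionPolicy helper from POMDPTools):

using POMDPs, POMDPTools, POMDPModels

h = simulate(HistoryRecorder(max_steps=20), BabyPOMDP(), FunctionPolicy(o->true))

println("steps:               ", n_steps(h))
println("undiscounted reward: ", undiscounted_reward(h))
println("discounted reward:   ", discounted_reward(h))

# a view containing only the last five steps
tail = view(h, n_steps(h)-4:n_steps(h))
for (s, a, r) in eachstep(tail, "s,a,r")
    println("s=$s, a=$a, r=$r")
end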
Parallel
POMDPTools contains a utility for running many Monte Carlo simulations in parallel to evaluate performance. The basic workflow involves the following steps:
1. Create a vector of Sim objects, each specifying how a single simulation should be run.
2. Use the run_parallel or run function to run the simulations.
3. Analyze the results of the simulations contained in the DataFrame returned by run_parallel.
Example
An example can be found in the Parallel Simulations section.
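Below is a minimal sketch of the full workflow (assuming BabyPOMDP from POMDPModels and two hand-written FunctionPolicy policies):

using POMDPs, POMDPTools, POMDPModels
using Statistics: mean

pomdp = BabyPOMDP()
policies = Dict("always-feed" => FunctionPolicy(o->true),
                "never-feed"  => FunctionPolicy(o->false))

# 1. build the queue of Sim objects, tagging each with metadata
queue = Sim[]
for (name, pol) in policies, i in 1:20
    push!(queue, Sim(pomdp, pol, max_steps=50, metadata=(policy=name, run=i)))
end

# 2. run the simulations and 3. analyze the resulting DataFrame
df = run_parallel(queue) do sim, hist
    return (reward=discounted_reward(hist),)
end
mean(df.reward)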
Sim objects
Each simulation should be specified by a Sim object, which contains all the information needed to run a simulation, including the Simulator, POMDP or MDP, Policy, Updater, and any other ingredients.
POMDPTools.Simulators.Sim — Type

Sim(m::MDP, p::Policy[, initialstate]; kwargs...)
Sim(m::POMDP, p::Policy[, updater[, initial_belief[, initialstate]]]; kwargs...)

Create a Sim object that contains everything needed to run and record a single simulation, including model, initial conditions, and metadata.

A vector of Sim objects can be executed with run or run_parallel.

Keyword Arguments
- rng::AbstractRNG=Random.default_rng()
- max_steps::Int=typemax(Int)
- simulator::Simulator=HistoryRecorder(rng=rng, max_steps=max_steps)
- metadata::NamedTuple: a named tuple (or dictionary) of metadata for the sim that will be recorded, e.g. (solver_iterations=500,).
Running simulations
The simulations are actually carried out by the run and run_parallel functions.
POMDPTools.Simulators.run_parallel — Function

run_parallel(queue::Vector{Sim})
run_parallel(f::Function, queue::Vector{Sim})

Run Sim objects in queue in parallel and return results as a DataFrame.

By default, the DataFrame will contain the reward for each simulation and the metadata provided to the sim.

Arguments
- queue: List of Sim objects to be executed
- f: Function to process the results of each simulation. This function should take two arguments, (1) the Sim that was executed and (2) the result of the simulation, by default a SimHistory. It should return a named tuple that will appear in the dataframe. See Examples below.

Keyword Arguments
- show_progress::Bool: whether or not to show a progress meter
- progress::ProgressMeter.Progress: determines how the progress meter is displayed
Examples
run_parallel(queue) do sim, hist
return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end
will return a dataframe with the number of steps and the reward in it.
The run function is also provided to run simulations in serial (this is often useful for debugging). Note that the documentation below also contains a section for the built-in Julia run function, even though it is not relevant here.
Base.run — Function

run(queue::Vector{Sim})
run(f::Function, queue::Vector{Sim})

Run the Sim objects in queue on a single process and return the results as a dataframe.

See run_parallel for more information.
Specifying information to be recorded
By default, only the discounted rewards from each simulation are recorded, but arbitrary information can be recorded.
The run_parallel and run functions accept a function (normally specified via the do syntax) that takes the Sim object and history of the simulation and extracts relevant statistics as a named tuple. For example, if the desired characteristics are the number of steps in the simulation and the reward, run_parallel would be invoked as follows:
df = run_parallel(queue) do sim::Sim, hist::SimHistory
return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end
These statistics are combined into a DataFrame, with each row representing a single simulation, allowing for statistical analysis. For example,
mean(df.reward ./ df.n_steps)
would compute the average reward per step with each simulation weighted equally regardless of length.
Display
DisplaySimulator
The DisplaySimulator displays each step of a simulation in real time through a multimedia display such as a Jupyter notebook or ElectronDisplay. Specifically, it uses POMDPTools.render and the built-in Julia display function to visualize each step.
Example:
using POMDPs
using POMDPModels
using POMDPTools
using ElectronDisplay
ElectronDisplay.CONFIG.single_window = true
ds = DisplaySimulator()
m = SimpleGridWorld()
simulate(ds, m, RandomPolicy(m))
POMDPTools.Simulators.DisplaySimulator — Type

DisplaySimulator(;kwargs...)

Create a simulator that displays each step of a simulation.

Given a POMDP or MDP model m, this simulator roughly works like

for step in stepthrough(m, ...)
    display(render(m, step))
end

Keyword Arguments
- display::AbstractDisplay: the display to use for the first argument to the display function. If this is nothing, display(...) will be called without an AbstractDisplay argument.
- render_kwargs::NamedTuple: keyword arguments for POMDPTools.render(...)
- max_fps::Number=10: maximum number of frames to be displayed per second - sleep will be used to skip extra time, so this is not designed for high precision
- predisplay::Function: function to call before every call to display(...). The only argument to this function will be the display (if it is specified) or nothing.
- extra_initial::Bool=false: if true, display an extra step at the beginning with only elements t, sp, and bp for POMDPs (this can be useful to see the initial state if render displays only sp and not s).
- extra_final::Bool=true: if true, display an extra step at the end with only elements t, done, s, and b for POMDPs (this can be useful to see the final state if render displays only s and not sp).
- max_steps::Integer: maximum number of steps to run for
- spec::NTuple{Symbol}: specification of what step elements to display (see eachstep)
- rng::AbstractRNG: random number generator
See the POMDPSimulators documentation for more tips about using specific displays.
Display-specific tips
The following tips may be helpful when using particular displays.
Jupyter notebooks
By default, in a Jupyter notebook, the visualizations of all steps are displayed in the output box one after another. To make the output animated instead, where the image is overwritten at each step, one may use
DisplaySimulator(predisplay=(d)->IJulia.clear_output(true))
ElectronDisplay
By default, ElectronDisplay will open a new window for each new step. To prevent this, use
ElectronDisplay.CONFIG.single_window = true