Implemented Simulators
POMDPTools contains a collection of POMDPs.jl simulators.
Usage examples can be found in the Simulations Examples section.
If you are just getting started, the easiest way to begin is probably the stepthrough function. Otherwise, consult the Which Simulator Should I Use? guide below:
Which Simulator Should I Use?
The simulators in this package provide interaction with simulations of MDP and POMDP environments from a variety of perspectives. Use these questions to choose the best simulator to suit your needs.
I want to run fast rollout simulations and get the discounted reward.
Use the Rollout Simulator.
I want to evaluate performance with many parallel Monte Carlo simulations.
Use the Parallel Simulator.
I want to closely examine the histories of states, actions, etc. produced by simulations.
Use the History Recorder.
I want to step through each individual step of a simulation.
Use the stepthrough function.
I want to visualize a simulation.
Use the DisplaySimulator.
Also see the POMDPGifs package for creating gif animations.
I want to interact with an MDP or POMDP environment from the policy's perspective.
Use the sim function.
Stepping through
The stepthrough function exposes a simulation as an iterator so that the steps can be iterated through with a for loop syntax as follows:
pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp)
for (s, a, o, r) in stepthrough(pomdp, policy, "s,a,o,r", max_steps=10)
    println("in state $s")
    println("took action $a")
    println("received observation $o and reward $r")
end
POMDPTools.Simulators.stepthrough — Function
stepthrough(problem, policy, [spec])
stepthrough(problem, policy, [spec], [rng=rng], [max_steps=max_steps])
stepthrough(mdp::MDP, policy::Policy, [init_state], [spec]; [kwargs...])
stepthrough(pomdp::POMDP, policy::Policy, [up::Updater, [initial_belief, [initial_state]]], [spec]; [kwargs...])
Create a simulation iterator. This is intended to be used with for loop syntax to output the results of each step as the simulation is being run.
Example:
pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp)
for (s, a, o, r) in stepthrough(pomdp, policy, "s,a,o,r", max_steps=10)
    println("in state $s")
    println("took action $a")
    println("received observation $o and reward $r")
end
The optional spec argument can be a string, tuple of symbols, or single symbol and follows the same pattern as eachstep called on a SimHistory object.
Under the hood, this function creates a StepSimulator with spec and returns a [PO]MDPSimIterator by calling simulate with all of the arguments except spec. All keyword arguments are passed to the StepSimulator constructor.
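As a further illustration, the sketch below sums discounted rewards by hand while iterating over an MDP simulation. It assumes POMDPModels is installed (for SimpleGridWorld) and that the t element of each step starts at 1; manual_return is a hypothetical helper name, not part of the package.

```julia
using POMDPs, POMDPTools, POMDPModels

# Hypothetical helper: accumulate the discounted return step by step.
function manual_return(mdp, policy; max_steps=20)
    total = 0.0
    n = 0
    for (r, t) in stepthrough(mdp, policy, "r,t", max_steps=max_steps)
        # t is the timestep index (assumed to start at 1)
        total += discount(mdp)^(t - 1) * r
        n += 1
    end
    return total, n
end

mdp = SimpleGridWorld()
total, n = manual_return(mdp, RandomPolicy(mdp))
println("discounted return over $n steps: $total")
```

Wrapping the loop in a function avoids Julia's global-scope rules for the accumulator variables.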
The StepSimulator contained in this file can provide the same functionality with the following syntax:
sim = StepSimulator("s,a,r,sp")
for (s,a,r,sp) in simulate(sim, problem, policy)
    # do something
end
Rollouts
RolloutSimulator is the simplest MDP or POMDP simulator. When simulate is called, it simply simulates a single trajectory of the process and returns the discounted reward.
rs = RolloutSimulator()
mdp = SimpleGridWorld()
policy = RandomPolicy(mdp)
r = simulate(rs, mdp, policy)
POMDPTools.Simulators.RolloutSimulator — Type
RolloutSimulator(rng, max_steps)
RolloutSimulator(; <keyword arguments>)
A fast simulator that just returns the reward
The simulation will be terminated when either
- a terminal state is reached (as determined by isterminal()), or
- the discount factor is as small as eps, or
- max_steps have been executed
Keyword arguments:
- rng::AbstractRNG (default: Random.default_rng()) - A random number generator to use.
- eps::Float64 (default: 0.0) - A small number; if γᵗ where γ is the discount factor and t is the time step becomes smaller than this, the simulation will be terminated.
- max_steps::Int (default: typemax(Int)) - The maximum number of steps to simulate.
Usage (optional arguments in brackets):
ro = RolloutSimulator()
r = simulate(ro, pomdp, policy, [updater [, init_belief [, init_state]]])
See also: HistoryRecorder, run_parallel
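The stopping rules listed above can be illustrated without any model at all. The following is a minimal sketch in plain Julia, not the library's implementation: discounted_sum is a hypothetical helper, the end of the reward sequence stands in for reaching a terminal state, and the strict comparison against eps is an assumption.

```julia
# Hypothetical helper: discounted sum of a fixed reward sequence that stops
# early when the discount weight drops to eps or max_steps is reached.
function discounted_sum(rewards, γ; eps=0.0, max_steps=typemax(Int))
    total = 0.0
    disc = 1.0  # γ^t for the current step
    for (step, r) in enumerate(rewards)
        (disc > eps && step <= max_steps) || break
        total += disc * r
        disc *= γ
    end
    return total
end

full = discounted_sum([1.0, 1.0, 1.0], 0.5)                  # 1 + 0.5 + 0.25
cut  = discounted_sum([1.0, 1.0, 1.0], 0.5; eps=0.3)         # third step skipped
one  = discounted_sum([1.0, 1.0, 1.0], 0.5; max_steps=1)     # first step only
println((full, cut, one))
```

With the default eps=0.0 the discount weight never triggers termination, so only the terminal state and max_steps matter.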
History Recorder
A HistoryRecorder runs a simulation and records the trajectory. It returns an AbstractVector of NamedTuples - see Histories for more info.
hr = HistoryRecorder(max_steps=100)
pomdp = TigerPOMDP()
policy = RandomPolicy(pomdp)
h = simulate(hr, pomdp, policy)
POMDPTools.Simulators.HistoryRecorder — Type
A simulator that records the history for later examination
The simulation will be terminated when either
- a terminal state is reached (as determined by isterminal()), or
- the discount factor is as small as eps, or
- max_steps have been executed
Keyword Arguments:
- rng: The random number generator for the simulation
- capture_exception::Bool: whether to capture an exception and store it in the history, or let it go uncaught, potentially killing the script
- show_progress::Bool: show a progress bar for the simulation
- eps
- max_steps
Usage (optional arguments in brackets):
hr = HistoryRecorder()
history = simulate(hr, pomdp, policy, [updater [, init_belief [, init_state]]])
sim()
The sim function provides a convenient way to interact with a POMDP or MDP environment and return a history. The first argument is a function that is called at every time step and takes a state (in the case of an MDP) or an observation (in the case of a POMDP) as the argument and then returns an action. The second argument is a pomdp or mdp. It is intended to be used with Julia's do syntax as follows:
pomdp = TigerPOMDP()
history = sim(pomdp, max_steps=10) do obs
    println("Observation was $obs.")
    return TIGER_OPEN_LEFT
end
This allows a flexible and general way to interact with a POMDP environment without creating new Policy types.
In the POMDP case, an updater can optionally be supplied as an additional positional argument if the policy function works with beliefs rather than directly with observations.
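The same pattern works for an MDP, where the do-block receives the state instead of an observation. The sketch below assumes POMDPModels is installed and relies on SimpleGridWorld using 2-element grid positions as states and the symbols :up, :down, :left, and :right as actions; the hand-written rule inside the block is purely illustrative.

```julia
using POMDPs, POMDPTools, POMDPModels

mdp = SimpleGridWorld()

# The do-block is the policy: it receives the current state and returns an action.
history = sim(mdp, max_steps=10) do s
    # s is a grid position; head right until x reaches 5, then go up
    return s[1] < 5 ? :right : :up
end

println("steps taken: ", n_steps(history))
println("discounted reward: ", discounted_reward(history))
```

Because no simulator is specified, the result is the history recorded by the default HistoryRecorder, so the accessor functions described under Histories apply to it.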
POMDPTools.Simulators.sim — Function
sim(polfunc::Function, mdp::MDP; [<keyword arguments>])
sim(polfunc::Function, pomdp::POMDP; [<keyword arguments>])
Alternative way of running a simulation with a function specifying how to calculate the action at each timestep.
Usage
sim(mdp) do s
    # code that calculates action `a` based on `s` - this is the policy
    # you can also do other things like display something
    return a
end
for an MDP, or
sim(pomdp) do o
    # code that calculates 'a' based on observation `o`
    # optionally you could save 'o' in a global variable or do a belief update
    return a
end
for a POMDP, or
sim(pomdp, updater) do b
    # code that calculates 'a' based on belief `b`
    # `b` is calculated by `updater`
    return a
end
for a POMDP and a belief updater.
Keyword Arguments
All Versions
- initialstate: the initial state for the simulation
- simulator: keyword argument to specify any simulator to run the simulation. If nothing is specified for the simulator, a HistoryRecorder will be used as the simulator, with all keyword arguments forwarded to it, e.g.
sim(mdp, max_steps=100, show_progress=true) do s
    # ...
end
will limit the simulation to 100 steps.
POMDP version
- initialobs: this will control the initial observation given to the policy function. If this is not defined, rand(initialobs(m, s)) will be used if it is available. If it is not, missing will be used.
POMDP and updater version
- initialbelief: initialize_belief(updater, initialbelief) is the first belief that will be given to the policy function.
Histories
The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.
POMDPTools.Simulators.SimHistory — Type
SimHistory
An (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP},...).
This is an AbstractVector of NamedTuples containing the states, actions, etc.
Examples
hist[1][:s] # returns the first state in the history
hist[:a] # returns all of the actions in the history
Examples
using POMDPs, POMDPTools, POMDPModels
hr = HistoryRecorder(max_steps=10)
hist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))
step = hist[1] # all information available about the first step
step[:s] # the first state
step[:a] # the first action
To see everything available in a step, use
keys(first(hist))
The entire history of each variable is available by using a Symbol instead of an index, i.e.
hist[:s]
will return a vector of the starting states for each step (note the difference between :s and :sp).
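As a quick check of how these accessors fit together, the sketch below recomputes the discounted reward from hist[:r] and compares it with discounted_reward. It assumes POMDPModels is installed and that BabyPOMDP never reaches a terminal state, so all 10 steps are recorded.

```julia
using POMDPs, POMDPTools, POMDPModels

pomdp = BabyPOMDP()
hist = simulate(HistoryRecorder(max_steps=10), pomdp, FunctionPolicy(o -> true))

rewards = hist[:r]          # one reward per recorded step
γ = discount(pomdp)
manual = sum(γ^(t - 1) * r for (t, r) in enumerate(rewards))

println(keys(first(hist)))  # the fields recorded for each step
println(manual ≈ discounted_reward(hist))
```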
eachstep
The eachstep function may also be useful:
POMDPTools.Simulators.eachstep — Function
for t in eachstep(hist, [spec])
    ...
end
Iterate through the steps in SimHistory hist. spec is a tuple of symbols or string that controls what is returned for each step.
For example,
for (s, a, r, sp) in eachstep(h, "(s, a, r, sp)")
    println("reward $r received when state $sp was reached after action $a was taken in state $s")
end
returns the start state, action, reward and destination state for each step of the simulation.
Alternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple):
for step in eachstep(h, "(s, a, r, sp)")
    println("reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)")
end
The possible valid elements in the iteration specification are
- Any node in the (PO)MDP Dynamic Decision network (by default :s,:a,:sp,:o,:r)
- b - the initial belief in the step (for POMDPs only)
- bp - the belief after being updated based on o (for POMDPs only)
- action_info - info from the policy decision (from action_info)
- update_info - info from the belief update (from update_info)
- t - the timestep index
Examples:
collect(eachstep(h, "a,o"))
will produce a vector of action-observation named tuples.
collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp"))
will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).
Notes
- The iteration specification can be specified as a tuple of symbols (e.g. (:s, :a)) instead of a string.
- For type stability in performance-critical code, one should construct an iterator directly using HistoryIterator{typeof(h), (:a,:r)}(h) rather than eachstep(h, "ar").
Other Functions
state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) return vectors of the states, actions, observations, beliefs, and rewards, respectively. undiscounted_reward(h) and discounted_reward(h) return the total reward collected over the trajectory, without and with discounting. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to retrieve an exception and its backtrace if the simulation failed to finish.
view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.
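The following sketch exercises a few of these functions together. It assumes POMDPModels is installed, that BabyPOMDP runs for the full 10 steps, and that eachstep accepts the view returned by view in the same way as a complete history, as the paragraph above suggests.

```julia
using POMDPs, POMDPTools, POMDPModels

hist = simulate(HistoryRecorder(max_steps=10), BabyPOMDP(), FunctionPolicy(o -> true))

# The undiscounted reward should equal the plain sum of the per-step rewards.
rs = reward_hist(hist)
println(sum(rs) ≈ undiscounted_reward(hist))

# Drop the last four steps and iterate over what remains.
head = view(hist, 1:n_steps(hist) - 4)
for (a, r) in eachstep(head, "a,r")
    println("action $a gave reward $r")
end
```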
Parallel
POMDPTools contains a utility for running many Monte Carlo simulations in parallel to evaluate performance. The basic workflow involves the following steps:
- Create a vector of Sim objects, each specifying how a single simulation should be run.
- Use the run_parallel or run function to run the simulations.
- Analyze the results of the simulations contained in the DataFrame returned by run_parallel.
Example
An example can be found in the Parallel Simulations section.
Sim objects
Each simulation should be specified by a Sim object which contains all the information needed to run a simulation, including the Simulator, POMDP or MDP, Policy, Updater, and any other ingredients.
POMDPTools.Simulators.Sim — Type
Sim(m::MDP, p::Policy[, initialstate]; kwargs...)
Sim(m::POMDP, p::Policy[, updater[, initial_belief[, initialstate]]]; kwargs...)
Create a Sim object that contains everything needed to run and record a single simulation, including model, initial conditions, and metadata.
A vector of Sim objects can be executed with run or run_parallel.
Keyword Arguments
- rng::AbstractRNG=Random.default_rng()
- max_steps::Int=typemax(Int)
- simulator::Simulator=HistoryRecorder(rng=rng, max_steps=max_steps)
- metadata::NamedTuple: a named tuple (or dictionary) of metadata for the sim that will be recorded, e.g. (solver_iterations=500,).
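A minimal sketch of building a queue with these keyword arguments is shown below. It assumes POMDPModels is installed; the metadata field names policy and seed are illustrative choices, and the queue is executed with the serial run function so that no worker processes are needed.

```julia
using POMDPs, POMDPTools, POMDPModels
using Random

pomdp = BabyPOMDP()
feed_always = FunctionPolicy(o -> true)

# One Sim per seed; the metadata is copied into the results table.
queue = [Sim(pomdp, feed_always;
             rng=MersenneTwister(seed),
             max_steps=20,
             metadata=(policy="always_feed", seed=seed))
         for seed in 1:4]

df = run(queue)  # serial execution
println(df)
```

Giving each Sim its own seeded rng makes individual simulations reproducible regardless of the order in which they run.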
Running simulations
The simulations are actually carried out by the run and run_parallel functions.
POMDPTools.Simulators.run_parallel — Function
run_parallel(queue::Vector{Sim})
run_parallel(f::Function, queue::Vector{Sim})
Run Sim objects in queue in parallel and return results as a DataFrame.
By default, the DataFrame will contain the reward for each simulation and the metadata provided to the sim.
Arguments
- queue: List of Sim objects to be executed
- f: Function to process the results of each simulation
This function should take two arguments, (1) the Sim that was executed and (2) the result of the simulation, by default a SimHistory. It should return a named tuple that will appear in the dataframe. See Examples below.
Keyword Arguments
- show_progress::Bool: whether or not to show a progress meter
- progress::ProgressMeter.Progress: determines how the progress meter is displayed
Examples
run_parallel(queue) do sim, hist
    return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end
will return a dataframe with the number of steps and the reward in it.
The run function is also provided to run simulations in serial (this is often useful for debugging). Note that the documentation below also contains a section for the built-in Julia run function, even though it is not relevant here.
Base.run — Function
run(queue::Vector{Sim})
run(f::Function, queue::Vector{Sim})
Run the Sim objects in queue on a single process and return the results as a dataframe.
See run_parallel for more information.
Specifying information to be recorded
By default, only the discounted rewards from each simulation are recorded, but arbitrary information can be recorded.
The run_parallel and run functions accept a function (normally specified via the do syntax) that takes the Sim object and history of the simulation and extracts relevant statistics as a named tuple. For example, if the desired characteristics are the number of steps in the simulation and the reward, run_parallel would be invoked as follows:
df = run_parallel(queue) do sim::Sim, hist::SimHistory
    return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end
These statistics are combined into a DataFrame, with each row representing a single simulation, allowing for statistical analysis. For example,
mean(df.reward ./ df.n_steps)
would compute the average reward per step with each simulation weighted equally regardless of length.
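Putting the pieces together, the sketch below records custom statistics and summarizes them. It assumes POMDPModels is installed, uses the serial run function in place of run_parallel to avoid setting up worker processes, and adds a standard error as one possible summary; the variable names are illustrative.

```julia
using POMDPs, POMDPTools, POMDPModels
using Statistics

pomdp = BabyPOMDP()
queue = [Sim(pomdp, FunctionPolicy(o -> true); max_steps=20) for _ in 1:10]

# Each returned named tuple becomes one row of the results table.
df = run(queue) do sim, hist
    return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end

per_step = df.reward ./ df.n_steps
m = mean(per_step)
sem = std(per_step) / sqrt(length(per_step))
println("mean reward per step: $m (standard error $sem)")
```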
Display
DisplaySimulator
The DisplaySimulator displays each step of a simulation in real time through a multimedia display such as a Jupyter notebook or ElectronDisplay. Specifically it uses POMDPTools.render and the built-in Julia display function to visualize each step.
Example:
using POMDPs
using POMDPModels
using POMDPTools
using ElectronDisplay
ElectronDisplay.CONFIG.single_window = true
ds = DisplaySimulator()
m = SimpleGridWorld()
simulate(ds, m, RandomPolicy(m))
POMDPTools.Simulators.DisplaySimulator — Type
DisplaySimulator(;kwargs...)
Create a simulator that displays each step of a simulation.
Given a POMDP or MDP model m, this simulator roughly works like
for step in stepthrough(m, ...)
    display(render(m, step))
end
Keyword Arguments
- display::AbstractDisplay: the display to use for the first argument to the display function. If this is nothing, display(...) will be called without an AbstractDisplay argument.
- render_kwargs::NamedTuple: keyword arguments for POMDPTools.render(...)
- max_fps::Number=10: maximum number of frames to be displayed per second - sleep will be used to skip extra time, so this is not designed for high precision
- predisplay::Function: function to call before every call to display(...). The only argument to this function will be the display (if it is specified) or nothing
- extra_initial::Bool=false: if true, display an extra step at the beginning with only elements t, sp, and bp for POMDPs (this can be useful to see the initial state if render displays only sp and not s).
- extra_final::Bool=true: if true, display an extra step at the end with only elements t, done, s, and b for POMDPs (this can be useful to see the final state if render displays only s and not sp).
- max_steps::Integer: maximum number of steps to run for
- spec::NTuple{Symbol}: specification of what step elements to display (see eachstep)
- rng::AbstractRNG: random number generator
See the POMDPSimulators documentation for more tips about using specific displays.
Display-specific tips
The following tips may be helpful when using particular displays.
Jupyter notebooks
By default, in a Jupyter notebook, the visualizations of all steps are displayed in the output box one after another. To make the output animated instead, where the image is overwritten at each step, one may use
DisplaySimulator(predisplay=(d)->IJulia.clear_output(true))
ElectronDisplay
By default, ElectronDisplay will open a new window for each new step. To prevent this, use
ElectronDisplay.CONFIG.single_window = true