Implemented Simulators

POMDPTools contains a collection of POMDPs.jl simulators.

Usage examples can be found in the simulation tutorial in the POMDPExamples package.

If you are just getting started, the easiest way to begin is probably the stepthrough function. Otherwise, consult the Which Simulator Should I Use? guide below.

Which Simulator Should I Use?

The simulators in this package provide interaction with simulations of MDP and POMDP environments from a variety of perspectives. Use these questions to choose the best simulator to suit your needs.

I want to run fast rollout simulations and get the discounted reward.

Use the Rollout Simulator.

I want to evaluate performance with many parallel Monte Carlo simulations.

Use the Parallel Simulator.

I want to closely examine the histories of states, actions, etc. produced by simulations.

Use the History Recorder.

I want to step through each individual step of a simulation.

Use the stepthrough function.

I want to visualize a simulation.

Use the DisplaySimulator.

Also see the POMDPGifs package for creating gif animations.

I want to interact with an MDP or POMDP environment from the policy's perspective.

Use the sim function.

Stepping through

The stepthrough function exposes a simulation as an iterator so that the steps can be iterated through with a for loop syntax as follows:

pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp)

for (s, a, o, r) in stepthrough(pomdp, policy, "s,a,o,r", max_steps=10)
    println("in state $s")
    println("took action $a")
    println("received observation $o and reward $r")
end

More examples can be found in the POMDPExamples Package.

POMDPTools.Simulators.stepthrough - Function
stepthrough(problem, policy, [spec])
stepthrough(problem, policy, [spec], [rng=rng], [max_steps=max_steps])
stepthrough(mdp::MDP, policy::Policy, [init_state], [spec]; [kwargs...])
stepthrough(pomdp::POMDP, policy::Policy, [up::Updater, [initial_belief, [initial_state]]], [spec]; [kwargs...])

Create a simulation iterator. This is intended to be used with for loop syntax to output the results of each step as the simulation is being run.

Example:

pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp)

for (s, a, o, r) in stepthrough(pomdp, policy, "s,a,o,r", max_steps=10)
    println("in state $s")
    println("took action $a")
    println("received observation $o and reward $r")
end

The optional spec argument can be a string, tuple of symbols, or single symbol and follows the same pattern as eachstep called on a SimHistory object.

Under the hood, this function creates a StepSimulator with spec and returns a [PO]MDPSimIterator by calling simulate with all of the arguments except spec. All keyword arguments are passed to the StepSimulator constructor.


The StepSimulator in this package can provide the same functionality with the following syntax:

sim = StepSimulator("s,a,r,sp")
for (s,a,r,sp) in simulate(sim, problem, policy)
    # do something
end

Rollouts

RolloutSimulator is the simplest MDP or POMDP simulator. When simulate is called, it simply simulates a single trajectory of the process and returns the discounted reward.

rs = RolloutSimulator()
mdp = GridWorld()
policy = RandomPolicy(mdp)

r = simulate(rs, mdp, policy)

More examples can be found in the POMDPExamples Package.

POMDPTools.Simulators.RolloutSimulator - Type
RolloutSimulator(rng, max_steps)
RolloutSimulator(; <keyword arguments>)

A fast simulator that just returns the discounted reward.

The simulation will be terminated when either

  1. a terminal state is reached (as determined by isterminal()), or
  2. the discount factor is as small as eps or
  3. max_steps have been executed

Keyword arguments:

  • rng::AbstractRNG (default: Random.default_rng()) - A random number generator to use.
  • eps::Float64 (default: 0.0) - A small number; if γᵗ where γ is the discount factor and t is the time step becomes smaller than this, the simulation will be terminated.
  • max_steps::Int (default: typemax(Int)) - The maximum number of steps to simulate.

Usage (optional arguments in brackets):

ro = RolloutSimulator()
r = simulate(ro, pomdp, policy, [updater [, init_belief [, init_state]]])

See also: HistoryRecorder, run_parallel
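
The three stopping conditions listed above can be sketched in plain Julia. This is a minimal illustration on a hypothetical toy chain problem, not the RolloutSimulator source: the function rollout_sketch, its goal keyword, and the reward values are assumptions made up for the example.

```julia
# Hypothetical toy problem: the state counts up from 1 and is terminal at `goal`.
# Nothing here is part of POMDPTools; it only illustrates the stopping rules.
function rollout_sketch(; discount=0.9, goal=5, eps=0.0, max_steps=typemax(Int))
    s = 1
    disc = 1.0      # running value of the discount factor raised to the step count
    total = 0.0     # accumulated discounted reward
    steps = 0
    # Stop on a terminal state, a negligible discount, or the step limit
    while s != goal && disc > eps && steps < max_steps
        sp = s + 1                      # deterministic transition
        r = sp == goal ? 10.0 : -1.0    # reward for this step
        total += disc * r
        disc *= discount
        s = sp
        steps += 1
    end
    return total, steps
end

total, steps = rollout_sketch()                          # ends at the terminal state
total_cap, steps_cap = rollout_sketch(max_steps=2)       # ends at the step limit
total_eps, steps_eps = rollout_sketch(eps=0.8)           # ends when the discount drops below eps
```

With the default eps of 0.0 and an unbounded max_steps, only the terminal-state condition can end the loop, so one of the other two limits is needed for problems without terminal states.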


History Recorder

A HistoryRecorder runs a simulation and records the trajectory. It returns an AbstractVector of NamedTuples - see Histories for more info.

hr = HistoryRecorder(max_steps=100)
pomdp = TigerPOMDP()
policy = RandomPolicy(pomdp)

h = simulate(hr, pomdp, policy)

More examples can be found in the POMDPExamples Package.

POMDPTools.Simulators.HistoryRecorder - Type

A simulator that records the history for later examination

The simulation will be terminated when either

  1. a terminal state is reached (as determined by isterminal()), or
  2. the discount factor is as small as eps or
  3. max_steps have been executed

Keyword Arguments:

  • rng: The random number generator for the simulation
  • capture_exception::Bool: whether to capture an exception and store it in the history, or let it go uncaught, potentially killing the script
  • show_progress::Bool: show a progress bar for the simulation
  • eps
  • max_steps

Usage (optional arguments in brackets):

hr = HistoryRecorder()
history = simulate(hr, pomdp, policy, [updater [, init_belief [, init_state]]])

sim()

The sim function provides a convenient way to interact with a POMDP or MDP environment and return a history. The first argument is a function that is called at every time step; it takes a state (in the case of an MDP) or an observation (in the case of a POMDP) as its argument and returns an action. The second argument is a POMDP or MDP. It is intended to be used with Julia's do syntax as follows:

pomdp = TigerPOMDP()
history = sim(pomdp, max_steps=10) do obs
    println("Observation was $obs.")
    return TIGER_OPEN_LEFT
end

This allows a flexible and general way to interact with a POMDP environment without creating new Policy types.

In the POMDP case, an updater can optionally be supplied as an additional positional argument if the policy function works with beliefs rather than directly with observations.

More examples can be found in the POMDPExamples Package.

POMDPTools.Simulators.sim - Function
sim(polfunc::Function, mdp::MDP; [<keyword arguments>])
sim(polfunc::Function, pomdp::POMDP; [<keyword arguments>])

Alternative way of running a simulation with a function specifying how to calculate the action at each timestep.

Usage

sim(mdp) do s
    # code that calculates action `a` based on `s` - this is the policy
    # you can also do other things like display something
    return a
end

for an MDP or

sim(pomdp) do o
    # code that calculates 'a' based on observation `o`
    # optionally you could save 'o' in a global variable or do a belief update
    return a
end

for a POMDP or

sim(pomdp, updater) do b
    # code that calculates 'a' based on belief `b`
    # `b` is calculated by `updater`
    return a
end

for a POMDP and a belief updater.

Keyword Arguments

All Versions

  • initialstate: the initial state for the simulation
  • simulator: keyword argument to specify any simulator to run the simulation. If nothing is specified for the simulator, a HistoryRecorder will be used as the simulator, with all keyword arguments forwarded to it, e.g.
    sim(mdp, max_steps=100, show_progress=true) do s
        # ...
    end
    will limit the simulation to 100 steps.

POMDP version

  • initialobs: this will control the initial observation given to the policy function. If this is not defined, rand(initialobs(m, s)) will be used if it is available. If it is not, missing will be used.

POMDP and updater version

  • initialbelief: initialize_belief(updater, initialbelief) is the first belief that will be given to the policy function.

Histories

The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.

POMDPTools.Simulators.SimHistory - Type
SimHistory

A (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP},...).

This is an AbstractVector of NamedTuples containing the states, actions, etc.

Examples

hist[1][:s] # returns the first state in the history
hist[:a] # returns all of the actions in the history

Examples

using POMDPs, POMDPTools, POMDPModels
hr = HistoryRecorder(max_steps=10)
hist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))
step = hist[1] # all information available about the first step
step[:s] # the first state
step[:a] # the first action

To see everything available in a step, use

keys(first(hist))

The entire history of each variable is available by using a Symbol instead of an index, i.e.

hist[:s]

will return a vector of the starting states for each step (note the difference between :s and :sp).

eachstep

The eachstep function may also be useful:

POMDPTools.Simulators.eachstep - Function
for t in eachstep(hist, [spec])
    ...
end

Iterate through the steps in SimHistory hist. spec is a tuple of symbols or string that controls what is returned for each step.

For example,

for (s, a, r, sp) in eachstep(h, "(s, a, r, sp)")    
    println("reward $r received when state $sp was reached after action $a was taken in state $s")
end

returns the start state, action, reward and destination state for each step of the simulation.

Alternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple):

for step in eachstep(h, "(s, a, r, sp)")    
    println("reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)")
end

The possible valid elements in the iteration specification are

  • Any node in the (PO)MDP Dynamic Decision network (by default :s, :a, :sp, :o, :r)
  • b - the initial belief in the step (for POMDPs only)
  • bp - the belief after being updated based on o (for POMDPs only)
  • action_info - info from the policy decision (from action_info)
  • update_info - info from the belief update (from update_info)
  • t - the timestep index

Examples:

collect(eachstep(h, "a,o"))

will produce a vector of action-observation named tuples.

collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp"))

will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).
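
Note that norm is not available by default; it comes from the LinearAlgebra standard library. The following sketch shows the same computation on a hypothetical stand-in for the steps of a history (a hand-written vector of NamedTuples rather than a real SimHistory), so it runs without any POMDP packages.

```julia
using LinearAlgebra  # provides norm

# Hypothetical stand-in for the steps yielded by eachstep(h, "s,sp")
steps = [(s=[0.0, 0.0], sp=[3.0, 4.0]),
         (s=[3.0, 4.0], sp=[3.0, 5.0])]

# Destructuring a NamedTuple follows field order, so (s, sp) binds s and then sp
dists = collect(norm(sp - s) for (s, sp) in steps)
```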

Notes

  • The iteration specification can be specified as a tuple of symbols (e.g. (:s, :a)) instead of a string.
  • For type stability in performance-critical code, one should construct an iterator directly using HistoryIterator{typeof(h), (:a,:r)}(h) rather than eachstep(h, "a,r").

Other Functions

state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) will return vectors of the states, actions, observations, beliefs, and rewards, respectively. undiscounted_reward(h) and discounted_reward(h) will return the total rewards collected over the trajectory. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to retrieve the exception and its backtrace if the simulation failed to finish.
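
To make the difference between the two reward totals concrete, here is a small sketch in plain Julia. The reward values and discount factor are hypothetical, and the sketch assumes the usual convention that the first reward is not discounted; it does not call the library functions themselves.

```julia
# Hypothetical per-step rewards, standing in for a vector such as reward_hist(h)
rewards = [-1.0, -1.0, 10.0]
discount = 0.9   # assumed discount factor of the problem

# Plain sum of the rewards
undiscounted = sum(rewards)

# Reward at step t is weighted by discount^(t-1)
discounted = sum(discount^(t - 1) * r for (t, r) in enumerate(rewards))
```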

view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.

Parallel

POMDPTools contains a utility for running many Monte Carlo simulations in parallel to evaluate performance. The basic workflow involves the following steps:

  1. Create a vector of Sim objects, each specifying how a single simulation should be run.
  2. Use the run_parallel or run function to run the simulations.
  3. Analyze the results of the simulations contained in the DataFrame returned by run_parallel.

Example

An example can be found in the POMDPExamples Package.
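
As a rough sketch of the three workflow steps, assuming POMDPs, POMDPTools, and POMDPModels are installed, the following runs a handful of short simulations serially with run (run_parallel accepts the same arguments). The trial metadata field is a hypothetical name chosen for this example.

```julia
using POMDPs, POMDPTools, POMDPModels

pomdp = BabyPOMDP()
policy = RandomPolicy(pomdp)

# 1. One Sim per simulation; `trial` is an illustrative metadata field
queue = [Sim(pomdp, policy, max_steps=20, metadata=(trial=i,)) for i in 1:5]

# 2. Run the queue on a single process and extract statistics from each history
df = run(queue) do sim, hist
    return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end

# 3. Inspect the results; there is one row per Sim
println(df)
```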

Sim objects

Each simulation should be specified by a Sim object which contains all the information needed to run a simulation, including the Simulator, POMDP or MDP, Policy, Updater, and any other ingredients.

POMDPTools.Simulators.Sim - Type
Sim(m::MDP, p::Policy[, initialstate]; kwargs...)
Sim(m::POMDP, p::Policy[, updater[, initial_belief[, initialstate]]]; kwargs...)

Create a Sim object that contains everything needed to run and record a single simulation, including model, initial conditions, and metadata.

A vector of Sim objects can be executed with run or run_parallel.

Keyword Arguments

  • rng::AbstractRNG=Random.default_rng()
  • max_steps::Int=typemax(Int)
  • simulator::Simulator=HistoryRecorder(rng=rng, max_steps=max_steps)
  • metadata::NamedTuple: a named tuple (or dictionary) of metadata for the sim that will be recorded, e.g. (solver_iterations=500,).

Running simulations

The simulations are actually carried out by the run and run_parallel functions.

POMDPTools.Simulators.run_parallel - Function
run_parallel(queue::Vector{Sim})
run_parallel(f::Function, queue::Vector{Sim})

Run Sim objects in queue in parallel and return results as a DataFrame.

By default, the DataFrame will contain the reward for each simulation and the metadata provided to the sim.

Arguments

  • queue: List of Sim objects to be executed
  • f: Function to process the results of each simulation

This function should take two arguments, (1) the Sim that was executed and (2) the result of the simulation, by default a SimHistory. It should return a named tuple that will appear in the dataframe. See Examples below.

Keyword Arguments

  • show_progress::Bool: whether or not to show a progress meter
  • progress::ProgressMeter.Progress: determines how the progress meter is displayed

Examples

run_parallel(queue) do sim, hist
    return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end

will return a DataFrame containing the number of steps and the reward.


The run function is also provided to run simulations in serial (this is often useful for debugging). Note that the documentation below also contains a section for the built-in Julia run function, even though it is not relevant here.

Base.run - Function
run(queue::Vector{Sim})
run(f::Function, queue::Vector{Sim})

Run the Sim objects in queue on a single process and return the results as a dataframe.

See run_parallel for more information.


Specifying information to be recorded

By default, only the discounted rewards from each simulation are recorded, but arbitrary information can be recorded.

The run_parallel and run functions accept a function (normally specified via the do syntax) that takes the Sim object and history of the simulation and extracts relevant statistics as a named tuple. For example, if the desired characteristics are the number of steps in the simulation and the reward, run_parallel would be invoked as follows:

df = run_parallel(queue) do sim::Sim, hist::SimHistory
    return (n_steps=n_steps(hist), reward=discounted_reward(hist))
end

These statistics are combined into a DataFrame, with each line representing a single simulation, allowing for statistical analysis. For example,

using Statistics
mean(df.reward ./ df.n_steps)

would compute the average reward per step with each simulation weighted equally regardless of length.
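
The phrase "weighted equally regardless of length" can be illustrated with a small plain-Julia sketch. The two vectors below are hypothetical stand-ins for the reward and n_steps columns of the DataFrame.

```julia
using Statistics

# Hypothetical results for two simulations, standing in for DataFrame columns
reward = [10.0, 4.0]
n_steps = [5, 1]

# Mean of per-simulation ratios: each simulation counts equally
per_sim = mean(reward ./ n_steps)

# Pooled ratio: each step counts equally, so long simulations dominate
pooled = sum(reward) / sum(n_steps)
```

Here the per-simulation average is 3.0 while the pooled value is about 2.33, so the choice between the two matters when simulation lengths vary.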

Display

DisplaySimulator

The DisplaySimulator displays each step of a simulation in real time through a multimedia display such as a Jupyter notebook or ElectronDisplay. Specifically it uses POMDPTools.render and the built-in Julia display function to visualize each step.

Example:

using POMDPs
using POMDPModels
using POMDPTools
using ElectronDisplay
ElectronDisplay.CONFIG.single_window = true

ds = DisplaySimulator()
m = SimpleGridWorld()
simulate(ds, m, RandomPolicy(m))

POMDPTools.Simulators.DisplaySimulator - Type
DisplaySimulator(;kwargs...)

Create a simulator that displays each step of a simulation.

Given a POMDP or MDP model m, this simulator roughly works like

for step in stepthrough(m, ...)
    display(render(m, step))
end

Keyword Arguments

  • display::AbstractDisplay: the display to use for the first argument to the display function. If this is nothing, display(...) will be called without an AbstractDisplay argument.
  • render_kwargs::NamedTuple: keyword arguments for POMDPTools.render(...)
  • max_fps::Number=10: maximum number of frames to be displayed per second - sleep will be used to skip extra time, so this is not designed for high precision
  • predisplay::Function: function to call before every call to display(...). The only argument to this function will be the display (if it is specified) or nothing
  • extra_initial::Bool=false: if true, display an extra step at the beginning with only elements t, sp, and bp for POMDPs (this can be useful to see the initial state if render displays only sp and not s).
  • extra_final::Bool=true: if true, display an extra step at the end with only elements t, done, s, and b for POMDPs (this can be useful to see the final state if render displays only s and not sp).
  • max_steps::Integer: maximum number of steps to run for
  • spec::NTuple{Symbol}: specification of what step elements to display (see eachstep)
  • rng::AbstractRNG: random number generator

See the POMDPSimulators documentation for more tips about using specific displays.


Display-specific tips

The following tips may be helpful when using particular displays.

Jupyter notebooks

By default, in a Jupyter notebook, the visualizations of all steps are displayed in the output box one after another. To make the output animated instead, where the image is overwritten at each step, one may use

DisplaySimulator(predisplay=(d)->IJulia.clear_output(true))

ElectronDisplay

By default, ElectronDisplay will open a new window for each new step. To prevent this, use

ElectronDisplay.CONFIG.single_window = true