sim()

The sim function provides a convenient way to interact with a POMDP or MDP environment and return a history. The first argument is a function that is called at every time step; it takes a state (for an MDP) or an observation (for a POMDP) and returns an action. The second argument is the POMDP or MDP. It is intended to be used with Julia's do syntax, where the block following do is passed as that first argument, so the problem appears first inside the parentheses:

pomdp = TigerPOMDP()
history = sim(pomdp, max_steps=10) do obs
    println("Observation was $obs.")
    return TIGER_OPEN_LEFT
end

This allows a flexible and general way to interact with a POMDP environment without creating new Policy types.

In the POMDP case, an updater can optionally be supplied as an additional positional argument if the policy function works with beliefs rather than directly with observations.
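
As a concrete instance of the belief-based form, the following is a minimal sketch for the tiger problem. It assumes the package layout this page documents (POMDPModels for TigerPOMDP, BeliefUpdaters for DiscreteUpdater); newer releases may bundle these elsewhere. The 0.9 threshold is an arbitrary illustrative choice, and the code assumes that the state `true` means the tiger is behind the left door.

```julia
using POMDPs
using POMDPModels      # TigerPOMDP and the TIGER_* action constants
using BeliefUpdaters   # DiscreteUpdater
using POMDPSimulators  # sim, action_hist

pomdp = TigerPOMDP()
updater = DiscreteUpdater(pomdp)

history = sim(pomdp, updater, max_steps=10) do b
    # Assumption: state `true` means the tiger is behind the left door.
    p_left = pdf(b, true)
    if p_left > 0.9
        return POMDPModels.TIGER_OPEN_RIGHT   # fairly sure the tiger is on the left
    elseif p_left < 0.1
        return POMDPModels.TIGER_OPEN_LEFT    # fairly sure the tiger is on the right
    else
        return POMDPModels.TIGER_LISTEN       # otherwise gather more information
    end
end

# Print the sequence of actions the policy function chose.
println(collect(action_hist(history)))
```

Because the policy function receives a belief rather than a raw observation, it can stay stateless; the updater carries the history of observations for it.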

More examples can be found in the POMDPExamples package.

POMDPSimulators.sim (Function)
sim(polfunc::Function, mdp::MDP; [<keyword arguments>])
sim(polfunc::Function, pomdp::POMDP; [<keyword arguments>])

Alternative way of running a simulation with a function specifying how to calculate the action at each timestep.

Usage

sim(mdp) do s
    # code that calculates action `a` based on `s` - this is the policy
    # you can also do other things like display something
    return a
end

for an MDP or

sim(pomdp) do o
    # code that calculates `a` based on observation `o`
    # optionally you could save `o` in a global variable or do a belief update
    return a
end

for a POMDP whose policy function acts on observations, or

sim(pomdp, updater) do b
    # code that calculates `a` based on belief `b`
    # `b` is calculated by `updater`
    return a
end

for a POMDP and a belief updater.

Keyword Arguments

All Versions

  • initialstate: the initial state for the simulation
  • simulator: keyword argument to specify any simulator to run the simulation. If nothing is specified for the simulator, a HistoryRecorder will be used as the simulator, with all keyword arguments forwarded to it, e.g.
    sim(mdp, max_steps=100, show_progress=true) do s
        # ...
    end
    will limit the simulation to 100 steps.
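
The simulator keyword can be illustrated by swapping in a different simulator. The sketch below assumes RolloutSimulator from POMDPSimulators and SimpleGridWorld from POMDPModels, neither of which is named on this page; with a rollout simulator the call should return the discounted reward of the run rather than a history.

```julia
using POMDPs
using POMDPModels      # SimpleGridWorld (assumed example problem)
using POMDPSimulators  # sim, RolloutSimulator

mdp = SimpleGridWorld()

# Passing a simulator explicitly replaces the default HistoryRecorder.
# RolloutSimulator is an assumed choice for illustration.
total = sim(mdp, simulator=RolloutSimulator(max_steps=100)) do s
    return :right   # trivial policy: always move right
end

println("Discounted reward: ", total)
```

When a simulator is supplied this way, options such as max_steps are set on that simulator's own constructor.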

POMDP version

  • initialobs: this will control the initial observation given to the policy function. If this is not defined, rand(initialobs(m, s)) will be used if it is available. If it is not, missing will be used.

POMDP and updater version

  • initialbelief: initialize_belief(updater, initialbelief) is the first belief that will be given to the policy function.
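
A small sketch of the initialbelief keyword follows. It assumes SparseCat from POMDPModelTools as a convenient way to write a discrete prior; the 0.8/0.2 prior is hypothetical, and the vector `beliefs` is only there to record what the policy function receives.

```julia
using POMDPs
using POMDPModels      # TigerPOMDP
using POMDPModelTools  # SparseCat (assumed helper for a discrete prior)
using BeliefUpdaters   # DiscreteUpdater
using POMDPSimulators  # sim

pomdp = TigerPOMDP()
updater = DiscreteUpdater(pomdp)

# Hypothetical prior: 80% on state `true`, 20% on state `false`.
prior = SparseCat([true, false], [0.8, 0.2])

beliefs = Float64[]   # probability of state `true` seen at each step

history = sim(pomdp, updater, initialbelief=prior, max_steps=3) do b
    push!(beliefs, pdf(b, true))
    return POMDPModels.TIGER_LISTEN   # always listen, so the belief keeps updating
end

println(beliefs)
```

The first recorded value reflects the supplied prior after it has passed through initialize_belief; later values depend on the sampled observations.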