History

HistoryRecorder

HistoryRecorder runs a simulation and records the trajectory. It returns an MDPHistory or a POMDPHistory (documented below).

# assumes the POMDPs.jl ecosystem packages providing these names:
using POMDPs, POMDPSimulators  # simulate, HistoryRecorder
using POMDPModels              # TigerPOMDP
using POMDPPolicies            # RandomPolicy

hr = HistoryRecorder(max_steps=100)
pomdp = TigerPOMDP()
policy = RandomPolicy(pomdp)

h = simulate(hr, pomdp, policy)

A simulator that records the history for later examination

The simulation is terminated when any of the following occurs:

  1. a terminal state is reached (as determined by isterminal()),
  2. the discount multiplier drops below eps, or
  3. max_steps steps have been executed.

Keyword Arguments:

  • rng: the random number generator used for the simulation
  • capture_exception::Bool: whether to capture an exception and store it in the history, or let it go uncaught, potentially killing the script
  • show_progress::Bool: whether to show a progress bar for the simulation
  • eps: the simulation terminates when the discount multiplier drops below this value
  • max_steps: the maximum number of steps to simulate
  • sizehint::Int: the expected length of the simulation (for preallocation)

Usage (optional arguments in brackets):

hr = HistoryRecorder()
history = simulate(hr, pomdp, policy, [updater [, initbelief [, initstate]]])

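For POMDPs, the updater and initial belief can be passed explicitly. A minimal sketch, assuming the usual POMDPs.jl ecosystem packages (POMDPModels for TigerPOMDP, BeliefUpdaters for DiscreteUpdater, POMDPPolicies for RandomPolicy):

```julia
using Random
using POMDPs, POMDPModels, POMDPSimulators, POMDPPolicies, BeliefUpdaters

pomdp = TigerPOMDP()
policy = RandomPolicy(pomdp)
up = DiscreteUpdater(pomdp)            # exact discrete belief updater
b0 = initialstate_distribution(pomdp)  # initial belief over states

hr = HistoryRecorder(max_steps=50, rng=MersenneTwister(1))
h = simulate(hr, pomdp, policy, up, b0)

println(n_steps(h), " steps, discounted reward ", discounted_reward(h))
```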

SimHistory

for t in eachstep(hist, [spec])
    ...
end

Iterate through the steps in SimHistory hist. spec is a tuple of symbols or a string that controls what is returned for each step.

For example,

for (s, a, r, sp) in eachstep(h, "(s, a, r, sp)")    
    println("reward $r received when state $sp was reached after action $a was taken in state $s")
end

yields the start state, action, reward, and destination state for each step of the simulation.

The possible valid elements in the iteration specification are

  • s - the initial state in a step
  • b - the initial belief in the step (for POMDPs only)
  • a - the action taken in the step
  • r - the reward received for the step
  • sp - the final state at the end of the step (s')
  • o - the observation received during the step (note that this is usually based on sp instead of s)
  • bp - the belief after being updated based on o (for POMDPs only)
  • i - info from the state transition (from generate_sri for MDPs or generate_sori for POMDPs)
  • ai - info from the policy decision (from action_info)
  • ui - info from the belief update (from update_info)
  • t - the timestep index

Examples:

collect(eachstep(h, "ao"))

will produce a vector of action-observation tuples.
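For a POMDP history, the belief and timestep index can be requested as well. A sketch, assuming h was recorded with a belief updater (as in the HistoryRecorder usage above):

```julia
for (t, b, a, o, r) in eachstep(h, "(t, b, a, o, r)")
    println("step $t: took $a, observed $o, received reward $r")
end
```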

collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp"))

will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector; norm is from the LinearAlgebra standard library).
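As a sanity check, discounted_reward(h) can be recomputed from the per-step rewards. A sketch with a hypothetical reward vector (in practice rs = reward_hist(h) and γ = discount(pomdp)):

```julia
γ = 0.95
rs = [-1.0, -1.0, 10.0]  # hypothetical rewards, e.g. reward_hist(h)

undiscounted = sum(rs)                                       # 8.0
discounted = sum(r * γ^(t - 1) for (t, r) in enumerate(rs))  # ≈ 7.075
```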

Notes:

state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) return vectors of the states, actions, observations, beliefs, and rewards, respectively, and undiscounted_reward(h) and discounted_reward(h) return the total reward collected over the trajectory. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to retrieve the exception and its backtrace if the simulation failed to finish.

view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is a SimHistory that can be iterated through and manipulated just like a complete SimHistory.
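A sketch of combining view with eachstep to examine only part of a trajectory (assumes h as above):

```julia
first_half = view(h, 1:div(n_steps(h), 2))
for (a, r) in eachstep(first_half, "(a, r)")
    # inspect only the first half of the trajectory
end
```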

MDPHistory

An MDPHistory represents a state-action-reward history from simulating an MDP. It is a subtype of SimHistory.

An object that contains an MDP simulation history

Returned by simulate when called with a HistoryRecorder. Iterate through the (s, a, r, s') tuples in MDPHistory h like this:

for (s, a, r, sp) in eachstep(h)
    # do something
end

POMDPHistory

A POMDPHistory contains a record of the states, actions, observations, rewards, and beliefs encountered during a simulation of a POMDP. It is a subtype of SimHistory.