# Histories

The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.

### SimHistory

An (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP}, ...).

This is an AbstractVector of NamedTuples containing the states, actions, etc.

```julia
hist[1][:s] # returns the first state in the history
hist[:a]    # returns all of the actions in the history
```

## Examples

```julia
using POMDPSimulators, POMDPs, POMDPModels, POMDPPolicies

hr = HistoryRecorder(max_steps=10)
hist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))

step = hist[1] # all information available about the first step
step[:s]       # the first state
step[:a]       # the first action
```

To see everything available in a step, use

```julia
keys(first(hist))
```

The entire history of each variable is available by using a Symbol instead of an index, i.e.

```julia
hist[:s]
```

will return a vector of the starting states for each step (note the difference between :s and :sp).
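For example, because hist[:sp] holds the state reached at the end of each step, it is offset from hist[:s] by one step. A minimal self-contained sketch, using a plain vector of NamedTuples in place of a real SimHistory (the field names match, but the states and actions here are made up):

```julia
# Stand-in for a SimHistory: one NamedTuple per step, as in hist[1].
steps = [(s=1, a=:feed, r=-1.0, sp=2),
         (s=2, a=:feed, r=-1.0, sp=3),
         (s=3, a=:wait, r=0.0, sp=2)]

s_hist  = [step.s  for step in steps]  # starting state of each step
sp_hist = [step.sp for step in steps]  # state reached in each step

# The state reached in step i is the starting state of step i+1.
@assert sp_hist[1:end-1] == s_hist[2:end]
```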

## eachstep

The eachstep function may also be useful:

```julia
for t in eachstep(hist, [spec])
    ...
end
```

Iterate through the steps in the SimHistory hist. spec is a tuple of symbols or a string that controls what is returned for each step.

For example,

```julia
for (s, a, r, sp) in eachstep(h, "(s, a, r, sp)")
    println("reward $r received when state $sp was reached after action $a was taken in state $s")
end
```

returns the start state, action, reward and destination state for each step of the simulation.

Alternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple):

```julia
for step in eachstep(h, "(s, a, r, sp)")
    println("reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)")
end
```

The valid elements in the iteration specification are:

• Any node in the (PO)MDP Dynamic Decision network (by default :s, :a, :sp, :o, :r)
• b - the initial belief in the step (for POMDPs only)
• bp - the belief after being updated based on o (for POMDPs only)
• action_info - info from the policy decision (from action_info)
• update_info - info from the belief update (from update_info)
• t - the timestep index

### Examples:

```julia
collect(eachstep(h, "a,o"))
```

will produce a vector of action-observation named tuples.

```julia
collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp"))
```

will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).
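Summing those per-step distances gives the total path length. A self-contained sketch, with hypothetical 2-D vector states standing in for the steps of a real history:

```julia
using LinearAlgebra: norm

# Hypothetical steps with Euclidean positions as states.
steps = [(s=[0.0, 0.0], sp=[3.0, 4.0]),
         (s=[3.0, 4.0], sp=[3.0, 8.0])]

dists = [norm(sp - s) for (s, sp) in steps]  # distance traveled each step
total = sum(dists)                           # total path length
```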

### Notes

• The iteration specification can be specified as a tuple of symbols (e.g. (:s, :a)) instead of a string.
• For type stability in performance-critical code, one should construct an iterator directly using HistoryIterator{typeof(h), (:a,:r)}(h) rather than eachstep(h, "ar").

## Other Functions

state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) will return vectors of the states, actions, observations, beliefs, and rewards, and undiscounted_reward(h) and discounted_reward(h) will return the total reward collected over the trajectory. n_steps(h) returns the number of steps in the history. If the simulation failed to finish, exception(h) and backtrace(h) return the exception that was thrown and its backtrace.
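To make the distinction concrete, discounted_reward(h) weights the reward at step t by γ^(t-1), where γ is the problem's discount factor. A sketch using a plain reward vector in place of reward_hist(h) (the rewards and discount value here are arbitrary assumptions):

```julia
rewards = [-1.0, -1.0, 10.0]  # stand-in for reward_hist(h)
γ = 0.9                       # hypothetical discount factor

undiscounted = sum(rewards)
discounted = sum(γ^(t-1) * r for (t, r) in enumerate(rewards))
```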

view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.
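Because Base.view behaves the same way on any AbstractVector, the idea can be sketched with a plain vector of step NamedTuples standing in for a SimHistory:

```julia
steps = [(s=1, r=-1.0), (s=2, r=-1.0), (s=3, r=10.0)]

tail = view(steps, 2:length(steps))        # skip the first step; no copy is made
tail_reward = sum(step.r for step in tail) # iterate the view like a full history
```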