Histories
The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.
POMDPSimulators.SimHistory — Type
SimHistory
An (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP},...).
This is an AbstractVector of NamedTuples containing the states, actions, etc.
Examples
hist[1][:s] # returns the first state in the history
hist[:a] # returns all of the actions in the history
Examples
using POMDPSimulators, POMDPs, POMDPModels, POMDPPolicies
hr = HistoryRecorder(max_steps=10)
hist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))
step = hist[1] # all information available about the first step
step[:s] # the first state
step[:a] # the first action
To see everything available in a step, use
keys(first(hist))
The entire history of each variable is available by using a Symbol instead of an index, i.e.
hist[:s]
will return a vector of the starting states for each step (note the difference between :s and :sp).
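The difference between indexing by step and indexing by variable can be pictured with a toy stand-in. The sketch below uses a plain Vector of NamedTuples rather than a real SimHistory, and all names and values in it are invented for illustration; on an actual history, hist[1] and hist[:s] give the same two kinds of access directly.

```julia
# Toy stand-in for a history: a plain Vector of NamedTuples.
# This is NOT the real SimHistory type; the values are invented for illustration.
toy_steps = [
    (s = 1, a = :right, r = 0.0, sp = 2),
    (s = 2, a = :right, r = 0.0, sp = 3),
    (s = 3, a = :left,  r = 1.0, sp = 2),
]

first_step = toy_steps[1]            # everything recorded for step 1
first_state = first_step[:s]         # state at the start of step 1
available = keys(first(toy_steps))   # names of the recorded variables

# Per-variable sequences, analogous to hist[:s] and hist[:sp] on a real history.
start_states = [step[:s] for step in toy_steps]
next_states = [step[:sp] for step in toy_steps]
```

Here start_states holds the state each step began in, while next_states holds the state each step ended in, which is the distinction between :s and :sp.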
eachstep
The eachstep function may also be useful:
POMDPSimulators.eachstep — Function
for t in eachstep(hist, [spec])
    ...
end
Iterate through the steps in SimHistory hist. spec is a tuple of symbols or a string that controls what is returned for each step.
For example,
for (s, a, r, sp) in eachstep(h, "(s, a, r, sp)")    
    println("reward $r received when state $sp was reached after action $a was taken in state $s")
end
returns the start state, action, reward, and destination state for each step of the simulation.
Alternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple):
for step in eachstep(h, "(s, a, r, sp)")    
    println("reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)")
end
The valid elements in the iteration specification are
- Any node in the (PO)MDP Dynamic Decision network (by default :s,:a,:sp,:o,:r)
- b: the initial belief in the step (for POMDPs only)
- bp: the belief after being updated based on o (for POMDPs only)
- action_info: info from the policy decision (from action_info)
- update_info: info from the belief update (from update_info)
- t: the timestep index
Examples:
collect(eachstep(h, "a,o"))
will produce a vector of action-observation named tuples.
collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp"))
will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).
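As a rough illustration of the distance computation, the sketch below applies the same generator expression to a hand-built Vector of NamedTuples instead of eachstep(h, "s,sp"). The toy_steps data is hypothetical; the only library call is norm from the LinearAlgebra standard library.

```julia
using LinearAlgebra  # provides norm

# Toy steps with 2-D vector states; a stand-in for eachstep(h, "s,sp").
# The values are invented for illustration.
toy_steps = [
    (s = [0.0, 0.0], sp = [3.0, 4.0]),
    (s = [3.0, 4.0], sp = [3.0, 4.0]),
]

# Destructuring a NamedTuple yields its fields in order, so (s, sp) works here.
distances = collect(norm(sp - s) for (s, sp) in toy_steps)
```

The first toy step moves from the origin to (3, 4), a distance of 5; the second step does not move, so its distance is 0.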
Notes
- The iteration specification can be specified as a tuple of symbols (e.g. (:s, :a)) instead of a string.
- For type stability in performance-critical code, one should construct an iterator directly using HistoryIterator{typeof(h), (:a,:r)}(h) rather than eachstep(h, "ar").
Other Functions
state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) return vectors of the states, actions, observations, beliefs, and rewards, respectively. undiscounted_reward(h) and discounted_reward(h) return the total reward collected over the trajectory, without and with discounting. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) return the exception and backtrace held by the history if the simulation failed to finish.
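The two reward totals can be sketched on a bare reward vector. This is a minimal illustration, not the package's implementation: the rewards and discount values are invented, and it assumes the common convention that the reward at step t is weighted by discount^(t-1), so the first reward is not discounted.

```julia
# Toy reward sequence and discount factor (both invented for illustration).
rewards = [1.0, 0.0, 2.0]
discount = 0.5

# Undiscounted total: plain sum of the per-step rewards.
undiscounted = sum(rewards)

# Discounted total, assuming the reward at step t is weighted by discount^(t-1).
discounted = sum(discount^(t - 1) * r for (t, r) in enumerate(rewards))
```

With these numbers the undiscounted total is 3.0 and the discounted total is 1.0 + 0.0 + 0.25 * 2.0 = 1.5.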
view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.