Histories
The results produced by HistoryRecorders and the sim function are contained in SimHistory objects.
POMDPSimulators.SimHistory — Type
SimHistory
A (PO)MDP simulation history returned by simulate(::HistoryRecorder, ::Union{MDP,POMDP},...).
This is an AbstractVector of NamedTuples containing the states, actions, etc.
Examples
hist[1][:s] # returns the first state in the history
hist[:a] # returns all of the actions in the history
Examples
using POMDPSimulators, POMDPs, POMDPModels, POMDPPolicies
hr = HistoryRecorder(max_steps=10)
hist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))
step = hist[1] # all information available about the first step
step[:s] # the first state
step[:a] # the first action
To see everything available in a step, use
keys(first(hist))
The entire history of each variable is available by using a Symbol instead of an index, i.e. hist[:s] will return a vector of the starting states for each step (note the difference between :s and :sp).
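For instance, the relationship between the :s and :sp histories can be checked directly. This is a sketch continuing the BabyPOMDP example above; the collect calls are only there to make the comparison concrete:

```julia
using POMDPs, POMDPModels, POMDPSimulators, POMDPPolicies

hr = HistoryRecorder(max_steps=10)
hist = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))

ss  = collect(hist[:s])   # the start state of each step
sps = collect(hist[:sp])  # the destination state of each step

# Within a single trajectory, the destination of step i is the start of step i+1:
@assert ss[2:end] == sps[1:end-1]
```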
eachstep
The eachstep function may also be useful:
POMDPSimulators.eachstep — Function
for t in eachstep(hist, [spec])
    ...
end
Iterate through the steps in SimHistory hist. spec is a tuple of symbols or a string that controls what is returned for each step.
For example,
for (s, a, r, sp) in eachstep(h, "(s, a, r, sp)")
println("reward $r received when state $sp was reached after action $a was taken in state $s")
end
iterates over the start state, action, reward, and destination state for each step of the simulation.
Alternatively, instead of expanding the steps implicitly, the elements of the step can be accessed as fields (since each step is a NamedTuple):
for step in eachstep(h, "(s, a, r, sp)")
println("reward $(step.r) received when state $(step.sp) was reached after action $(step.a) was taken in state $(step.s)")
end
The possible valid elements in the iteration specification are
- Any node in the (PO)MDP Dynamic Decision network (by default :s, :a, :sp, :o, :r)
- b - the initial belief in the step (for POMDPs only)
- bp - the belief after being updated based on o (for POMDPs only)
- action_info - info from the policy decision (from action_info)
- update_info - info from the belief update (from update_info)
- t - the timestep index
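For POMDPs, the belief and timestep entries can be requested the same way as the dynamics nodes. A hypothetical sketch, assuming h is a POMDP SimHistory such as the BabyPOMDP history above:

```julia
# Sketch: iterate over timestep, belief, action, and observation for a POMDP history h.
for (t, b, a, o) in eachstep(h, "(t, b, a, o)")
    println("step $t: took action $a and observed $o")
end
```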
Examples:
collect(eachstep(h, "a,o"))
will produce a vector of action-observation named tuples.
collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp"))
will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).
Notes
- The iteration specification can be specified as a tuple of symbols (e.g. (:s, :a)) instead of a string.
- For type stability in performance-critical code, one should construct an iterator directly using HistoryIterator{typeof(h), (:a,:r)}(h) rather than eachstep(h, "ar").
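The type-stable form mentioned in the notes might be used like this (a sketch; assumes h is a SimHistory such as the one recorded in the example above):

```julia
# Sketch: type-stable reward accumulation via a directly-constructed HistoryIterator.
function total_reward(h)
    r_total = 0.0
    for (a, r) in HistoryIterator{typeof(h), (:a, :r)}(h)
        r_total += r
    end
    return r_total
end
```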
Other Functions
state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) will return vectors of the states, actions, observations, beliefs, and rewards, and undiscounted_reward(h) and discounted_reward(h) will return the total rewards collected over the trajectory. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to hold an exception if the simulation failed to finish. view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is an AbstractSimHistory that can be iterated through and manipulated just like a complete SimHistory.
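A few of these accessors can be combined as follows (a sketch continuing the BabyPOMDP example; the numerical results depend on the random trajectory, so none are shown):

```julia
using POMDPs, POMDPModels, POMDPSimulators, POMDPPolicies

hr = HistoryRecorder(max_steps=10)
h = simulate(hr, BabyPOMDP(), FunctionPolicy(x->true))

n_steps(h)              # number of steps recorded
undiscounted_reward(h)  # plain sum of the rewards
discounted_reward(h)    # rewards weighted by the problem's discount factor

h5 = view(h, 1:min(5, n_steps(h)))  # an AbstractSimHistory over the first steps
discounted_reward(h5)   # the same accessors work on the view
```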