History
History Recorder
HistoryRecorder runs a simulation and records the trajectory. It returns an MDPHistory or POMDPHistory (see history.jl below).
using POMDPs, POMDPModels, POMDPSimulators, POMDPPolicies

hr = HistoryRecorder(max_steps=100)
pomdp = TigerPOMDP()
policy = RandomPolicy(pomdp)
h = simulate(hr, pomdp, policy)

POMDPSimulators.HistoryRecorder — Type. A simulator that records the history for later examination.
The simulation will be terminated when either
- a terminal state is reached (as determined by isterminal()), or
- the discount factor is as small as eps, or
- max_steps have been executed.
Keyword Arguments:
- rng: the random number generator for the simulation
- capture_exception::Bool: whether to capture an exception and store it in the history, or let it go uncaught, potentially killing the script
- show_progress::Bool: show a progress bar for the simulation
- eps
- max_steps
- sizehint::Int: the expected length of the simulation (for preallocation)
Usage (optional arguments in brackets):

hr = HistoryRecorder()
history = simulate(hr, pomdp, policy, [updater [, initbelief [, initstate]]])
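As a concrete sketch of how the keyword and optional arguments combine (DiscreteUpdater comes from the BeliefUpdaters package and the seeded rng is an illustrative choice; any Updater works in that slot):

using Random
using POMDPs, POMDPModels, POMDPSimulators, POMDPPolicies, BeliefUpdaters

pomdp = TigerPOMDP()
policy = RandomPolicy(pomdp)
up = DiscreteUpdater(pomdp)  # explicit belief updater for the optional argument

hr = HistoryRecorder(max_steps=50, rng=MersenneTwister(1), show_progress=false)
history = simulate(hr, pomdp, policy, up)  # an initial belief and state could follow up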
SimHistory
POMDPSimulators.eachstep — Function.

for t in eachstep(hist, [spec])
    ...
end

Iterate through the steps in SimHistory hist. spec is a tuple of symbols or a string that controls what is returned for each step.
For example,

for (s, a, r, sp) in eachstep(h, "(s, a, r, sp)")
    println("reward $r received when state $sp was reached after action $a was taken in state $s")
end

returns the start state, action, reward, and destination state for each step of the simulation.
The possible valid elements in the iteration specification are:
- s - the initial state in a step
- b - the initial belief in the step (for POMDPs only)
- a - the action taken in the step
- r - the reward received for the step
- sp - the final state at the end of the step (s')
- o - the observation received during the step (note that this is usually based on sp instead of s)
- bp - the belief after being updated based on o (for POMDPs only)
- i - info from the state transition (from generate_sri for MDPs or generate_sori for POMDPs)
- ai - info from the policy decision (from action_info)
- ui - info from the belief update (from update_info)
- t - the timestep index
Examples:

collect(eachstep(h, "ao")) will produce a vector of action-observation tuples.

collect(norm(sp-s) for (s,sp) in eachstep(h, "s,sp")) will produce a vector of the distances traveled on each step (assuming the state is a Euclidean vector).
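The t element pairs naturally with the reward; here is a sketch of computing a discounted return by hand (assuming t starts at 1 and that discount(pomdp) is the model's discount factor; discounted_reward(h), described below, does this for you):

r_disc = sum(discount(pomdp)^(t-1) * r for (t, r) in eachstep(h, "(t,r)"))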
Notes:
- The iteration specification can be specified as a tuple of symbols (e.g. (:s, :a)) instead of a string.
- For type stability in performance-critical code, one should construct an iterator directly using HistoryIterator{typeof(h), (:a,:r)}(h) rather than eachstep(h, "ar").
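A minimal sketch of that type-stable pattern (sum_rewards is a hypothetical helper name, not part of the package):

function sum_rewards(h)
    total = 0.0
    # the spec (:a, :r) is a type parameter, so the element type is known at compile time
    for (a, r) in HistoryIterator{typeof(h), (:a, :r)}(h)
        total += r
    end
    return total
end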
state_hist(h), action_hist(h), observation_hist(h), belief_hist(h), and reward_hist(h) will return vectors of the states, actions, observations, beliefs, and rewards, and undiscounted_reward(h) and discounted_reward(h) will return the total rewards collected over the trajectory. n_steps(h) returns the number of steps in the history. exception(h) and backtrace(h) can be used to retrieve the exception and backtrace if the simulation failed to finish.
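These accessors compose directly; for instance (a sketch, using the h recorded in the example above):

rs = reward_hist(h)               # vector of per-step rewards
@assert length(rs) == n_steps(h)
println("undiscounted return: ", undiscounted_reward(h))
println("discounted return: ", discounted_reward(h))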
view(h, range) (e.g. view(h, 1:n_steps(h)-4)) can be used to create a view of the history object h that only contains a certain range of steps. The object returned by view is a SimHistory that can be iterated through and manipulated just like a complete SimHistory.
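For example, restricting attention to the last five steps (a sketch; the view supports eachstep just like the full history):

last5 = view(h, n_steps(h)-4:n_steps(h))
for (a, r) in eachstep(last5, "(a,r)")
    println("action $a produced reward $r")
end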
MDPHistory
An MDPHistory represents a state-action-reward history from simulating an MDP. It is a subtype of SimHistory.
POMDPSimulators.MDPHistory — Type. An object that contains an MDP simulation history.
Returned by simulate when called with a HistoryRecorder. Iterate through the (s, a, r, s') tuples in MDPHistory h like this:

for (s, a, r, sp) in eachstep(h)
    # do something
end

POMDPHistory
A POMDPHistory contains a record of the states, actions, observations, rewards, and beliefs encountered during a simulation of a POMDP. It is a subtype of SimHistory.
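Since a POMDPHistory also stores beliefs and observations, the POMDP-only spec elements are available when stepping through it; a sketch (this spec is one of many valid combinations):

for (b, a, o, r) in eachstep(h, "(b, a, o, r)")
    # b is the belief before the action; o is the resulting observation
    println("took action $a, observed $o, received reward $r")
end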