Solvers
Defining a solver involves creating or using four pieces of code:
1. A subtype of Solver that holds the parameters and configuration options for the solver.
2. A subtype of Policy that holds all of the data needed to choose actions online.
3. A method of solve that takes the Solver and a (PO)MDP as arguments, performs all of the offline computations for solving the problem, and returns the policy.
4. A method of action that takes in the policy and a state or belief and returns an action.
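The four pieces listed above can be sketched as follows. This is only a minimal illustration, not a useful solver: the names FixedActionSolver and FixedActionPolicy and the index field are hypothetical, and the "offline computation" is nothing more than picking one entry from the problem's action space.

```julia
using POMDPs

# 1. Solver subtype: holds parameters and configuration options.
#    (Hypothetical type; `index` selects an entry of the action space.)
struct FixedActionSolver <: Solver
    index::Int
end

# 2. Policy subtype: holds the data needed to choose actions online.
struct FixedActionPolicy{A} <: Policy
    a::A
end

# 3. solve method: does the offline work and returns the policy.
function POMDPs.solve(solver::FixedActionSolver, m::Union{MDP,POMDP})
    as = collect(actions(m))           # assumes the action space is iterable
    return FixedActionPolicy(as[solver.index])
end

# 4. action method: maps a state or belief `x` to an action.
POMDPs.action(p::FixedActionPolicy, x) = p.a
```

With these definitions, solve(FixedActionSolver(1), m) returns a policy whose action method always yields the first action of m, regardless of the state or belief passed in.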
In many cases, the Policy type and the action method (items 2 and 4) can be satisfied with an off-the-shelf Policy from the POMDPTools package. POMDPTools also contains many tools that are useful for defining solvers in a robust, concise, and readable manner.
Online and Offline Solvers
Generally, solvers can be grouped into two categories: offline solvers, which do most of their computational work before interacting with the environment, and online solvers, which do their work as each new state or observation is encountered. Although offline and online solvers both use the exact same Solver, solve, Policy, action structure, the work of defining each kind is focused on a different portion of that structure.
For an offline solver, most of the implementation effort will be spent on the solve function, and an off-the-shelf policy from POMDPTools will typically be used.
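As a rough illustration of the offline pattern, the sketch below does all of its work inside solve and returns an off-the-shelf FunctionPolicy from POMDPTools. The GreedySolver name is hypothetical, and the "solution" is deliberately trivial: it only picks the action with the highest immediate reward in each state. It assumes a finite MDP that defines states, actions, and a reward(m, s, a) method.

```julia
using POMDPs
using POMDPTools   # provides FunctionPolicy

# Hypothetical solver type with no parameters.
struct GreedySolver <: Solver end

function POMDPs.solve(::GreedySolver, m::MDP)
    as = collect(actions(m))
    # Offline computation: tabulate the myopically best action for every state.
    # (argmax with a function argument requires Julia 1.7 or later.)
    best = Dict(s => argmax(a -> reward(m, s, a), as) for s in states(m))
    # No custom Policy type or action method is needed:
    # FunctionPolicy wraps the table lookup.
    return FunctionPolicy(s -> best[s])
end
```

Because the returned object is a FunctionPolicy, calling action(policy, s) simply performs the lookup. A real offline solver would replace the one-step table with something like value iteration, but the division of labor stays the same.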
For an online solver, the solve function typically does little or no work, but merely creates a Policy object that will carry out computation online. It is typical in POMDPs.jl to use the term "Planner" to name a Policy object for an online solver that carries out a large amount of computation ("planning") at interaction time. In this case, most of the effort will be focused on implementing the action method for the "Planner" Policy type.
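The online pattern might look like the following sketch, in which solve only packages the problem into a planner and all computation happens in action. The RolloutSolver and RolloutPlanner names, their fields, and the rollout_return helper are hypothetical. The planner estimates each action's value by averaging random rollouts, and it assumes an MDP with a generative model usable through @gen and an action space that supports iteration and rand.

```julia
using POMDPs
using Random

# Hypothetical solver: only stores planning parameters.
struct RolloutSolver <: Solver
    n_rollouts::Int   # rollouts per candidate action
    depth::Int        # maximum steps per rollout
end

# Hypothetical "Planner": a Policy that plans at interaction time.
struct RolloutPlanner{M<:MDP,R<:AbstractRNG} <: Policy
    m::M
    n_rollouts::Int
    depth::Int
    rng::R
end

# solve does no real work; it just constructs the planner.
POMDPs.solve(sol::RolloutSolver, m::MDP) =
    RolloutPlanner(m, sol.n_rollouts, sol.depth, Random.default_rng())

# Discounted return of taking `a` in `s`, then acting randomly.
function rollout_return(p::RolloutPlanner, s, a)
    total, disc = 0.0, 1.0
    for _ in 1:p.depth
        isterminal(p.m, s) && break
        s, r = @gen(:sp, :r)(p.m, s, a, p.rng)
        total += disc * r
        disc *= discount(p.m)
        a = rand(p.rng, actions(p.m))
    end
    return total
end

# action carries out the planning: pick the action with the best average return.
function POMDPs.action(p::RolloutPlanner, s)
    best_a, best_v = first(actions(p.m)), -Inf
    for a in actions(p.m)
        v = sum(rollout_return(p, s, a) for _ in 1:p.n_rollouts) / p.n_rollouts
        if v > best_v
            best_a, best_v = a, v
        end
    end
    return best_a
end
```

Keeping the random number generator inside the planner is one possible design choice; it lets a caller construct the planner directly with a seeded generator when reproducible behavior is wanted.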
Examples
Solver implementation is most clearly explained through examples. The following sections contain examples of both online and offline solver definitions: