Double Progressive Widening

The double progressive widening (DPW) solver is useful for problems with large (e.g. continuous) state and action spaces. It gradually expands the tree's branching factor so that the search can reach useful depth even when the number of possible states and actions is very large.

For a description of the algorithm, see https://hal.archives-ouvertes.fr/file/index/docid/542673/filename/c0mcts.pdf and http://arxiv.org/abs/1405.5498.

The fields of the solver type are its parameters; all of them can be set as keyword arguments to the solver constructor.
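
As a rough usage sketch (assuming `mdp` is a previously defined MDP, `s` is a state of that MDP, and the parameter values are purely illustrative):

```julia
using POMDPs, MCTS

# Construct the solver; any of the fields below can be passed as keyword arguments.
solver = DPWSolver(depth=20,
                   exploration_constant=5.0,
                   n_iterations=1_000,
                   k_action=10.0,
                   alpha_action=0.5)

planner = solve(solver, mdp)  # build an online planner for the problem
a = action(planner, s)        # plan from state `s` and return an action
```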

# `MCTS.DPWSolver` — Type

MCTS solver with DPW

Fields:

depth::Int64
    Maximum rollout horizon and tree depth.
    default: 10

exploration_constant::Float64
    Specifies how much the solver should explore.
    In the UCB equation, Q + c*sqrt(log(t)/N), c is the exploration constant.
    default: 1.0
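
As an illustration of the formula above (not the package's internal code), the score compared across a node's children looks like:

```julia
# UCB score sketch: Q is the current value estimate, c the exploration
# constant, t the parent's visit count, N the child's visit count.
ucb(Q, c, t, N) = Q + c * sqrt(log(t) / N)
```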

n_iterations::Int64
    Number of iterations during each action() call.
    default: 100

max_time::Float64
    Maximum amount of CPU time spent iterating through simulations.
    default: Inf

k_action::Float64
alpha_action::Float64
k_state::Float64
alpha_state::Float64
    These constants control the double progressive widening. A new state
    or action will be added if the number of children is less than or equal to k*N^alpha,
    where N is the number of times the node has been visited.
    defaults: k:10, alpha:0.5
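
A minimal sketch of this criterion (not the internal implementation), where `n_children` is the current number of children and `N` is the node's visit count:

```julia
# Progressive widening test: allow a new child only while the number of
# children is at most k*N^alpha.
may_widen(n_children, N, k, alpha) = n_children <= k * N^alpha
```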

keep_tree::Bool
    If true, store the tree in the planner for reuse at the next timestep (and every time it is used in the future). There is a computational cost for maintaining the state dictionary necessary for this.
    default: false

enable_action_pw::Bool
    If true, enable progressive widening on the action space; if false, consider the entire action space at every state node.
    default: true

check_repeat_state::Bool
check_repeat_action::Bool
    When constructing the tree, check whether a state or action has been seen before (there is a computational cost to maintaining the dictionaries necessary for this).
    default: true

tree_in_info::Bool
    If true, return the tree in the info dict when `action_info` is called. False by default because it can use a lot of memory if histories are being saved.
    default: false
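
A hedged sketch of retrieving the tree, assuming `action_info` from POMDPTools and that the planner stores the tree under the `:tree` key of the info dict:

```julia
using POMDPTools  # provides action_info

planner = solve(DPWSolver(tree_in_info=true), mdp)  # `mdp` defined elsewhere
a, info = action_info(planner, s)
tree = get(info, :tree, nothing)  # the search tree, if it was stored
```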

rng::AbstractRNG
    Random number generator

estimate_value::Any (rollout policy)
    Function, object, or number used to estimate the value at the leaf nodes.
    If this is a function `f`, `f(mdp, s, depth)` will be called to estimate the value.
    If this is an object `o`, `estimate_value(o, mdp, s, depth)` will be called.
    If this is a number, the value will be set to that number.
    default: RolloutEstimator(RandomSolver(rng))
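
For example, a sketch of supplying a custom leaf estimate as a function; `my_heuristic` is a hypothetical user-defined function:

```julia
# Use a simple domain heuristic as the leaf-value estimate;
# `my_heuristic(mdp, s)` is a placeholder for user code.
solver = DPWSolver(estimate_value=(mdp, s, depth) -> my_heuristic(mdp, s))
```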

init_Q::Any
    Function, object, or number used to set the initial Q(s,a) value at a new node.
    If this is a function `f`, `f(mdp, s, a)` will be called to set the value.
    If this is an object `o`, `init_Q(o, mdp, s, a)` will be called.
    If this is a number, Q will always be set to that number.
    default: 0.0

init_N::Any
    Function, object, or number used to set the initial N(s,a) value at a new node.
    If this is a function `f`, `f(mdp, s, a)` will be called to set the value.
    If this is an object `o`, `init_N(o, mdp, s, a)` will be called.
    If this is a number, N will always be set to that number.
    default: 0
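
For instance, a sketch passing functions rather than numbers (the values are illustrative):

```julia
# Give new (s, a) nodes an optimistic initial Q and a prior count of one visit.
solver = DPWSolver(init_Q=(mdp, s, a) -> 1.0,
                   init_N=(mdp, s, a) -> 1)
```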

next_action::Any
    Function or object used to choose the next action to be considered for progressive widening.
    The next action is determined based on the MDP, the state, `s`, and the current `DPWStateNode`, `snode`.
    If this is a function `f`, `f(mdp, s, snode)` will be called to set the value.
    If this is an object `o`, `next_action(o, mdp, s, snode)` will be called.
    default: RandomActionGenerator(rng)
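
A sketch of a custom generator, assuming the POMDPs.jl `actions` function and that the returned action space supports `rand`:

```julia
# Propose the next candidate action for widening by sampling uniformly from
# the state-dependent action space; `snode` is available for smarter choices.
my_next_action(mdp, s, snode) = rand(actions(mdp, s))

solver = DPWSolver(next_action=my_next_action)
```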

default_action::Any
    Function, action, or Policy used to determine the action if the solver fails with exception `ex`.
    If this is a Function `f`, `f(mdp, s, ex)` will be called.
    If this is a Policy `p`, `action(p, s)` will be called.
    If it is an object `a`, `default_action(a, mdp, s, ex)` will be called, and if this method is not implemented, `a` will be returned directly.
    default: `ExceptionRethrow()`
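
For example, a sketch of a fallback function; the action `:stay` is assumed to exist in the problem's action space:

```julia
# Fall back to a fixed, assumed-safe action when planning throws.
solver = DPWSolver(default_action=(mdp, s, ex) -> :stay)
```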
