MCVI


The Monte Carlo Value Iteration (MCVI) offline solver for POMDPs.jl.

Described in

Bai, H., Hsu, D., & Lee, W. S. (2014). Integrated perception and planning in the continuous space: A POMDP approach. The International Journal of Robotics Research, 33(9), 1288-1302.

Installation

using Pkg
Pkg.add("MCVI")

Example

using POMDPs
using POMDPModels
using MCVI
using Random

mutable struct LightDark1DLowerBound
    rng::AbstractRNG
end

mutable struct LightDark1DUpperBound
    rng::AbstractRNG
end

function MCVI.init_lower_action(p::LightDark1D)
    return 0
end

function MCVI.lower_bound(lb::LightDark1DLowerBound, p::LightDark1D, s::LightDark1DState)
    r = @gen(:r)(p, s, MCVI.init_lower_action(p), lb.rng)
    return r * discount(p)
end

function MCVI.upper_bound(ub::LightDark1DUpperBound, p::LightDark1D, s::LightDark1DState)
    steps = abs(s.y)/p.step_size + 1
    return p.correct_r*(discount(p)^steps)
end

prob = LightDark1D()
sim = MCVISimulator(rng=MersenneTwister(1))

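# Positional arguments follow the field order of MCVISolver:
# simulater, root, n_iter, num_particles, obs_branch, num_state,
# num_prune_obs, num_eval_belief, num_obs, lbound, ubound
solver = MCVISolver(sim, nothing, 1, 100, 8, 500, 1000, 5000, 50, LightDark1DLowerBound(sim.rng), LightDark1DUpperBound(sim.rng))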

println("Solving...")
policy = solve(solver, prob)
println("Solved!")

up = updater(policy)
reward = simulate(MCVISimulator(rng=MersenneTwister(1)), prob, policy, up, up.root)
println("Reward: ", reward)

# output
Solving...
Gap closed!
Solved!
Reward: 5.314410000000001
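
The upper bound in the example can be read as an optimistic estimate: the agent moves straight to the origin in abs(y)/step_size steps, collects correct_r on the following step, and the reward is discounted accordingly. The standalone sketch below reproduces that arithmetic in plain Julia. The function name optimistic_upper_bound and the default values (step_size = 1.0, correct_r = 10.0, discount = 0.9) are assumptions chosen for illustration, not values read from LightDark1D.

```julia
# Illustrative only: mirrors the formula used in MCVI.upper_bound in the example.
# All default values here are assumptions, not taken from POMDPModels.
function optimistic_upper_bound(y; step_size=1.0, correct_r=10.0, discount=0.9)
    steps = abs(y) / step_size + 1   # steps to reach the origin, plus one to collect the reward
    return correct_r * discount^steps
end

for y in (0.0, 2.0, 5.0)
    println("y = ", y, "  upper bound = ", optimistic_upper_bound(y))
end
```

The bound shrinks as abs(y) grows, because the reward is assumed to arrive later and is discounted more heavily.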

Documentation

MCVI.MCVISolver (Type)
MCVISolver <: POMDPs.Solver

MCVISolver is an offline POMDP solver based on Monte Carlo Value Iteration, as described in Bai, H., Hsu, D., & Lee, W. S. (2014). Integrated perception and planning in the continuous space: A POMDP approach. The International Journal of Robotics Research, 33(9), 1288-1302.

Fields

  • simulater::POMDPs.Simulator
  • root::Union{BeliefNode, Nothing}
  • n_iter::Int64: Number of iterations
  • num_particles::Int64: Number of belief particles to be used
  • obs_branch::Int64: Branching factor (previous default: 8)
  • num_state::Int64: Number of states to sample from belief (previous default: 500)
  • num_prune_obs::Int64: Number of times to sample observation while pruning alpha edges (previous default: 1000)
  • num_eval_belief::Int64: Number of times to simulate while evaluating belief (previous default: 5000)
  • num_obs::Int64: Number of observations to sample while evaluating belief (previous default: 50)
  • lbound::Any: An object representing the lower bound. The function MCVI.lower_bound(lbound, problem, s) will be called to get the lower bound for the state s - this function needs to be implemented for the solver to work.
  • ubound::Any: An object representing the upper bound. The function MCVI.upper_bound(ubound, problem, s) will be called to get the upper bound for the state s - this function needs to be implemented for the solver to work.

See the docs for an example of bounds implemented for the Light Dark problem.
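
Because the solver needs both bounds, it can be worth checking on a handful of states that the lower bound never exceeds the upper bound before starting a long solve. A minimal sketch in plain Julia follows; bounds_consistent, toy_lower, and toy_upper are hypothetical names introduced here and are not part of MCVI, and the constants are assumptions.

```julia
# Hypothetical helper (not part of MCVI): true when lower(s) <= upper(s)
# for every sampled state.
bounds_consistent(lower, upper, states) = all(lower(s) <= upper(s) for s in states)

# Toy bounds on a 1-D position y; the constants are assumptions for illustration.
toy_lower(y) = -10.0                      # pessimistic constant
toy_upper(y) = 10.0 * 0.9^(abs(y) + 1)    # optimistic, discounted best reward

states = -10.0:0.5:10.0
println(bounds_consistent(toy_lower, toy_upper, states))   # true
println(bounds_consistent(toy_upper, toy_lower, states))   # false: arguments swapped
```

With real bound objects, the two callables would wrap MCVI.lower_bound(lbound, problem, s) and MCVI.upper_bound(ubound, problem, s) as closures over the bound object and the problem.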
