MCVI
The Monte Carlo Value Iteration (MCVI) offline solver for POMDPs.jl.
Described in
Bai, H., Hsu, D., & Lee, W. S. (2014). Integrated perception and planning in the continuous space: A POMDP approach. The International Journal of Robotics Research, 33(9), 1288-1302.
Installation
using Pkg
Pkg.add("MCVI")
Example
using POMDPs
using POMDPModels
using MCVI
using Random
mutable struct LightDark1DLowerBound
    rng::AbstractRNG
end
mutable struct LightDark1DUpperBound
    rng::AbstractRNG
end
function MCVI.init_lower_action(p::LightDark1D)
    return 0
end
function MCVI.lower_bound(lb::LightDark1DLowerBound, p::LightDark1D, s::LightDark1DState)
    r = @gen(:r)(p, s, MCVI.init_lower_action(p), lb.rng)
    return r * discount(p)
end
function MCVI.upper_bound(ub::LightDark1DUpperBound, p::LightDark1D, s::LightDark1DState)
    steps = abs(s.y)/p.step_size + 1
    return p.correct_r*(discount(p)^steps)
end
prob = LightDark1D()
sim = MCVISimulator(rng=MersenneTwister(1))
solver = MCVISolver(sim, nothing, 1, 100, 8, 500, 1000, 5000, 50, LightDark1DLowerBound(sim.rng), LightDark1DUpperBound(sim.rng))
println("Solving...")
policy = solve(solver, prob)
println("Solved!")
up = updater(policy)
reward = simulate(MCVISimulator(rng=MersenneTwister(1)), prob, policy, up, up.root)
println("Reward: ", reward)
# output
Solving...
Gap closed!
Solved!
Reward: 5.314410000000001
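The MCVI.upper_bound method in the example discounts the terminal reward by the number of steps needed to reach y = 0, plus one. As a rough, package-free sketch of that arithmetic, the function below repeats the same formula on plain numbers. The name upper_bound_sketch and the default values for step_size, correct_r, and discount are assumptions chosen for illustration; they are not read from POMDPModels.

```julia
# Stand-alone version of the upper-bound formula used in the example.
# `upper_bound_sketch` and all default values are illustrative assumptions.
function upper_bound_sketch(y; step_size=1.0, correct_r=10.0, discount=0.9)
    # Distance to the origin in units of step_size, plus one.
    steps = abs(y) / step_size + 1
    # Terminal reward discounted over that many steps.
    return correct_r * discount^steps
end

for y in (0.0, 2.0, 5.0)
    println("y = ", y, "  upper bound = ", upper_bound_sketch(y))
end
```

Under these assumed values the bound shrinks as abs(y) grows, because the same reward is discounted over more steps.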
Documentation
MCVI.MCVISolver — Type
MCVISolver <: POMDPs.Solver
The MCVI solver solves POMDPs offline using Monte Carlo Value Iteration. Described in Bai, H., Hsu, D., & Lee, W. S. (2014). Integrated perception and planning in the continuous space: A POMDP approach. The International Journal of Robotics Research, 33(9), 1288-1302.
Fields
simulater::POMDPs.Simulator
root::Union{BeliefNode, Nothing}
n_iter::Int64: Number of iterations
num_particles::Int64: Number of belief particles to be used
obs_branch::Int64: Branching factor (previous default: 8)
num_state::Int64: Number of states to sample from belief (previous default: 500)
num_prune_obs::Int64: Number of times to sample observation while pruning alpha edges (previous default: 1000)
num_eval_belief::Int64: Number of times to simulate while evaluating belief (previous default: 5000)
num_obs::Int64: Number of observations to sample while evaluating belief (previous default: 50)
lbound::Any: An object representing the lower bound. The function MCVI.lower_bound(lbound, problem, s) will be called to get the lower bound for the state s; this function must be implemented for the solver to work.
ubound::Any: An object representing the upper bound. The function MCVI.upper_bound(ubound, problem, s) will be called to get the upper bound for the state s; this function must be implemented for the solver to work.
See the docs for an example of bounds implemented for the Light Dark problem.
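The field order in this list lines up with the positional arguments of the example call MCVISolver(sim, nothing, 1, 100, 8, 500, 1000, 5000, 50, ...). As a reading aid, the sketch below pairs each numeric argument from that call with its field name in a plain named tuple. The name settings is hypothetical and not part of MCVI, and the splatted constructor call in the final comment is an untested assumption.

```julia
# Numeric arguments from the example call, keyed by the documented field names.
# `settings` is a hypothetical helper name, not part of the MCVI package.
settings = (
    n_iter          = 1,     # number of iterations
    num_particles   = 100,   # belief particles
    obs_branch      = 8,     # branching factor
    num_state       = 500,   # states sampled from the belief
    num_prune_obs   = 1000,  # observation samples while pruning alpha edges
    num_eval_belief = 5000,  # simulations while evaluating a belief
    num_obs         = 50,    # observations sampled while evaluating a belief
)

# Print each field name next to its value.
for (name, value) in pairs(settings)
    println(rpad(String(name), 16), " = ", value)
end

# With MCVI loaded and `sim`, `lb`, `ub` defined as in the example, the same
# values could presumably be passed positionally (untested assumption):
#   MCVISolver(sim, nothing, values(settings)..., lb, ub)
```

Keeping the values in one named structure makes it harder to swap two of the seven integer arguments by accident.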