Frequently Asked Questions (FAQ)

How do I save my policies?

We recommend using JLD2 to save the whole policy object:

using JLD2
save("my_policy.jld2", "policy", policy)

Why isn't the solver working?

There could be a number of things going wrong. Remember, POMDPs can be fairly hard to work with, but don't panic. If you have a discrete POMDP or MDP and you're using a solver that requires the explicit transition probabilities (i.e. you've implemented a pdf function), the first thing to try is to make sure that your probability masses sum to unity. We provide some tools in POMDPTesting that can check this for you. If you have a POMDP called pomdp, you can run the checks as follows:

using POMDPTesting
probability_check(pomdp) # checks that both observation and transition functions give probabilities that sum to unity
obs_prob_consistency_check(pomdp) # checks the observation probabilities
trans_prob_consistency_check(pomdp) # checks the transition probabilities

If these throw an error, you may need to fix your transition or observation functions.

Why do I need to put type assertions pomdp::POMDP into the function signature?

Specifying the type in your function signature allows Julia to call the appropriate method when your custom type is passed in. For example, if a POMDPs.jl solver calls states on the POMDP that you passed to it, the correct states method will only be dispatched if you specified that the states method you wrote works with your POMDP type. Because Julia supports multiple dispatch, these type annotations are a way of doing object-oriented-style programming in Julia.
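As a minimal sketch (MyPOMDP below is a hypothetical problem type used only for illustration), the annotation on the argument is what lets a solver's call to states(pomdp) reach your method:

using POMDPs

# Hypothetical problem type with integer states, actions, and observations
struct MyPOMDP <: POMDP{Int, Int, Int} end

# Because of the ::MyPOMDP annotation, a solver calling states(pomdp)
# on a MyPOMDP instance dispatches to this method.
POMDPs.states(pomdp::MyPOMDP) = 1:3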

Why are all the solvers in separate modules?

We did not put all the solvers and support tools into POMDPs.jl, because we wanted POMDPs.jl to be a lightweight interface package. This has a number of advantages. The first is that if a user only wants to use a few solvers from the JuliaPOMDP organization, they do not have to install all the other solvers and their dependencies. The second advantage is that people who are not directly part of the JuliaPOMDP organization can write their own solvers without going into the source code of other solvers. This makes the framework easier to adopt and to extend.

How can I implement terminal actions?

Terminal actions are actions that cause the MDP to terminate without generating a new state. POMDPs.jl handles terminal conditions via the isterminal function on states and does not directly support terminal actions. If your MDP has a terminal action, you need to implement the model functions so that taking it generates a terminal state. In both the generative and explicit cases, you will need some dummy state, say spt, that can be recognized as terminal by the isterminal function. One way to do this is to give spt a state value that is out of bounds (e.g. a vector of NaNs or -1s), so that it does not clash with any conventional termination conditions on the state, and then check for that value in isterminal.

If a terminal action is taken, regardless of the current state, the transition function should return a distribution with a single next state, spt, with probability 1.0. In the generative case, the new state generated should be spt. The reward function, or the r returned by generate_sr, can be set according to the cost of the terminal action.
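As a rough sketch of the explicit case (MyMDP, the :quit action, the -1 sentinel, and the -10.0 cost below are all assumptions for illustration; Deterministic comes from POMDPTools, or POMDPModelTools in older versions):

using POMDPs
using POMDPTools # for Deterministic (POMDPModelTools in older versions)

# Hypothetical MDP with integer states and Symbol actions
struct MyMDP <: MDP{Int, Symbol} end

const spt = -1 # dummy terminal state with an out-of-bounds value

POMDPs.isterminal(m::MyMDP, s::Int) = s == spt

function POMDPs.transition(m::MyMDP, s::Int, a::Symbol)
    if a == :quit # the terminal action
        # regardless of s, the only possible next state is the dummy terminal state
        return Deterministic(spt)
    else
        # normal transition dynamics would go here; this is just a placeholder
        return Deterministic(s + 1)
    end
end

# The cost of the terminal action is encoded in the reward function
POMDPs.reward(m::MyMDP, s::Int, a::Symbol) = a == :quit ? -10.0 : 0.0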

Why are there two versions of reward?

Both reward(m, s, a) and reward(m, s, a, sp) are included because of these two facts:

  1. Some non-native solvers use reward(m, s, a).
  2. Sometimes the reward depends on s and sp.

It is reasonable to implement both as long as the (s, a) version is the expectation of the (s, a, s') version (see below).

How do I implement reward(m, s, a) if the reward depends on the next state?

The solvers that require reward(m, s, a) only work on problems with finite state and action spaces. In this case, you can define reward(m, s, a) in terms of reward(m, s, a, sp) with the following code:

# S and A are the state and action types of your problem, and m is your problem instance.
const rdict = Dict{Tuple{S, A}, Float64}()

for s in states(m)
    for a in actions(m)
        r = 0.0
        td = transition(m, s, a) # transition distribution for s, a
        # accumulate the expected reward over all possible next states
        for sp in support(td)
            r += pdf(td, sp) * reward(m, s, a, sp)
        end
        rdict[(s, a)] = r
    end
end

# Restrict this method to your problem type (MyPOMDP is a stand-in for it)
# so that it is only used for your problem.
POMDPs.reward(m::MyPOMDP, s, a) = rdict[(s, a)]