# Model Transformations

POMDPModelTools contains several tools for transforming problems into other classes so that they can be used by different solvers.

## Linear Algebra Representations

For some algorithms, such as value iteration, it is convenient to use vectors that contain the reward for every state, and matrices that contain the transition probabilities. These can be constructed with the following functions:

`POMDPModelTools.transition_matrices` (Function)

`transition_matrices(p::SparseTabularProblem)`

Accessor function for the transition model of a sparse tabular problem. It returns a list of sparse matrices, one for each action of the problem.

```
transition_matrices(m::Union{MDP,POMDP})
transition_matrices(m; sparse=true)
```

Construct transition matrices for (PO)MDP `m`.

The returned object is an associative object (usually a `Dict`), where the keys are actions. Each value in this object is an `AbstractMatrix` whose rows are indexed by the state index of `s` and whose columns are indexed by the state index of `s'`. Entry `[s, s']` is the probability of transitioning from state `s` to state `s'` when taking that action.
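To make this layout concrete, the sketch below builds a small hypothetical problem by hand in the same shape (a `Dict` from action to transition matrix) instead of calling `transition_matrices`; the action names and probabilities are invented for illustration. It checks that every row sums to 1 (assuming no terminal states, which a library may treat specially) and propagates a state distribution one step with `T[a]' * d`.

```julia
using SparseArrays

# Hypothetical 3-state, 2-action problem in the layout described above:
# a Dict mapping each action to a matrix with T[s, sp] = Pr(sp | s, a).
T = Dict(
    :left  => sparse([1.0 0.0 0.0;
                      1.0 0.0 0.0;
                      0.0 1.0 0.0]),
    :right => sparse([0.0 1.0 0.0;
                      0.0 0.0 1.0;
                      0.0 0.0 1.0]),
)

# Rows index the current state s, so each row of a non-terminal problem sums to 1.
is_row_stochastic(M) = all(x -> isapprox(x, 1.0), sum(M, dims=2))
all_valid = all(is_row_stochastic(M) for M in values(T))

# Propagate a state distribution d one step: d_next(sp) = Σ_s T[s, sp] d(s).
d = [1.0, 0.0, 0.0]
d_next = T[:right]' * d
```

Because rows are indexed by `s`, pushing a distribution forward uses the transpose of the matrix.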

`POMDPModelTools.reward_vectors` (Function)

`reward_vectors(m::Union{MDP, POMDP})`

Construct reward vectors for (PO)MDP `m`.

The returned object is an associative object (usually a `Dict`), where the keys are actions. Each value in this object is an `AbstractVector` whose index is the state index of `s` and whose entry is the reward for taking that action in that state.
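As an illustration of why this layout suits value iteration, here is a minimal sketch over hand-built `Dict`s shaped like the outputs of `transition_matrices` and `reward_vectors`. The two-state problem and its numbers are hypothetical; the loop is the standard Bellman optimality update V(s) = max_a [R_a(s) + γ Σ_sp T_a(s, sp) V(sp)], not code taken from any solver package.

```julia
using SparseArrays

# Hypothetical 2-state, 2-action problem in the layout returned by
# transition_matrices (action => T[s, sp]) and reward_vectors (action => R[s]).
T = Dict(
    :stay => sparse([1.0 0.0; 0.0 1.0]),
    :move => sparse([0.0 1.0; 1.0 0.0]),
)
R = Dict(
    :stay => [0.0, 1.0],
    :move => [0.0, 0.0],
)
γ = 0.9

# Value iteration: V(s) <- max_a [R_a(s) + γ Σ_sp T_a(s, sp) V(sp)].
function value_iteration(T, R, γ; tol=1e-10, maxiter=10_000)
    acts = collect(keys(T))
    V = zeros(length(first(values(R))))
    for _ in 1:maxiter
        # One vectorized backup per action, then an elementwise max over actions.
        Vnew = reduce((x, y) -> max.(x, y), [R[a] .+ γ .* (T[a] * V) for a in acts])
        maximum(abs, Vnew .- V) < tol && return Vnew
        V = Vnew
    end
    return V
end

V = value_iteration(T, R, γ)
```

For this problem the fixed point is `V ≈ [9.0, 10.0]`: staying in state 2 earns 1 per step (1 / (1 - 0.9) = 10), and state 1 is worth moving to state 2 (0.9 × 10 = 9).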

## Sparse Tabular MDPs and POMDPs

The `SparseTabularMDP` and `SparseTabularPOMDP` types represent discrete problems defined using the explicit interface. The transition and observation models are represented using sparse matrices. Solver writers can leverage these data structures to write efficient vectorized code. A problem writer can define a problem using the explicit interface, and it can be converted automatically to a sparse tabular representation by calling the constructors `SparseTabularMDP(::MDP)` or `SparseTabularPOMDP(::POMDP)`. See the following docs to learn more about the matrix representation and how to access the fields of the `SparseTabular` objects:

`POMDPModelTools.SparseTabularMDP` (Type)

`SparseTabularMDP`

An MDP object where states and actions are integers and the transition is represented by a list of sparse matrices. This data structure can be exploited by vectorized algorithms (e.g. see `SparseValueIterationSolver`). The recommended way to access the transition and reward matrices is through the provided accessor functions: `transition_matrix` and `reward_vector`.

**Fields**

- `T::Vector{SparseMatrixCSC{Float64, Int64}}`: The transition model is represented as a vector of sparse matrices (one for each action). `T[a][s, sp]` is the probability of transitioning from `s` to `sp` when taking action `a`.
- `R::Array{Float64, 2}`: The reward is represented as a matrix where the rows are states and the columns are actions: `R[s, a]` is the reward of taking action `a` in state `s`.
- `terminal_states::Set{Int64}`: Stores the terminal states.
- `discount::Float64`: The discount factor.
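This field layout lends itself to exact policy evaluation. The sketch below uses hypothetical arrays shaped like `T`, `R`, and `discount` (it ignores `terminal_states` and does not call the package) and solves the linear system V = R_π + discount · T_π V for a fixed deterministic policy, where row `s` of T_π is row `s` of `T[policy[s]]`.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical data laid out like the SparseTabularMDP fields.
T = [sparse([1.0 0.0; 0.0 1.0]),   # T[1][s, sp]: action 1 keeps the state
     sparse([0.0 1.0; 1.0 0.0])]   # T[2][s, sp]: action 2 swaps the state
R = [0.0 0.0;
     1.0 0.0]                       # R[s, a]
discount = 0.9

# Exact evaluation of a deterministic policy (policy[s] is an action index):
# V = (I - discount * T_pol) \ R_pol.
function evaluate_policy(T, R, discount, policy)
    n = size(R, 1)
    T_pol = zeros(n, n)
    R_pol = zeros(n)
    for s in 1:n
        T_pol[s, :] = T[policy[s]][s, :]   # row s of the chosen action's matrix
        R_pol[s] = R[s, policy[s]]         # reward of the chosen action in s
    end
    return (I - discount * T_pol) \ R_pol
end

V = evaluate_policy(T, R, discount, [2, 1])  # swap from state 1, stay in state 2
```

Here state 2 collects 1 per step forever (value 10) and state 1 moves there first (value 9), so `V ≈ [9.0, 10.0]`.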

**Constructors**

`SparseTabularMDP(mdp::MDP)`: One can provide the matrices to the default constructor, or one can construct a `SparseTabularMDP` from any discrete state MDP defined using the explicit interface.

Note that constructing the transition and reward matrices requires iterating over all the states and can take a while. To learn more about how to define an MDP with the explicit interface, please visit https://juliapomdp.github.io/POMDPs.jl/latest/explicit/ .

`SparseTabularMDP(smdp::SparseTabularMDP; transition, reward, discount)`: This constructor returns a new sparse MDP that is a copy of the original `smdp` except for the fields specified by the keyword arguments.
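The keyword-argument constructor follows a copy-with-overrides pattern. Below is a minimal sketch of that pattern on a hypothetical stand-in type, `ToySparseMDP`; it is not the package's type, whose internals may differ.

```julia
using SparseArrays

# Hypothetical stand-in mirroring the SparseTabularMDP fields.
struct ToySparseMDP
    T::Vector{SparseMatrixCSC{Float64, Int64}}
    R::Matrix{Float64}
    terminal_states::Set{Int64}
    discount::Float64
end

# Copy `m`, replacing only the fields passed as keyword arguments.
ToySparseMDP(m::ToySparseMDP; transition=m.T, reward=m.R, discount=m.discount) =
    ToySparseMDP(transition, reward, m.terminal_states, discount)

m = ToySparseMDP([sparse([1.0 0.0; 0.0 1.0])], zeros(2, 1), Set{Int64}(), 0.95)
m2 = ToySparseMDP(m; discount=0.5)   # same model, different discount
```

In this sketch the unmodified arrays are shared with the original rather than deep-copied; whether `SparseTabularMDP` copies them is not specified here.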

`POMDPModelTools.SparseTabularPOMDP` (Type)

`SparseTabularPOMDP`

A POMDP object where states and actions are integers and the transition and observation distributions are represented by lists of sparse matrices. This data structure can be exploited by vectorized algorithms to gain performance (e.g. see `SparseValueIterationSolver`). The recommended way to access the transition, reward, and observation matrices is through the provided accessor functions: `transition_matrix`, `reward_vector`, and `observation_matrix`.

**Fields**

- `T::Vector{SparseMatrixCSC{Float64, Int64}}`: The transition model is represented as a vector of sparse matrices (one for each action). `T[a][s, sp]` is the probability of transitioning from `s` to `sp` when taking action `a`.
- `R::Array{Float64, 2}`: The reward is represented as a matrix where the rows are states and the columns are actions: `R[s, a]` is the reward of taking action `a` in state `s`.
- `O::Vector{SparseMatrixCSC{Float64, Int64}}`: The observation model is represented as a vector of sparse matrices (one for each action). `O[a][sp, o]` is the probability of observing `o` from state `sp` after having taken action `a`.
- `terminal_states::Set{Int64}`: Stores the terminal states.
- `discount::Float64`: The discount factor.

**Constructors**

`SparseTabularPOMDP(pomdp::POMDP)`: One can provide the matrices to the default constructor, or one can construct a `SparseTabularPOMDP` from any discrete state POMDP defined using the explicit interface.

Note that constructing the transition, reward, and observation matrices requires iterating over all the states and can take a while. To learn more about how to define a POMDP with the explicit interface, please visit https://juliapomdp.github.io/POMDPs.jl/latest/explicit/ .

`SparseTabularPOMDP(spomdp::SparseTabularPOMDP; transition, reward, observation, discount)`: This constructor returns a new sparse POMDP that is a copy of the original `spomdp` except for the fields specified by the keyword arguments.

`POMDPModelTools.transition_matrix` (Function)

`transition_matrix(p::SparseTabularProblem, a)`

Accessor function for the transition model of a sparse tabular problem. It returns a sparse matrix containing the transition probabilities when taking action `a`: `T[s, sp] = Pr(sp | s, a)`.

`POMDPModelTools.reward_vector` (Function)

`reward_vector(p::SparseTabularProblem, a)`

Accessor function for the reward function of a sparse tabular problem. It returns a vector containing the reward `R(s, a)` for every state `s` when taking action `a`; its length equals the number of states.

`POMDPModelTools.observation_matrix` (Function)

`observation_matrix(p::SparseTabularPOMDP, a::Int64)`

Accessor function for the observation model of a sparse tabular POMDP. It returns a sparse matrix containing the observation probabilities when having taken action `a`: `O[sp, o] = Pr(o | sp, a)`.
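Together, the transition and observation matrices supply everything a discrete Bayes filter needs. A minimal sketch, assuming a hypothetical two-state model written directly as arrays (in practice they would come from `transition_matrix(p, a)` and `observation_matrix(p, a)`); it is the textbook update, not necessarily how any particular belief updater implements it:

```julia
using SparseArrays

# Hypothetical 2-state, 1-action, 2-observation model in the sparse tabular layout:
# T[a][s, sp] = Pr(sp | s, a) and O[a][sp, o] = Pr(o | sp, a).
T = [sparse([0.9 0.1;
             0.2 0.8])]
O = [sparse([0.75 0.25;
             0.25 0.75])]

# Discrete Bayes filter: b'(sp) ∝ Pr(o | sp, a) * Σ_s Pr(sp | s, a) b(s).
function update_belief(T, O, b, a, o)
    predicted = T[a]' * b                             # prediction step
    unnormalized = collect(O[a][:, o]) .* predicted   # correction step
    total = sum(unnormalized)
    total > 0 || error("observation $o has zero probability under this belief")
    return unnormalized ./ total
end

b = [0.5, 0.5]
bp = update_belief(T, O, b, 1, 1)
```

Starting from a uniform belief, observing `o = 1` gives `bp ≈ [0.786, 0.214]` (exactly 11/14 and 3/14).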

`POMDPModelTools.reward_matrix` (Function)

`reward_matrix(p::SparseTabularProblem)`

Accessor function for the reward matrix `R[s, a]` of a sparse tabular problem.

`POMDPModelTools.observation_matrices` (Function)

`observation_matrices(p::SparseTabularPOMDP)`

Accessor function for the observation model of a sparse tabular POMDP. It returns a list of sparse matrices, one for each action of the problem.

## Fully Observable POMDP

`POMDPModelTools.FullyObservablePOMDP` (Type)

`FullyObservablePOMDP(mdp)`

Turn the `MDP` `mdp` into a `POMDP` where the observations are the states of the MDP.
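In matrix terms, observing the state means the observation matrix of every action is the identity, so a Bayes correction step collapses any belief onto the observed state. A small illustration with hypothetical numbers; this is a conceptual view, not how `FullyObservablePOMDP` is implemented internally:

```julia
using LinearAlgebra

# With observations equal to states, Pr(o | sp, a) = 1 if o == sp, else 0,
# so every action's observation matrix is the identity.
n_states = 3
O_full = Matrix{Float64}(I, n_states, n_states)

# Any predicted belief, corrected with observation o, collapses onto state o.
predicted = [0.2, 0.5, 0.3]
o = 2
unnormalized = O_full[:, o] .* predicted
posterior = unnormalized ./ sum(unnormalized)
```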

## Generative Belief MDP

Every POMDP is an MDP on the belief space. `GenerativeBeliefMDP` creates a generative model for that MDP.

The reward generated by the `GenerativeBeliefMDP` is the reward for a *single state sampled from the belief*; it is not the expected reward for that belief transition (though, in expectation, they are of course equivalent). Implementing the model with the expected reward requires a custom implementation because belief updaters do not typically deal with reward.
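The difference between the two reward definitions can be seen numerically. The sketch below, with a hypothetical belief and reward vector for one fixed action, averages the rewards of states sampled from the belief and compares the result with the expected reward Σ_s b(s) R(s, a); it does not use `GenerativeBeliefMDP` itself.

```julia
using Random, Statistics

# Hypothetical belief over 3 states and reward R(s, a) for one fixed action a.
b = [0.2, 0.5, 0.3]
r = [1.0, -2.0, 4.0]

# Draw a state index from the categorical distribution b.
function sample_state(rng, b)
    u = rand(rng)
    c = 0.0
    for (s, p) in enumerate(b)
        c += p
        u <= c && return s
    end
    return length(b)  # guard against round-off in the cumulative sum
end

rng = MersenneTwister(1)
# Each entry is the reward of one sampled state, like a single generated reward.
sampled = [r[sample_state(rng, b)] for _ in 1:100_000]

expected = sum(b .* r)    # Σ_s b(s) R(s, a) ≈ 0.4
estimate = mean(sampled)  # close to `expected`, though each sample is noisy
```

Individual sampled rewards here range from -2 to 4, which is why solvers working with the generative model typically need many samples.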

`POMDPModelTools.GenerativeBeliefMDP` (Type)

`GenerativeBeliefMDP(pomdp, updater)`

Create a generative model of the belief MDP corresponding to POMDP `pomdp`, with belief updates performed by `updater`.

### Example

```
using POMDPs            # provides statetype
using POMDPModels
using POMDPModelTools
using BeliefUpdaters
using POMDPPolicies     # provides RandomPolicy
using POMDPSimulators   # provides stepthrough

pomdp = BabyPOMDP()
updater = DiscreteUpdater(pomdp)
belief_mdp = GenerativeBeliefMDP(pomdp, updater)
@show statetype(belief_mdp) # the belief type produced by `updater`
for (a, r, sp) in stepthrough(belief_mdp, RandomPolicy(belief_mdp), "a,r,sp", max_steps=5)
    @show a, r, sp
end
```

## Underlying MDP

`POMDPModelTools.UnderlyingMDP` (Type)

`UnderlyingMDP(m::POMDP)`

Transform the `POMDP` `m` into an `MDP` where the states are fully observed.

`UnderlyingMDP(m::MDP)`

Return `m` itself.

## State Action Reward Model

`POMDPModelTools.StateActionReward` (Type)

`StateActionReward(m::Union{MDP,POMDP})`

Robustly create a reward function that depends only on the state and action.

If `reward(m, s, a)` is implemented, it will be used; otherwise the mean of `reward(m, s, a, sp)` for MDPs, or of `reward(m, s, a, sp, o)` for POMDPs, will be used.
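For an MDP, the fallback amounts to weighting each `reward(m, s, a, sp)` by its transition probability. A sketch of that arithmetic with hypothetical arrays standing in for the transition model and the `(s, a, sp)` reward; it illustrates the averaging, not the package's code:

```julia
using SparseArrays

# Hypothetical 2-state, 2-action MDP whose reward depends on the next state:
# T[a][s, sp] = Pr(sp | s, a) and Rsasp[a][s, sp] = reward(m, s, a, sp).
T = [sparse([0.5 0.5; 0.0 1.0]),
     sparse([1.0 0.0; 0.3 0.7])]
Rsasp = [[0.0 10.0; 0.0 10.0],
         [-1.0 -1.0; 2.0 4.0]]

# R(s, a) = Σ_sp Pr(sp | s, a) * reward(m, s, a, sp), stored as R[s, a].
n = size(T[1], 1)
R = [sum(T[a][s, sp] * Rsasp[a][s, sp] for sp in 1:n) for s in 1:n, a in eachindex(T)]
```

For example, `R[2, 2] = 0.3 * 2.0 + 0.7 * 4.0 = 3.4`.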

**Example**

```
using POMDPs
using POMDPModels
using POMDPModelTools
m = BabyPOMDP()
rm = StateActionReward(m)
rm(true, true)
# output
-15.0
```