Circular Maze

Description

This environment generalizes the Circular Maze POMDP described in Finding Approximate POMDP Solutions Through Belief Compression.^[1] The world consists of n_corridors 1D circular corridors, each with corridor_length states. The robot spawns in a random corridor. It must determine which corridor it's in, navigate to that corridor's goal state, and finally declare that it has finished.

Figure from Finding Approximate POMDP Solutions Through Belief Compression.

Action Space

The left and right moves are stochastic. Their transition probabilities follow a discrete von Mises distribution with unit concentration, centered on the target state.

Num	Action	Description
1	`CMAZE_LEFT`	Move left with von Mises noise.
2	`CMAZE_RIGHT`	Move right with von Mises noise.
3	`CMAZE_SENSE_CORRIDOR`	Observe the current corridor.
4	`CMAZE_DECLARE_GOAL`	Ends the episode. Receive `r_findgoal` if at the goal.
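
To make the von Mises noise on the left and right moves concrete, here is a minimal sketch of how a von Mises density could be discretized over one corridor. It is not the package's implementation: `discrete_von_mises` is a hypothetical helper, the mapping of positions to evenly spaced angles is an assumption, and so is the convention that "left" decreases the position index.

```julia
# Hypothetical helper (not part of CompressedBeliefMDPs): discretize a von Mises
# density over the positions of a single circular corridor.
function discrete_von_mises(corridor_length::Integer, target::Integer; concentration::Real = 1.0)
    # Assumption: positions 1..corridor_length sit at evenly spaced angles.
    angles = [2pi * (x - 1) / corridor_length for x in 1:corridor_length]
    mu = angles[target]
    # Unnormalized von Mises density: exp(concentration * cos(theta - mu)).
    weights = [exp(concentration * cos(theta - mu)) for theta in angles]
    return weights ./ sum(weights)
end

corridor_length = 25
x = 1
# Assumption: moving left targets the previous position, wrapping around the circle.
left_target = mod1(x - 1, corridor_length)
probs = discrete_von_mises(corridor_length, left_target)
```

With unit concentration the mass is spread fairly widely around the target, so a single move can land several positions away from where the robot intended to go.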

State Space

The (ordered) state space is an array of all CircularMazeStates followed by a TerminalState: [CircularMazeState(1, 1), ..., CircularMazeState(n_corridors, corridor_length), TerminalState()].
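
The ordering can be illustrated with a small stand-alone sketch. `DemoState`, `DemoTerminal`, and `demo_index` are hypothetical stand-ins for illustration, not the package's types, and the corridor-major order (all positions of corridor 1 first) is an assumption; the documentation only fixes the first and last entries.

```julia
# Hypothetical stand-ins for CircularMazeState and TerminalState.
struct DemoState
    corridor::Int
    x::Int
end
struct DemoTerminal end

n_corridors, corridor_length = 2, 3

# Assumption: corridor-major order, so the corridor loop is the outer one.
core = [DemoState(c, x) for c in 1:n_corridors for x in 1:corridor_length]
# The terminal state comes last, as in the ordered state space described above.
demo_states = Any[core..., DemoTerminal()]

# Under the corridor-major assumption, a state's index follows from its fields.
demo_index(s::DemoState, corridor_length::Integer) = (s.corridor - 1) * corridor_length + s.x
```

A fixed ordering like this is what lets a belief be stored as a plain probability vector with one entry per state.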

Observation Space

The observation space is the union of the state space and 1:n_corridors. If the robot picks CMAZE_SENSE_CORRIDOR, it observes the index of its current corridor. Otherwise, it observes its current state with von Mises noise.

Rewards

The goal is to navigate to the correct goal state for the given corridor and then declare the goal on arrival. If the robot correctly declares the goal, it receives r_findgoal. It incurs r_timestep_penalty on every timestep before it reaches the goal. By default, r_findgoal is 1 and r_timestep_penalty is 0.
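
The reward rule can be sketched as follows. This is an illustrative reading of the description, not the package's code: `demo_reward` and `discounted_return` are hypothetical names, and the sketch assumes that r_timestep_penalty is the value added to the return on each non-goal step (so a negative value penalizes) and that declaring away from the goal earns only that per-step value.

```julia
# Hypothetical reward sketch; only r_findgoal and r_timestep_penalty come from the docs.
function demo_reward(at_goal::Bool, declares_goal::Bool; r_findgoal = 1.0, r_timestep_penalty = 0.0)
    # A correct declaration earns the goal reward.
    if declares_goal && at_goal
        return r_findgoal
    end
    # Assumption: every other step yields the per-timestep value.
    return r_timestep_penalty
end

# Discounted sum of a reward sequence, with the first reward undiscounted.
discounted_return(rewards, discount) = sum(discount^(t - 1) * r for (t, r) in enumerate(rewards))

# Three non-goal steps followed by a correct declaration, using the default rewards.
episode = [demo_reward(false, false), demo_reward(false, false), demo_reward(false, false), demo_reward(true, true)]
ret = discounted_return(episode, 0.95)
```

With the default r_timestep_penalty of 0, discounting is the only thing that makes a faster correct declaration more valuable than a slower one.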

CircularMaze(n_corridors::Integer, corridor_length::Integer, discount::Float64, r_findgoal::Float64, r_timestep_penalty::Float64)
CircularMaze(n_corridors::Integer, corridor_length::Integer; kwargs...)
CircularMaze()

A POMDP representing a circular maze environment.

Fields

n_corridors::Integer: Number of corridors in the circular maze.
corridor_length::Integer: Length of each corridor.
probabilities::AbstractArray: Probability masses for creating von Mises distributions.
center::Integer: The central position in the maze.
discount::Float64: Discount factor for future rewards.
r_findgoal::Float64: Reward for finding the goal.
r_timestep_penalty::Float64: Penalty for each timestep taken.
states::AbstractArray: Array of all possible states in the maze.
goals::AbstractArray: Array of goal states in the maze.

Example

using CompressedBeliefMDPs

n_corridors = 8
corridor_length = 25
maze = CircularMaze(n_corridors, corridor_length)


CompressedBeliefMDPs.CircularMazeState — Type

CircularMazeState(corridor::Integer, x::Integer)

The CircularMazeState struct represents the state of an agent in a circular maze.