pomdp_py.problems.load_unload package

Load/Unload

Problem originally introduced in "Solving POMDPs by Searching the Space of Finite Policies" (Meuleau et al., UAI 1999).

Quoting the problem description from the original paper:

The load/unload problem with 8 locations: the agent starts in the “Unload” location (U) and receives a reward each time it returns to this place after passing through the “Load” location (L). The problem is partially observable because the agent cannot distinguish the different locations in between Load and Unload, and because it cannot perceive if it is loaded or not (\(|S| = 14\), \(|O| = 3\) and \(|A| = 2\)).

Figure from the paper: the Load/Unload problem.

To run:

python -m pomdp_py -r load_unload

Submodules

pomdp_py.problems.load_unload.load_unload module

The load/unload problem. An agent is placed on a one-dimensional grid world and is tasked with loading itself up on the right side of the world and unloading on the left. The agent can observe whether or not it is in the load or unload block, but it cannot tell its exact location or whether it is loaded. Therefore the agent must maintain a belief about its location and load status.

States are defined by the location of the agent and whether or not it is loaded.

Actions: “move-left”, “move-right”

Rewards:

  • +100 for moving into the unload block while loaded

  • -1 otherwise
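
As a concrete illustration of this state and action structure, the following is a minimal sketch using the LUState and LUAction classes documented below. It assumes the constructor arguments are exposed as .x and .loaded attributes and that the grid has 8 locations as in the paper; these are assumptions for illustration, not guarantees about the module's internals.

    from pomdp_py.problems.load_unload.load_unload import LUState, LUAction

    N_LOCATIONS = 8                            # assumed grid length from the paper

    unload_end = LUState(0, False)             # the "Unload" location, not loaded
    load_end = LUState(N_LOCATIONS - 1, True)  # the "Load" location, loaded

    LEFT = LUAction("move-left")               # the two documented actions
    RIGHT = LUAction("move-right")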

class pomdp_py.problems.load_unload.load_unload.LUState(x, loaded)[source]

Bases: State

class pomdp_py.problems.load_unload.load_unload.LUAction(name)[source]

Bases: Action

class pomdp_py.problems.load_unload.load_unload.LUObservation(obs)[source]

Bases: Observation

class pomdp_py.problems.load_unload.load_unload.LUObservationModel[source]

Bases: ObservationModel

This problem is small enough for the observation probabilities to be specified directly.

probability(self, observation, next_state, action)[source]

Returns the probability of \(\Pr(o|s',a)\).

Parameters:
  • observation (Observation) – the observation \(o\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(o|s',a)\)

Return type:

float

sample(self, next_state, action)[source]

Returns observation randomly sampled according to the distribution of this observation model.

Parameters:
  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the observation \(o\)

Return type:

Observation

argmax(next_state, action, normalized=False, **kwargs)[source]

Returns the most likely observation
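
A usage sketch of the observation model's interface, built only from the constructors documented above; the concrete observation object returned by sample is internal to the module, so nothing is asserted about its value here.

    obs_model = LUObservationModel()
    action = LUAction("move-left")
    next_state = LUState(0, False)       # illustrative: at the unload end, not loaded

    # Sample an observation for (s', a), then query Pr(o | s', a).
    o = obs_model.sample(next_state, action)
    p = obs_model.probability(o, next_state, action)
    assert 0.0 <= p <= 1.0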

class pomdp_py.problems.load_unload.load_unload.LUTransitionModel[source]

Bases: TransitionModel

This problem is small enough for the transition probabilities to be specified directly.

probability(self, next_state, state, action)[source]

Returns the probability of \(\Pr(s'|s,a)\).

Parameters:
  • state (State) – the state \(s\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(s'|s,a)\)

Return type:

float

sample(self, state, action)[source]

Returns next state randomly sampled according to the distribution of this transition model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

Returns:

the next state \(s'\)

Return type:

State

argmax(state, action, normalized=False, **kwargs)[source]

Returns the most likely next state
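
For reference, the dynamics of this problem are deterministic one-step moves along the line. The following is a hedged re-implementation sketch of those dynamics (not the module's own code); the grid length of 8 and the .x, .loaded, and .name attributes are assumptions.

    def sketch_transition(state, action, n_locations=8):
        # Move one cell left or right, clamped at the grid ends.
        dx = 1 if action.name == "move-right" else -1
        x = min(max(state.x + dx, 0), n_locations - 1)
        # The agent picks up a load at the right (Load) end and drops it
        # at the left (Unload) end.
        loaded = state.loaded
        if x == n_locations - 1:
            loaded = True
        elif x == 0:
            loaded = False
        return LUState(x, loaded)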

class pomdp_py.problems.load_unload.load_unload.LURewardModel[source]

Bases: RewardModel

probability(self, reward, state, action, next_state)[source]

Returns the probability of \(\Pr(r|s,a,s')\).

Parameters:
  • reward (float) – the reward \(r\)

  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the probability \(\Pr(r|s,a,s')\)

Return type:

float

sample(self, state, action, next_state)[source]

Returns reward randomly sampled according to the distribution of this reward model. This method is required, i.e., assumed to be implemented for a reward model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the reward \(r\)

Return type:

float

argmax(state, action, next_state, normalized=False, **kwargs)[source]

Returns the most likely reward
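
A usage sketch of the reward model, tied to the reward rule stated at the top of this module; the state values and the expected outputs in the comments are illustrative assumptions.

    reward_model = LURewardModel()
    s = LUState(1, True)                      # loaded, one step from the unload end
    sp = LUState(0, False)                    # arrived at the unload end
    r = reward_model.sample(s, LUAction("move-left"), sp)
    # Per the rule above, this transition is expected to yield +100;
    # any other transition is expected to yield -1.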

class pomdp_py.problems.load_unload.load_unload.LUPolicyModel[source]

Bases: RandomRollout

This is a trivial policy model, included only to keep consistent with the framework.

probability(self, action, state)[source]

Returns the probability of \(\pi(a|s)\).

Parameters:
  • action (Action) – the action \(a\)

  • state (State) – the state \(s\)

Returns:

the probability \(\pi(a|s)\)

Return type:

float

sample(self, state)[source]

Returns action randomly sampled according to the distribution of this policy model.

Parameters:

state (State) – the state \(s\)

Returns:

the action \(a\)

Return type:

Action

argmax(state, normalized=False, **kwargs)[source]

Returns the most likely action

get_all_actions(self, *args)[source]

Returns a set of all possible actions, if feasible.
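
A usage sketch of the policy model; the state used here is illustrative, and since this model extends RandomRollout the sampled action is expected to be drawn uniformly from the two move actions.

    policy = LUPolicyModel()
    state = LUState(3, True)
    a = policy.sample(state)                  # expected: "move-left" or "move-right"
    all_actions = policy.get_all_actions()    # the two documented actions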

class pomdp_py.problems.load_unload.load_unload.LoadUnloadProblem(init_state, init_belief)[source]

Bases: POMDP

pomdp_py.problems.load_unload.load_unload.generate_random_state()[source]
pomdp_py.problems.load_unload.load_unload.generate_init_belief(num_particles)[source]
pomdp_py.problems.load_unload.load_unload.test_planner(load_unload_problem, planner, nsteps=3, discount=0.95)[source]
pomdp_py.problems.load_unload.load_unload.main()[source]
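
An end-to-end sketch that mirrors what main() is expected to do: build the problem from a random true state and a particle belief, then plan online. The POMCP hyperparameter values below are illustrative assumptions, not the module's defaults.

    import pomdp_py
    from pomdp_py.problems.load_unload.load_unload import (
        LoadUnloadProblem, generate_random_state, generate_init_belief, test_planner)

    init_state = generate_random_state()
    init_belief = generate_init_belief(num_particles=100)
    problem = LoadUnloadProblem(init_state, init_belief)

    # POMCP is pomdp_py's particle-based online planner.
    planner = pomdp_py.POMCP(max_depth=20, discount_factor=0.95,
                             num_sims=500, exploration_const=100,
                             rollout_policy=problem.agent.policy_model)
    test_planner(problem, planner, nsteps=10)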

Module contents