problems.load_unload package

Submodules

problems.load_unload.load_unload module

The load unload problem. An agent is placed on a one-dimensional grid world and is tasked with loading itself at the right end of the world and unloading at the left end. The agent can observe whether or not it is in the load or unload block, but cannot tell its exact location or whether it is loaded. Therefore, the agent must maintain a belief about its location and load status.

States are defined by the location of the agent and whether or not it is loaded.

Actions: “move-left”, “move-right”

Rewards:

  • +100 for moving into the unload block while loaded

  • -1 otherwise
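For orientation, here is a minimal sketch of constructing states and actions with the classes documented below. The coordinates and world length are illustrative assumptions; the actual grid size is set inside the module.

    from problems.load_unload.load_unload import LUState, LUAction

    # A state carries an x location and a loaded flag.
    at_unload = LUState(0, False)   # assumed leftmost cell (unload block)
    at_load = LUState(9, True)      # assumed rightmost cell (load block)

    # The two actions, named exactly as documented above.
    left = LUAction("move-left")
    right = LUAction("move-right")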

class problems.load_unload.load_unload.LUState(x, loaded)[source]

Bases: State

class problems.load_unload.load_unload.LUAction(name)[source]

Bases: Action

class problems.load_unload.load_unload.LUObservation(obs)[source]

Bases: Observation

class problems.load_unload.load_unload.LUObservationModel[source]

Bases: ObservationModel

This problem is small enough that the observation probabilities can be specified directly.

probability(observation, next_state, action)[source]

Returns the probability of \(\Pr(o|s',a)\).

Parameters:
  • observation (Observation) – the observation \(o\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(o|s',a)\)

Return type:

float

sample(next_state, action)[source]

Returns an observation randomly sampled according to the distribution of this observation model.

Parameters:
  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the observation \(o\)

Return type:

Observation

argmax(next_state, action, normalized=False, **kwargs)[source]

Returns the most likely observation
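A hedged usage sketch of the observation model interface documented above. The state coordinates are illustrative; the sketch only round-trips whatever sample returns, so no assumption is made about the concrete observation values.

    from problems.load_unload.load_unload import (
        LUState, LUAction, LUObservationModel)

    o_model = LUObservationModel()
    next_state = LUState(0, True)       # illustrative next state s'
    action = LUAction("move-left")

    obs = o_model.sample(next_state, action)           # o ~ Pr(o|s',a)
    p = o_model.probability(obs, next_state, action)   # Pr(o|s',a)
    assert 0.0 <= p <= 1.0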

class problems.load_unload.load_unload.LUTransitionModel[source]

Bases: TransitionModel

This problem is small enough that the transition probabilities can be specified directly.

probability(next_state, state, action)[source]

Returns the probability of \(\Pr(s'|s,a)\).

Parameters:
  • state (State) – the state \(s\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(s'|s,a)\)

Return type:

float

sample(state, action)[source]

Returns the next state, randomly sampled according to the distribution of this transition model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

Returns:

the next state \(s'\)

Return type:

State

argmax(state, action, normalized=False, **kwargs)[source]

Returns the most likely next state
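The transition model can be exercised the same way; a sketch under the same illustrative assumptions about coordinates:

    from problems.load_unload.load_unload import (
        LUState, LUAction, LUTransitionModel)

    t_model = LUTransitionModel()
    state = LUState(3, False)            # illustrative interior cell
    action = LUAction("move-right")

    next_state = t_model.sample(state, action)           # s' ~ Pr(s'|s,a)
    p = t_model.probability(next_state, state, action)   # Pr(s'|s,a)
    likely = t_model.argmax(state, action)               # most likely s'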

class problems.load_unload.load_unload.LURewardModel[source]

Bases: RewardModel

probability(reward, state, action, next_state)[source]

Returns the probability of \(\Pr(r|s,a,s')\).

Parameters:
  • reward (float) – the reward \(r\)

  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the probability \(\Pr(r|s,a,s')\)

Return type:

float

sample(state, action, next_state)[source]

Returns a reward randomly sampled according to the distribution of this reward model. This method is required, i.e. it is assumed to be implemented for any reward model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the reward \(r\)

Return type:

float

argmax(state, action, next_state, normalized=False, **kwargs)[source]

Returns the most likely reward
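A sketch tying the reward model to the reward structure listed at the top of this page. That the unload block sits at x = 0, and how the loaded flag behaves across the unloading step, are assumptions; the sketch mainly illustrates the call signature.

    from problems.load_unload.load_unload import (
        LUState, LUAction, LURewardModel)

    r_model = LURewardModel()
    state = LUState(1, True)         # loaded, next to the assumed unload block
    action = LUAction("move-left")
    next_state = LUState(0, True)    # moved into the unload block while loaded

    # Per the reward description above, this transition should yield +100;
    # any other transition yields -1.
    r = r_model.sample(state, action, next_state)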

class problems.load_unload.load_unload.LUPolicyModel[source]

Bases: RandomRollout

This is a deliberately simple policy model, included to stay consistent with the framework.

probability(action, state)[source]

Returns the probability of \(\pi(a|s)\).

Parameters:
  • action (Action) – the action \(a\)

  • state (State) – the state \(s\)

Returns:

the probability \(\pi(a|s)\)

Return type:

float

sample(state)[source]

Returns an action randomly sampled according to the distribution of this policy model.

Parameters:

state (State) – the state \(s\)

Returns:

the action \(a\)

Return type:

Action

argmax(state, normalized=False, **kwargs)[source]

Returns the most likely action

get_all_actions(*args)[source]

Returns a set of all possible actions, if feasible.
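For completeness, a sketch of drawing rollout actions from the policy model; with two actions and a uniform random rollout policy, each action probability should come out to 0.5:

    from problems.load_unload.load_unload import LUState, LUPolicyModel

    pi = LUPolicyModel()
    state = LUState(3, False)        # illustrative state
    a = pi.sample(state)             # a ~ pi(a|s), uniform over the two moves
    p = pi.probability(a, state)     # expected 0.5 for either action
    actions = pi.get_all_actions()   # {move-left, move-right}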

class problems.load_unload.load_unload.LoadUnloadProblem(init_state, init_belief)[source]

Bases: POMDP

problems.load_unload.load_unload.generate_random_state()[source]
problems.load_unload.load_unload.generate_init_belief(num_particles)[source]
problems.load_unload.load_unload.test_planner(load_unload_problem, planner, nsteps=3, discount=0.95)[source]
problems.load_unload.load_unload.main()[source]
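Putting the pieces together, a sketch of building the problem and running a planner through test_planner. POMCP and its keyword arguments come from pomdp_py (it plans over particle beliefs, matching generate_init_belief); the particle count, planner settings, and the problem.agent attribute follow pomdp_py conventions and are assumptions of this sketch.

    import pomdp_py
    from problems.load_unload.load_unload import (
        LoadUnloadProblem, generate_random_state, generate_init_belief,
        test_planner)

    init_state = generate_random_state()
    init_belief = generate_init_belief(num_particles=100)
    problem = LoadUnloadProblem(init_state, init_belief)

    # POMCP maintains and updates the particle belief during planning.
    planner = pomdp_py.POMCP(max_depth=20, discount_factor=0.95,
                             num_sims=500, exploration_const=110,
                             rollout_policy=problem.agent.policy_model)
    test_planner(problem, planner, nsteps=10)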

Module contents