problems.load_unload.load_unload module
The load/unload problem. An agent is placed on a one-dimensional grid world
and is tasked with loading itself up on the right side of the world and
unloading on the left. The agent can observe whether or not it is in the load
or unload block, but cannot tell its exact location or whether it is loaded or
unloaded. Therefore the agent must maintain a belief about its location and
load status.
States are defined by the location of the agent and whether or not it is loaded.
Actions: “move-left”, “move-right”
Rewards:
+100 for moving into the unload block while loaded
-1 otherwise
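As a rough sketch of the structure described above (an illustration only: the
world length of 11 and the tuple layout are assumptions; the actual classes are
documented below):

from collections import namedtuple

# A state is the agent's position plus its load flag; the agent never
# observes this pair directly, only which kind of block it stands in.
SketchState = namedtuple("SketchState", ["x", "loaded"])

WORLD_SIZE = 11                        # assumed length of the 1D grid
ACTIONS = ("move-left", "move-right")

# The state space is small: every position, loaded or not.
ALL_STATES = [SketchState(x, loaded)
              for x in range(WORLD_SIZE)
              for loaded in (False, True)]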
-
class problems.load_unload.load_unload.LUState(x, loaded)[source]
Bases: State
-
class problems.load_unload.load_unload.LUAction(name)[source]
Bases: Action
-
class problems.load_unload.load_unload.LUObservation(obs)[source]
Bases: Observation
-
class problems.load_unload.load_unload.LUObservationModel[source]
Bases: ObservationModel
This problem is small enough that the observation probabilities can be given
directly (a sketch follows this class's methods).
-
probability(self, observation, next_state, action)[source]
Returns the probability of \(\Pr(o|s',a)\).
- Parameters:
observation (Observation) – the observation \(o\)
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
- Returns:
the probability \(\Pr(o|s',a)\)
- Return type:
float
-
sample(self, next_state, action)[source]
Returns observation randomly sampled according to the
distribution of this observation model.
- Parameters:
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
- Returns:
the observation \(o\)
- Return type:
Observation
-
argmax(next_state, action, normalized=False, **kwargs)[source]
Returns the most likely observation
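Because the agent only senses which kind of block it occupies, the observation
function is deterministic. A minimal sketch under that assumption (the labels
"unload", "load", and "middle" and the world size are illustrative, not taken
from the module):

def sketch_observation(next_x, world_size=11):
    # The agent perceives only the block type, never its exact
    # coordinate or load status.
    if next_x == 0:
        return "unload"
    if next_x == world_size - 1:
        return "load"
    return "middle"

def sketch_observation_probability(observation, next_x, world_size=11):
    # Deterministic sensing: Pr(o|s',a) is 1 for the observation produced
    # at the next state and 0 for every other observation.
    return 1.0 if observation == sketch_observation(next_x, world_size) else 0.0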
-
class problems.load_unload.load_unload.LUTransitionModel[source]
Bases: TransitionModel
This problem is small enough that the transition probabilities can be given
directly (a sketch follows this class's methods).
-
probability(self, next_state, state, action)[source]
Returns the probability of \(\Pr(s'|s,a)\).
- Parameters:
state (State) – the state \(s\)
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
- Returns:
the probability \(\Pr(s'|s,a)\)
- Return type:
float
-
sample(self, state, action)[source]
Returns next state randomly sampled according to the
distribution of this transition model.
- Parameters:
state (State) – the state \(s\)
action (Action) – the action \(a\)
- Returns:
the next state \(s'\)
- Return type:
State
-
argmax(state, action, normalized=False, **kwargs)[source]
Returns the most likely next state
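The dynamics can likewise be sketched as deterministic movement along the
line, with the load flag set at the rightmost block and cleared at the leftmost
block (an illustration under assumed conventions, not the module's
implementation):

def sketch_transition(x, loaded, action, world_size=11):
    # Move one cell left or right, clipping at the world boundaries.
    if action == "move-left":
        next_x = max(x - 1, 0)
    else:  # "move-right"
        next_x = min(x + 1, world_size - 1)
    # Load at the rightmost block, unload at the leftmost block;
    # otherwise the load flag is unchanged.
    if next_x == world_size - 1:
        next_loaded = True
    elif next_x == 0:
        next_loaded = False
    else:
        next_loaded = loaded
    return next_x, next_loaded

With deterministic dynamics, probability() returns 1.0 for the unique successor
state and 0.0 for every other state, and sample() simply returns that
successor.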
-
class problems.load_unload.load_unload.LURewardModel[source]
Bases: RewardModel
-
probability(self, reward, state, action, next_state)[source]
Returns the probability of \(\Pr(r|s,a,s')\).
- Parameters:
reward (float) – the reward \(r\)
state (State) – the state \(s\)
action (Action) – the action \(a\)
next_state (State) – the next state \(s'\)
- Returns:
the probability \(\Pr(r|s,a,s')\)
- Return type:
float
-
sample(self, state, action, next_state)[source]
Returns reward randomly sampled according to the
distribution of this reward model. This is required,
i.e. assumed to be implemented for a reward model.
- Parameters:
state (State) – the state \(s\)
action (Action) – the action \(a\)
next_state (State) – the next state \(s'\)
- Returns:
the reward \(r\)
- Return type:
float
-
argmax(state, action, next_state, normalized=False, **kwargs)[source]
Returns the most likely reward
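The reward rule from the module description is also deterministic, so sample()
can return a single value and probability() concentrates all mass on it. A
sketch (illustrative only; "while loaded" is read as loaded in the state before
the move):

def sketch_reward(state_x, loaded, action, next_x):
    # +100 when the agent moves into the unload block (x == 0) while loaded.
    if next_x == 0 and loaded:
        return 100
    # Every other transition costs -1.
    return -1

def sketch_reward_probability(reward, state_x, loaded, action, next_x):
    # Pr(r|s,a,s') is 1 for the deterministic reward and 0 otherwise.
    return 1.0 if reward == sketch_reward(state_x, loaded, action, next_x) else 0.0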
-
class problems.load_unload.load_unload.LUPolicyModel[source]
Bases: RandomRollout
This is a trivial random-rollout policy model, included only to stay
consistent with the framework (a sketch of its behavior follows the methods
below).
-
probability(self, action, state)[source]
Returns the probability of \(\pi(a|s)\).
- Parameters:
action (Action) – the action \(a\)
state (State) – the state \(s\)
- Returns:
the probability \(\pi(a|s)\)
- Return type:
float
-
sample(self, state)[source]
Returns action randomly sampled according to the
distribution of this policy model.
- Parameters:
state (State) – the state \(s\)
- Returns:
the action \(a\)
- Return type:
Action
-
argmax(state, normalized=False, **kwargs)[source]
Returns the most likely action
-
get_all_actions(self, *args)[source]
Returns a set of all possible actions, if feasible.
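A sketch of what a random-rollout policy does (illustrative; the actual
behavior is inherited from RandomRollout):

import random

ACTIONS = ("move-left", "move-right")

def sketch_policy_sample(state):
    # Uniform random choice between the two actions; the state is ignored.
    return random.choice(ACTIONS)

def sketch_policy_probability(action, state):
    # pi(a|s) is uniform over the action set.
    return 1.0 / len(ACTIONS) if action in ACTIONS else 0.0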
-
class problems.load_unload.load_unload.LoadUnloadProblem(init_state, init_belief)[source]
Bases: POMDP
-
problems.load_unload.load_unload.generate_random_state()[source]
-
problems.load_unload.load_unload.generate_init_belief(num_particles)[source]
-
problems.load_unload.load_unload.test_planner(load_unload_problem, planner, nsteps=3, discount=0.95)[source]
-
problems.load_unload.load_unload.main()[source]
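Putting the pieces together, a typical run of this module might look like the
sketch below. It is based only on the signatures documented above; the choice
of pomdp_py.POMCP, its parameter values, the particle count, and the
problem.agent attribute (normally provided by the POMDP base class) are
assumptions, and main() may configure things differently.

import pomdp_py

from problems.load_unload.load_unload import (
    LoadUnloadProblem,
    generate_init_belief,
    generate_random_state,
    test_planner,
)

# Build the problem from a random true state and a particle-based belief.
init_state = generate_random_state()
init_belief = generate_init_belief(num_particles=100)
problem = LoadUnloadProblem(init_state, init_belief)

# A particle-based planner such as POMCP pairs naturally with a particle belief.
planner = pomdp_py.POMCP(max_depth=20,
                         discount_factor=0.95,
                         num_sims=500,
                         exploration_const=100,
                         rollout_policy=problem.agent.policy_model)

test_planner(problem, planner, nsteps=10)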