pomdp_py.problems.tag.models package

Submodules

pomdp_py.problems.tag.models.observation_model module

class pomdp_py.problems.tag.models.observation_model.TagObservationModel[source]

Bases: ObservationModel

In this observation model, the robot deterministically observes the target location when it is in the same grid cell as the target. Otherwise, the robot does not observe anything.

probability(self, observation, next_state, action)[source]

Returns the probability of \(\Pr(o|s',a)\).

Parameters:
  • observation (Observation) – the observation \(o\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(o|s',a)\)

Return type:

float

sample(self, next_state, action)[source]

There is no stochasticity in the observation model.

argmax(self, next_state, action)[source]

Returns the most likely observation
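
Example (illustrative sketch, not the actual Tag implementation): the snippet below shows how a deterministic observation model of this kind can be written against pomdp_py's ObservationModel interface. The state attributes (robot_position, target_position) are hypothetical stand-ins for the real Tag domain classes:

    import pomdp_py

    class DeterministicTargetSensor(pomdp_py.ObservationModel):
        """Illustrative only: the target is observed exactly when the robot
        and the target occupy the same cell; otherwise nothing is observed."""

        def _expected_observation(self, next_state):
            # Hypothetical attributes on a hypothetical state class.
            if next_state.robot_position == next_state.target_position:
                return next_state.target_position
            return None  # "observe nothing"

        def probability(self, observation, next_state, action):
            # Deterministic model: all probability mass on the expected observation.
            return 1.0 if observation == self._expected_observation(next_state) else 0.0

        def sample(self, next_state, action):
            # No stochasticity, so sampling returns the deterministic observation.
            return self._expected_observation(next_state)

        def argmax(self, next_state, action):
            # The most likely observation is the only possible one.
            return self._expected_observation(next_state)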

pomdp_py.problems.tag.models.policy_model module

class pomdp_py.problems.tag.models.policy_model.TagPolicyModel(grid_map=None)[source]

Bases: RolloutPolicy

sample(self, state)[source]

Returns an action randomly sampled according to the distribution of this policy model.

Parameters:

state (State) – the state \(s\)

Returns:

the action \(a\)

Return type:

Action

get_all_actions(self, *args)[source]

Returns a set of all possible actions, if feasible.

rollout(self, State state, tuple history=None)[source]
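
Example (illustrative sketch): a uniform-random rollout policy with the same shape as TagPolicyModel, built on pomdp_py's RolloutPolicy interface. The action set passed to the constructor is a hypothetical placeholder for the Tag action space:

    import random
    import pomdp_py

    class UniformRolloutPolicy(pomdp_py.RolloutPolicy):
        """Illustrative only: samples actions uniformly at random."""

        def __init__(self, all_actions):
            self._all_actions = list(all_actions)

        def sample(self, state):
            # Uniform distribution over a fixed, state-independent action set.
            return random.choice(self._all_actions)

        def get_all_actions(self, *args):
            # Feasible here because the action set is finite and fixed.
            return set(self._all_actions)

        def rollout(self, state, history=None):
            # Used by sampling-based planners (e.g. POUCT/POMCP) during rollouts.
            return self.sample(state)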

pomdp_py.problems.tag.models.reward_model module

class pomdp_py.problems.tag.models.reward_model.TagRewardModel(small=1, big=10)[source]

Bases: RewardModel

probability(self, reward, state, action, next_state)[source]

Returns the probability of \(\Pr(r|s,a,s')\).

Parameters:
  • reward (float) – the reward \(r\)

  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the probability \(\Pr(r|s,a,s')\)

Return type:

float

sample(self, state, action, next_state)[source]

Returns a reward randomly sampled according to the distribution of this reward model. This method is required, i.e., it is assumed to be implemented for every reward model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the reward \(r\)

Return type:

float
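
Example (illustrative sketch): a deterministic reward model with the same interface. Mapping small to a per-step cost and big to a tag reward/penalty follows the standard Tag formulation and is an assumption here; the action name "tag" and the position attributes are hypothetical:

    import pomdp_py

    class SimpleTagRewardModel(pomdp_py.RewardModel):
        """Illustrative only: deterministic reward with a per-step cost
        (`small`) and a tag reward/penalty (`big`)."""

        def __init__(self, small=1, big=10):
            self.small = small
            self.big = big

        def _reward_func(self, state, action, next_state):
            if action.name == "tag":  # hypothetical action name
                if state.robot_position == state.target_position:
                    return self.big   # tagging succeeds only when co-located
                return -self.big      # penalty for tagging at the wrong cell
            return -self.small        # every movement step incurs a small cost

        def sample(self, state, action, next_state):
            # The reward is deterministic, so sampling returns the single value.
            return self._reward_func(state, action, next_state)

        def probability(self, reward, state, action, next_state):
            # All probability mass on the deterministic reward value.
            return 1.0 if reward == self._reward_func(state, action, next_state) else 0.0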

pomdp_py.problems.tag.models.transition_model module

The Tag problem. Implemented according to the paper Anytime Point-Based Approximations for Large POMDPs.

Transition model: the robot moves deterministically. The target's movement depends on the robot: with probability 0.8 the target moves away from the robot, and with probability 0.2 it stays in place. The target never moves closer to the robot.
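
Example (illustrative sketch): one way to sample the target's move under this rule. Here candidate_moves is assumed to be a list of neighboring cells the target could move to (a stand-in for the grid map's adjacency), and positions are (x, y) tuples; none of these names come from the actual implementation:

    import random

    def sample_target_position(robot_pos, target_pos, candidate_moves):
        """Illustrative only: with probability 0.2 the target stays put;
        with probability 0.8 it moves to a neighboring cell that increases
        its distance to the robot (it never moves closer)."""
        def dist(a, b):
            # Manhattan distance on the grid, assuming (x, y) tuples.
            return abs(a[0] - b[0]) + abs(a[1] - b[1])

        if random.random() < 0.2:
            return target_pos  # target stays in place

        current = dist(robot_pos, target_pos)
        away = [p for p in candidate_moves if dist(robot_pos, p) > current]
        if not away:
            return target_pos  # cornered: no move increases the distance
        return random.choice(away)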

class pomdp_py.problems.tag.models.transition_model.TagTransitionModel(grid_map, target_motion_policy)[source]

Bases: TransitionModel

classmethod if_move_by(grid_map, position, action)[source]

probability(self, next_state, state, action)[source]

Returns the probability of \(\Pr(s'|s,a)\).

Parameters:
  • state (State) – the state \(s\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(s'|s,a)\)

Return type:

float

sample(self, state, action)[source]

Returns the next state randomly sampled according to the distribution of this transition model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

Returns:

the next state \(s'\)

Return type:

State

argmax(self, state, action)[source]

Returns the most likely next state

Module contents