pomdp_problems.rocksample package

pomdp_problems.rocksample.rocksample_problem module

RockSample(n,k) problem

Origin: Heuristic Search Value Iteration for POMDPs (UAI 2004)

Description:

State space:

Position {(1,1), (1,2), …, (n,n)} \(\times\) RockType_1 \(\times\) RockType_2 \(\times\) … \(\times\) RockType_k \(\times\) TerminalState, where RockType_i = {Good, Bad}

(The positions of the rocks are known to the robot but are not represented explicitly in the state space; Check_i checks rock i at its known location.)
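To make the size of this product space concrete, the sketch below enumerates the non-terminal states of a small instance. It is a hypothetical illustration, not the module's code: states are plain `(position, rocktypes)` tuples with 1-indexed positions as in the description above, and the terminal state is left out. Because each rock is either Good or Bad, there are \(n^2 \cdot 2^k\) non-terminal states.

```python
from itertools import product

GOOD, BAD = "good", "bad"


def enumerate_states(n, k):
    """Hypothetical helper: list every non-terminal (position, rocktypes) pair."""
    positions = [(x, y) for x in range(1, n + 1) for y in range(1, n + 1)]
    # One Good/Bad label per rock; product() yields all 2**k combinations.
    rock_configs = list(product((GOOD, BAD), repeat=k))
    return [(pos, rocks) for pos in positions for rocks in rock_configs]


states = enumerate_states(n=3, k=2)
print(len(states))  # 36 = 3*3 * 2**2
```

The count grows exponentially in k, which is why RockSample is a common benchmark for approximate POMDP solvers.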

Action space:

North, South, East, West, Sample, Check_1, …, Check_k

  • North, South, East, West: move the agent deterministically.

  • Sample: samples the rock at the agent’s current location.

  • Check_i: receives a noisy observation about RockType_i. The noise is determined by the sensor efficiency eta (\(\eta\)): eta=1 gives a perfect sensor; eta=0 gives a uniform (uninformative) reading.
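The noisy Check_i sensor can be sketched as follows. This assumes the RockSample formulation from the HSVI paper cited above, in which the efficiency decays with the Euclidean distance d between the agent and rock i as \(\eta = 2^{-d/d_0}\) (with \(d_0\) a half-efficiency distance), and the sensor reports the true rock type with probability \((1+\eta)/2\). The function name is hypothetical, and the default \(d_0 = 20\) is borrowed from the `half_efficiency_dist` default of `RSObservationModel`.

```python
import math


def check_accuracy(agent_pos, rock_pos, half_efficiency_dist=20.0):
    """Hypothetical helper: probability that Check_i reports the true rock type.

    Assumes efficiency eta = 2 ** (-d / half_efficiency_dist), where d is the
    Euclidean distance to the rock, and P(correct) = (1 + eta) / 2.
    """
    d = math.dist(agent_pos, rock_pos)
    eta = 2.0 ** (-d / half_efficiency_dist)
    return (1.0 + eta) / 2.0


print(check_accuracy((2, 3), (2, 3)))     # 1.0: on top of the rock, perfect sensor
print(check_accuracy((0, 0), (0, 20)))    # 0.75: one half-efficiency distance away
print(check_accuracy((0, 0), (0, 1000)))  # very far away: close to 0.5 (uniform)
```

Under this reading, eta=1 and eta=0 correspond exactly to the "perfect sensor" and "uniform" extremes described above.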

Observation: observes the property of rock i when taking Check_i.

Reward: +10 for sampling a good rock, -10 for sampling a bad rock, and +10 for moving into the exit area. Other actions have no cost or reward.
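A minimal sketch of these reward rules as a plain function. The string action names and the boolean/flag arguments are hypothetical simplifications standing in for the state checks (`RSRewardModel` itself works on `state`, `action`, and `next_state`). The reward for sampling an empty cell is not specified above; the sketch assumes 0.

```python
GOOD, BAD = "good", "bad"
MOVES = ("north", "south", "east", "west")


def reward(action, rock_here=None, entered_exit=False):
    """Hypothetical reward function following the rules stated above.

    action:       "north", "south", "east", "west", "sample", or "check"
    rock_here:    GOOD/BAD type of the rock at the agent's position, or None
    entered_exit: True if a move action took the agent into the exit area
    """
    if action == "sample" and rock_here is not None:
        return 10 if rock_here == GOOD else -10
    if action in MOVES and entered_exit:
        return 10
    # Everything else, including Check_i and (by assumption) sampling an
    # empty cell, has no cost or reward.
    return 0


print(reward("sample", rock_here=GOOD))   # 10
print(reward("sample", rock_here=BAD))    # -10
print(reward("east", entered_exit=True))  # 10
```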

Initial belief: every rock has equal probability of being Good or Bad.
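Because each rock is independently equally likely to be Good or Bad, each of the \(2^k\) rock configurations starts with probability \(2^{-k}\). The sketch below builds that exact distribution for a small k; it is an illustration only, with a hypothetical helper name, and it leaves out the robot position (assumed known).

```python
from fractions import Fraction
from itertools import product

GOOD, BAD = "good", "bad"


def uniform_rock_belief(k):
    """Hypothetical helper: exact uniform belief over all 2**k rock configurations."""
    configs = list(product((GOOD, BAD), repeat=k))
    p = Fraction(1, len(configs))  # 1 / 2**k for every configuration
    return {config: p for config in configs}


belief = uniform_rock_belief(3)
print(len(belief), belief[(GOOD, BAD, GOOD)])  # 8 1/8
```

Exact enumeration is only practical for small k; particle beliefs scale better.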

pomdp_problems.rocksample.rocksample_problem.euclidean_dist(p1, p2)
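`euclidean_dist` has no docstring here; judging by its name, it returns the straight-line distance between two grid positions. A stand-in with the same name, written with the standard library and not taken from the module:

```python
import math


def euclidean_dist(p1, p2):
    """Sketch: straight-line distance between two (x, y) positions."""
    return math.dist(p1, p2)


print(euclidean_dist((0, 0), (3, 4)))  # 5.0
```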
class pomdp_problems.rocksample.rocksample_problem.RockType

Bases: object

GOOD = 'good'
BAD = 'bad'
static invert(rocktype)
static random(p=0.5)
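The two static helpers are undocumented. The sketch below shows one plausible reading: `invert` swaps Good and Bad, and `random(p)` draws a rock type at random. Treating `p` as the probability of Good is an assumption; with the default p=0.5 either reading gives a fair coin.

```python
import random as _random


class RockType:
    """Sketch mirroring the documented GOOD/BAD constants (not the module's code)."""

    GOOD = "good"
    BAD = "bad"

    @staticmethod
    def invert(rocktype):
        # Swap Good <-> Bad.
        return RockType.BAD if rocktype == RockType.GOOD else RockType.GOOD

    @staticmethod
    def random(p=0.5):
        # Assumption: p is the probability of drawing GOOD.
        return RockType.GOOD if _random.random() < p else RockType.BAD


print(RockType.invert(RockType.GOOD))  # bad
print(RockType.random(p=1.0))          # good (random() < 1.0 always holds)
```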
class pomdp_problems.rocksample.rocksample_problem.State(position, rocktypes, terminal=False)

Bases: pomdp_py.framework.basics.State

class pomdp_problems.rocksample.rocksample_problem.Action(name)

Bases: pomdp_py.framework.basics.Action

class pomdp_problems.rocksample.rocksample_problem.MoveAction(motion)

Bases: pomdp_problems.rocksample.rocksample_problem.Action

EAST = (1, 0)
WEST = (-1, 0)
NORTH = (0, 1)
SOUTH = (0, -1)
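Each `MoveAction` constant is a (dx, dy) offset added to the agent's position. The minimal sketch below applies one on an n-by-n grid. Clamping at the borders, 0-indexed coordinates, and treating a step past the east edge as entering the exit area are assumptions for illustration, not documented behaviour; the helper name is hypothetical.

```python
EAST, WEST, NORTH, SOUTH = (1, 0), (-1, 0), (0, 1), (0, -1)


def apply_motion(position, motion, n):
    """Hypothetical helper: return (next_position, exited) on an n-by-n grid.

    Assumes 0-indexed coordinates, that stepping past the east edge (x == n)
    enters the exit area, and that other off-grid moves are clamped.
    """
    x, y = position[0] + motion[0], position[1] + motion[1]
    if x == n:
        return (x, y), True
    x = max(0, min(x, n - 1))
    y = max(0, min(y, n - 1))
    return (x, y), False


print(apply_motion((2, 0), NORTH, n=3))  # ((2, 1), False)
print(apply_motion((2, 0), SOUTH, n=3))  # ((2, 0), False): clamped at the border
print(apply_motion((2, 1), EAST, n=3))   # ((3, 1), True): entered the exit area
```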
class pomdp_problems.rocksample.rocksample_problem.SampleAction

Bases: pomdp_problems.rocksample.rocksample_problem.Action

class pomdp_problems.rocksample.rocksample_problem.CheckAction(rock_id)

Bases: pomdp_problems.rocksample.rocksample_problem.Action

class pomdp_problems.rocksample.rocksample_problem.Observation(quality)

Bases: pomdp_py.framework.basics.Observation

class pomdp_problems.rocksample.rocksample_problem.RSTransitionModel(n, rock_locs, in_exit_area)

Bases: pomdp_py.framework.basics.TransitionModel

The model is deterministic: a given state and action always produce the same next state.
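For an exactly deterministic model, \(\Pr(s'|s,a)\) is 1 for the single successor \(f(s,a)\) and 0 otherwise. The sketch below illustrates this with a hypothetical successor function for the Sample action only. Its two modelling choices, that a sampled rock becomes Bad and that terminal states are absorbing, are assumptions not stated in this documentation and are marked in the code.

```python
GOOD, BAD = "good", "bad"


def next_state_after_sample(state, rock_locs):
    """Hypothetical successor function f(s, a) for the Sample action.

    state:     (position, rocktypes, terminal) tuple
    rock_locs: dict mapping a rock's (x, y) position to its index
    """
    position, rocktypes, terminal = state
    if terminal:
        return state  # assumption: terminal states are absorbing
    if position in rock_locs:
        types = list(rocktypes)
        types[rock_locs[position]] = BAD  # assumption: a sampled rock turns Bad
        rocktypes = tuple(types)
    return (position, rocktypes, terminal)


def transition_probability(next_state, state, rock_locs):
    # Deterministic model: all probability mass sits on the single successor.
    return 1.0 if next_state == next_state_after_sample(state, rock_locs) else 0.0


rock_locs = {(1, 1): 0, (2, 0): 1}
s = ((1, 1), (GOOD, GOOD), False)
print(next_state_after_sample(s, rock_locs))  # ((1, 1), ('bad', 'good'), False)
```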

probability(self, next_state, state, action, **kwargs)

Returns the probability of \(\Pr(s'|s,a)\).

Parameters
  • state (State) – the state \(s\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns

the probability \(\Pr(s'|s,a)\)

Return type

float

sample(self, state, action, **kwargs)

Returns next state randomly sampled according to the distribution of this transition model.

Parameters
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

Returns

the next state \(s'\)

Return type

State

argmax(state, action)

Returns the most likely next state.

class pomdp_problems.rocksample.rocksample_problem.RSObservationModel(rock_locs, half_efficiency_dist=20)

Bases: pomdp_py.framework.basics.ObservationModel

probability(self, observation, next_state, action, **kwargs)

Returns the probability of \(\Pr(o|s',a)\).

Parameters
  • observation (Observation) – the observation \(o\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns

the probability \(\Pr(o|s',a)\)

Return type

float

sample(self, next_state, action, **kwargs)

Returns observation randomly sampled according to the distribution of this observation model.

Parameters
  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns

the observation \(o\)

Return type

Observation

argmax(next_state, action)

Returns the most likely observation.
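A sketch of how an observation could be sampled given the next state \(s'\) and a Check_i action. It assumes the RockSample sensor model from the HSVI paper: the true type of rock i is reported with probability \((1+\eta)/2\), where \(\eta = 2^{-d/d_0}\), d is the agent-to-rock distance, and \(d_0\) plays the role of `half_efficiency_dist`. The function name, its arguments, and returning `None` (no observation) for non-check actions are assumptions for illustration.

```python
import math
import random

GOOD, BAD = "good", "bad"


def sample_observation(agent_pos, rocktypes, rock_positions, rock_id=None,
                       half_efficiency_dist=20.0, rng=random):
    """Hypothetical sampler for o ~ Pr(o | s', a).

    rock_id=None stands for any non-check action, which (by assumption)
    yields no observation. Otherwise the sensor reports the true type of
    rock `rock_id` with probability (1 + eta) / 2, eta = 2 ** (-d / d0).
    """
    if rock_id is None:
        return None
    d = math.dist(agent_pos, rock_positions[rock_id])
    eta = 2.0 ** (-d / half_efficiency_dist)
    true_type = rocktypes[rock_id]
    if rng.random() < (1.0 + eta) / 2.0:
        return true_type
    return BAD if true_type == GOOD else GOOD  # flipped (wrong) reading


rng = random.Random(0)
rock_positions = [(0, 0), (4, 4)]
print(sample_observation((0, 0), (GOOD, BAD), rock_positions, rock_id=0, rng=rng))  # good (d = 0, perfect sensor)
```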

class pomdp_problems.rocksample.rocksample_problem.RSRewardModel(rock_locs, in_exit_area)

Bases: pomdp_py.framework.basics.RewardModel

sample(self, state, action, next_state, **kwargs)

Returns reward randomly sampled according to the distribution of this reward model.

Parameters
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns

the reward \(r\)

Return type

float

argmax(self, state, action, next_state, **kwargs)

Returns the most likely reward.

probability(self, reward, state, action, next_state, **kwargs)

Returns the probability of \(\Pr(r|s,a,s')\).

Parameters
  • reward (float) – the reward \(r\)

  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns

the probability \(\Pr(r|s,a,s')\)

Return type

float

class pomdp_problems.rocksample.rocksample_problem.RSPolicyModel(k)

Bases: pomdp_py.algorithms.po_uct.RolloutPolicy

A simple policy model based on the problem description.

sample(self, state, **kwargs)

Returns action randomly sampled according to the distribution of this policy model.

Parameters

state (State) – the state \(s\)

Returns

the action \(a\)

Return type

Action

probability(self, action, state, **kwargs)

Returns the probability of \(\pi(a|s)\).

Parameters
  • action (Action) – the action \(a\)

  • state (State) – the state \(s\)

Returns

the probability \(\pi(a|s)\)

Return type

float

argmax(state, normalized=False, **kwargs)

Returns the most likely action.

get_all_actions(self, *args, **kwargs)

Returns a set of all possible actions, if feasible.

rollout(self, State state, tuple history=None)
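A common way to realize a rollout policy is to pick uniformly among all available actions: the four moves, Sample, and Check_1, …, Check_k, i.e. 5 + k actions. The sketch below assumes that behaviour and uses hypothetical string action names; the actual `RSPolicyModel` may restrict or weight actions differently.

```python
import random


def all_actions(k):
    """Hypothetical action list: four moves, Sample, and one Check per rock."""
    moves = ["north", "south", "east", "west"]
    return moves + ["sample"] + [f"check_{i}" for i in range(1, k + 1)]


def rollout_action(k, rng=random):
    # Uniform random rollout: every action is equally likely.
    return rng.choice(all_actions(k))


print(len(all_actions(3)))  # 8
```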
class pomdp_problems.rocksample.rocksample_problem.RockSampleProblem(n, k, init_state, rock_locs, init_belief)

Bases: pomdp_py.framework.basics.POMDP

static random_free_location(n, not_free_locs)

Returns a random (x, y) location in the n-by-n grid that is not in not_free_locs.

in_exit_area(pos)
static generate_instance(n, k)

Returns init_state and rock locations for an instance of RockSample(n,k).
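A sketch of how an instance could be generated: rocks are placed on distinct free cells, each drawn like `random_free_location`, and each rock type is drawn independently. The functions below are stand-ins, not the module's code; the extra `rng` parameter, 0-indexed coordinates, choosing the start position as a random cell that no rock may occupy, and the `(init_position, rocktypes, rock_locs)` return format are all assumptions for illustration.

```python
import random

GOOD, BAD = "good", "bad"


def random_free_location(n, not_free_locs, rng=random):
    """Sketch: uniform random (x, y) in the n-by-n grid, avoiding not_free_locs."""
    free = [(x, y) for x in range(n) for y in range(n) if (x, y) not in not_free_locs]
    return rng.choice(free)


def generate_instance(n, k, rng=random):
    """Hypothetical instance generator for RockSample(n, k).

    Returns (init_position, rocktypes, rock_locs), where rock_locs maps each
    rock's (x, y) position to its index. Requires k < n * n.
    """
    init_position = random_free_location(n, set(), rng)
    taken = {init_position}  # assumption: no rock on the start cell
    rock_locs = {}
    for rock_id in range(k):
        loc = random_free_location(n, taken, rng)
        rock_locs[loc] = rock_id
        taken.add(loc)
    rocktypes = tuple(rng.choice((GOOD, BAD)) for _ in range(k))
    return init_position, rocktypes, rock_locs


pos, types, rocks = generate_instance(n=5, k=4, rng=random.Random(42))
print(len(rocks), len(types), pos in rocks)  # 4 4 False
```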

print_state()
pomdp_problems.rocksample.rocksample_problem.test_planner(rocksample, planner, nsteps=3, discount=0.95)
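`test_planner` is listed without a description. Its `nsteps` and `discount` parameters suggest that it runs the planner for a fixed number of steps and accumulates rewards discounted by `discount`. Assuming that, the bookkeeping for the discounted return \(\sum_t \gamma^t r_t\) could look like this (the function name is hypothetical and the reward list is made-up input, not output of the module):

```python
def discounted_return(rewards, discount=0.95):
    """Sum of discount**t * r_t over a reward sequence (illustrative sketch)."""
    total, gamma = 0.0, 1.0
    for r in rewards:
        total += gamma * r
        gamma *= discount
    return total


print(discounted_return([10, 0, 10], discount=0.95))  # about 19.025
```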
pomdp_problems.rocksample.rocksample_problem.init_particles_belief(num_particles, init_state, belief='uniform')
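A minimal sketch of a particle belief matching the uniform initial belief described at the top of this module. Each particle keeps the known initial position and draws every rock type independently as Good or Bad with probability 1/2. Interpreting `belief='uniform'` this way, taking `init_position` and `k` instead of `init_state`, and representing particles as plain tuples rather than `State` objects are all assumptions; the helper name is hypothetical.

```python
import random

GOOD, BAD = "good", "bad"


def init_particles(num_particles, init_position, k, rng=random):
    """Hypothetical uniform particle belief over rock types.

    Each particle is a (position, rocktypes, terminal) tuple; the position is
    known, and each rock type is drawn independently and uniformly.
    """
    particles = []
    for _ in range(num_particles):
        rocktypes = tuple(rng.choice((GOOD, BAD)) for _ in range(k))
        particles.append((init_position, rocktypes, False))
    return particles


particles = init_particles(1000, (0, 2), k=3, rng=random.Random(1))
good_frac = sum(p[1][0] == GOOD for p in particles) / len(particles)
print(len(particles), round(good_frac, 2))  # fraction of rock 0 Good is roughly 0.5
```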