pomdp_py.problems.rocksample.cythonize package

Submodules

pomdp_py.problems.rocksample.cythonize.rocksample_problem module

RockSample(n,k) problem

Origin: Heuristic Search Value Iteration for POMDPs (UAI 2004)

Description:

State space:

Position {(1,1),(1,2),…,(n,n)} \(\times\) RockType_1 \(\times\) RockType_2 \(\times\) … \(\times\) RockType_k, where RockType_i = {Good, Bad}, plus a TerminalState.

(The positions of the rocks are known to the robot but are not represented explicitly in the state space; Check_i checks rock i at its known location.)

Action space:

North, South, East, West, Sample, Check_1, …, Check_k. The first four move the agent deterministically. Sample samples the rock at the agent's current location. Check_i receives a noisy observation about RockType_i, with the noise level determined by \(\eta\): eta=1 gives a perfect sensor and eta=0 gives a uniformly random reading (a sketch of this sensor model follows the description below).

Observation: the property of rock i is observed (noisily) when taking Check_i.

Reward: +10 for sampling a good rock, -10 for sampling a bad rock, and +10 for moving into the exit area. All other actions incur no cost or reward.

Initial belief: every rock has equal probability of being Good or Bad.
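The description above leaves the exact Check_i sensor model implicit. The following is a minimal sketch of a distance-dependent \(\eta\) sensor in the spirit of the original HSVI RockSample formulation; the half-efficiency distance d0, the decay form \(\eta = 2^{-d/d_0}\), and the correct-reading probability \(0.5 + 0.5\eta\) are assumptions for illustration, not constants read from this module.

# Hypothetical sketch of the Check_i sensor noise described above.
# The decay eta = 2^(-d/d0) and the half-efficiency distance d0 follow the
# original RockSample paper; they are assumptions here, not this module's code.
import math
import random

def check_correct_probability(agent_pos, rock_pos, d0=20.0):
    """Probability that Check_i reports the true RockType_i."""
    d = math.dist(agent_pos, rock_pos)        # Euclidean distance to rock i
    eta = 2.0 ** (-d / d0)                    # eta=1 when d=0, decays toward 0
    return 0.5 + 0.5 * eta                    # eta=1 -> perfect, eta=0 -> uniform

def sample_check_observation(true_type, agent_pos, rock_pos, d0=20.0):
    """Sample a noisy 'good'/'bad' reading of rock i's true type."""
    if random.random() < check_correct_probability(agent_pos, rock_pos, d0):
        return true_type
    return "bad" if true_type == "good" else "good"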

class pomdp_py.problems.rocksample.cythonize.rocksample_problem.CheckAction

Bases: RSAction

rock_id
class pomdp_py.problems.rocksample.cythonize.rocksample_problem.MoveAction

Bases: RSAction

EAST = (1, 0)
NORTH = (0, 1)
SOUTH = (0, -1)
WEST = (-1, 0)
motion
class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RSAction

Bases: Action

class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RSObservation

Bases: Observation

quality
class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RSObservationModel

Bases: ObservationModel

argmax(next_state, action)

Returns the most likely observation

probability(self, observation, next_state, action)

Returns the probability of \(\Pr(o|s',a)\).

Parameters:
  • observation (Observation) – the observation \(o\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(o|s',a)\)

Return type:

float

sample(self, next_state, action)

Returns observation randomly sampled according to the distribution of this observation model.

Parameters:
  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the observation \(o\)

Return type:

Observation
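For concreteness, here is a minimal, hypothetical ObservationModel subclass in the pomdp_py style that pairs the probability and sample methods listed above. The fixed eta, the None quality for non-Check actions, the RSObservation(quality) constructor, and indexing rocktypes by rock_id are assumptions for illustration, not this module's actual implementation.

# Hypothetical sketch, not RSObservationModel itself: a fixed-eta observation
# model wiring together the probability/sample pair documented above.
import random
import pomdp_py
from pomdp_py.problems.rocksample.cythonize.rocksample_problem import (
    CheckAction, RSObservation, RockType)

class NoisyCheckObservationModel(pomdp_py.ObservationModel):
    def __init__(self, eta=0.9):
        self.eta = eta                              # eta=1 -> perfect sensor

    def probability(self, observation, next_state, action):
        # Pr(o | s', a): only Check actions carry information about rocks.
        if not isinstance(action, CheckAction):
            return 1.0 if observation.quality is None else 0.0
        true_type = next_state.rocktypes[action.rock_id]   # assumed indexable by rock id
        p_correct = 0.5 + 0.5 * self.eta
        return p_correct if observation.quality == true_type else 1.0 - p_correct

    def sample(self, next_state, action):
        if not isinstance(action, CheckAction):
            return RSObservation(None)              # assumed "no reading" observation
        true_type = next_state.rocktypes[action.rock_id]
        if random.random() < 0.5 + 0.5 * self.eta:
            return RSObservation(true_type)
        return RSObservation(RockType.invert(true_type))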

class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RSPolicyModel

Bases: RolloutPolicy

Simple policy model according to problem description.

argmax(state, normalized=False, **kwargs)

Returns the most likely action

get_all_actions(self, *args)

Returns a set of all possible actions, if feasible.

probability(self, action, state)

Returns the probability of \(\pi(a|s)\).

Parameters:
  • action (Action) – the action \(a\)

  • state (State) – the state \(s\)

Returns:

the probability \(\pi(a|s)\)

Return type:

float

rollout(self, State state, tuple history=None)
sample(self, state)

Returns action randomly sampled according to the distribution of this policy model.

Parameters:

state (State) – the state \(s\)

Returns:

the action \(a\)

Return type:

Action
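As an illustration of the RolloutPolicy interface listed above, the following is a hypothetical uniform policy sketch in the pomdp_py style; the concrete action set and the absence of any feasibility filtering are assumptions here, not the behavior of RSPolicyModel.

# Hypothetical uniform rollout policy sketch (not RSPolicyModel itself).
import random
import pomdp_py

class UniformRolloutPolicy(pomdp_py.RolloutPolicy):
    def __init__(self, actions):
        self._actions = list(actions)      # e.g. moves + Sample + Check_1..Check_k

    def get_all_actions(self, state=None, history=None):
        return self._actions

    def sample(self, state):
        return random.choice(self.get_all_actions(state=state))

    def probability(self, action, state):
        # pi(a|s): uniform over the available actions
        return 1.0 / len(self.get_all_actions(state=state))

    def rollout(self, state, history=None):
        # Called by POUCT/POMCP to choose actions during simulated rollouts
        return self.sample(state)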

class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RSRewardModel

Bases: RewardModel

argmax(self, state, action, next_state)

Returns the most likely reward. This is optional.

probability(self, reward, state, action, next_state)

Returns the probability of \(\Pr(r|s,a,s')\).

Parameters:
  • reward (float) – the reward \(r\)

  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the probability \(\Pr(r|s,a,s')\)

Return type:

float

sample(self, state, action, next_state)

Returns reward randomly sampled according to the distribution of this reward model. This is required, i.e. assumed to be implemented for a reward model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the reward \(r\)

Return type:

float
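The reward in the problem description is deterministic, so sample and argmax coincide. Below is a hypothetical sketch of that reward function; the rock_locs mapping from grid cell to rock id and the way the exit-area check is passed in are assumptions for illustration, not the signature of RSRewardModel.

# Hypothetical deterministic reward sketch mirroring the description above:
# +10 for sampling a good rock, -10 for a bad one, +10 for reaching the exit.
from pomdp_py.problems.rocksample.cythonize.rocksample_problem import (
    MoveAction, SampleAction, RockType)

def rocksample_reward(state, action, next_state, rock_locs, in_exit_area):
    if isinstance(action, SampleAction):
        rock_id = rock_locs.get(state.position)     # rock at the agent's cell, if any
        if rock_id is None:
            return 0.0                              # sampling empty ground
        return 10.0 if state.rocktypes[rock_id] == RockType.GOOD else -10.0
    if isinstance(action, MoveAction) and in_exit_area(next_state.position):
        return 10.0
    return 0.0                                      # all other actions are free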

class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RSState

Bases: State

position
rocktypes
terminal
class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RSTransitionModel

Bases: TransitionModel

The model is deterministic

argmax(state, action)

Returns the most likely next state

probability(self, next_state, state, action)

Returns the probability of \(\Pr(s'|s,a)\).

Parameters:
  • state (State) – the state \(s\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(s'|s,a)\)

Return type:

float

sample(self, state, action)

Returns next state randomly sampled according to the distribution of this transition model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

Returns:

the next state \(s'\)

Return type:

State
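Since the transition model is deterministic, sample and argmax return the same next state. The sketch below follows the standard RockSample dynamics (the agent moves by the MoveAction vector, the exit area lies east of the grid, and a sampled rock turns bad); these specifics, the 0-indexed coordinates, and the RSState(position, rocktypes, terminal) argument order are assumptions, not a copy of this module.

# Hypothetical deterministic transition sketch in the spirit of RockSample.
from pomdp_py.problems.rocksample.cythonize.rocksample_problem import (
    MoveAction, SampleAction, RSState, RockType)

def next_rocksample_state(state, action, n, rock_locs):
    if state.terminal:
        return state                                         # terminal state is absorbing
    position, rocktypes, terminal = state.position, list(state.rocktypes), False
    if isinstance(action, MoveAction):
        x = position[0] + action.motion[0]
        y = position[1] + action.motion[1]
        if x >= n:                                           # exit area east of the grid
            terminal = True
        else:                                                # clamp to stay on the grid
            position = (max(x, 0), min(max(y, 0), n - 1))
    elif isinstance(action, SampleAction) and state.position in rock_locs:
        rocktypes[rock_locs[state.position]] = RockType.BAD  # sampled rock becomes bad
    return RSState(position, tuple(rocktypes), terminal)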

class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RockSampleProblem

Bases: POMDP

static generate_instance(n, k)

Returns init_state and rock locations for an instance of RockSample(n,k)

in_exit_area(pos)
print_state()
static random_free_location(n, not_free_locs)

Returns a random (x,y) location in the n×n grid that is free (i.e., not in not_free_locs).

class pomdp_py.problems.rocksample.cythonize.rocksample_problem.RockType

Bases: object

BAD = 'bad'
GOOD = 'good'
static invert(rocktype)
static random(p=0.5)
class pomdp_py.problems.rocksample.cythonize.rocksample_problem.SampleAction

Bases: RSAction

pomdp_py.problems.rocksample.cythonize.rocksample_problem.euclidean_dist(p1, p2)
pomdp_py.problems.rocksample.cythonize.rocksample_problem.init_particles_belief(k, num_particles, init_state, belief='uniform')
pomdp_py.problems.rocksample.cythonize.rocksample_problem.main()
pomdp_py.problems.rocksample.cythonize.rocksample_problem.test_planner(rocksample, planner, nsteps=3, discount=0.95)
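Putting the pieces together, the following is a hypothetical usage sketch in the spirit of this module's main(): generate an instance, build a particle belief, and run an online planner against it. The RockSampleProblem constructor arguments shown here are assumed to mirror the pure-Python rocksample_problem module and may differ in this cythonized version.

# Hypothetical end-to-end usage sketch (constructor arguments are assumptions).
import pomdp_py
from pomdp_py.problems.rocksample.cythonize import rocksample_problem as rs

n, k = 5, 5
init_state, rock_locs = rs.RockSampleProblem.generate_instance(n, k)
init_belief = rs.init_particles_belief(k, 200, init_state, belief="uniform")
problem = rs.RockSampleProblem(n, k, init_state, rock_locs, init_belief)

# POMCP: particle-based online planner shipped with pomdp_py.
planner = pomdp_py.POMCP(max_depth=20, discount_factor=0.95,
                         num_sims=1000, exploration_const=5,
                         rollout_policy=problem.agent.policy_model)
rs.test_planner(problem, planner, nsteps=10, discount=0.95)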

pomdp_py.problems.rocksample.cythonize.run_rocksample module

Module contents