pomdp_py.framework package¶
Important classes in pomdp_py.framework.basics:

A Distribution is a probability function that maps from a variable value to a real value, that is, \(\Pr(X=x)\).

A GenerativeDistribution is a Distribution that additionally exhibits generative properties.

A TransitionModel models the distribution \(T(s,a,s')=\Pr(s' \mid s,a)\).

An ObservationModel models the distribution \(O(s',a,o)=\Pr(o \mid s',a)\).

A BlackboxModel is the generative distribution \(G(s,a)\) which can generate samples where each is a tuple \((s',o,r)\).

A RewardModel models the distribution \(\Pr(r \mid s,a,s')\) where \(r\in\mathbb{R}\), with argmax denoted as \(R(s,a,s')\).

A PolicyModel models the distribution \(\pi(a \mid s)\).

An Agent operates in an environment by taking actions, receiving observations, and updating its belief.

An Environment maintains the true state of the world.

A POMDP instance = agent (Agent) + env (Environment).

The State class.

The Action class.

The Observation class.

An Option is a temporally abstracted action defined by a tuple \((I, \pi, B)\), where \(I\) is an initiation set, \(\pi\) is a policy, and \(B\) is a termination condition.
pomdp_py.framework.basics module¶

class pomdp_py.framework.basics.Action¶
Bases: object
The Action class. Action must be hashable.
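
Any hashable object can serve. For the running example used in the sketches on this page, a minimal hypothetical Move action might look like this (all names are illustrative, not part of the library):

    import pomdp_py

    class Move(pomdp_py.Action):
        """Hypothetical action: move along a 1D corridor ("forward" or "stay")."""
        def __init__(self, name):
            self.name = name
        def __hash__(self):
            return hash(self.name)
        def __eq__(self, other):
            return isinstance(other, Move) and self.name == other.name
        def __repr__(self):
            return "Move(%s)" % self.name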

class pomdp_py.framework.basics.Agent¶
Bases: object
An Agent operates in an environment by taking actions, receiving observations, and updating its belief. Taking actions is the job of a planner (Planner), and the belief update is handled by the belief representation or the planner. But the Agent supplies the TransitionModel, ObservationModel, RewardModel, or BlackboxModel to the planner or the belief update algorithm.
__init__(self, init_belief, policy_model, transition_model=None, observation_model=None, reward_model=None, blackbox_model=None)

add_attr(self, attr_name, attr_value)¶
Adds an attribute to the agent. Sometimes useful for planners to store agent-specific information.

all_actions¶
Only available if the policy model implements get_all_actions.

all_observations¶
Only available if the observation model implements get_all_observations.

all_states¶
Only available if the transition model implements get_all_states.

belief¶
Current belief distribution.

history¶
Current history.

init_belief¶
Initial belief distribution.

set_belief(self, belief, prior=False)¶

update(self, real_action, real_observation, **kwargs)¶
Updates the history and performs belief update.

update_history(self, real_action, real_observation)¶
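
A minimal construction sketch, assuming the hypothetical Move action above; the Cell state class and TrivialPolicyModel below are likewise made up for illustration, and pomdp_py.Histogram is the library's histogram distribution:

    import random
    import pomdp_py

    class Cell(pomdp_py.State):
        """Hypothetical state: the robot's cell index in a 5-cell corridor."""
        def __init__(self, index):
            self.index = index
        def __hash__(self):
            return hash(self.index)
        def __eq__(self, other):
            return isinstance(other, Cell) and self.index == other.index
        def __repr__(self):
            return "Cell(%d)" % self.index

    class TrivialPolicyModel(pomdp_py.PolicyModel):
        """Hypothetical policy: uniform over two Move actions."""
        def sample(self, state, **kwargs):
            return random.choice(list(self.get_all_actions()))
        def get_all_actions(self, *args, **kwargs):
            return {Move("forward"), Move("stay")}

    init_belief = pomdp_py.Histogram({Cell(i): 1.0 / 5 for i in range(5)})
    agent = pomdp_py.Agent(init_belief, TrivialPolicyModel())
    agent.add_attr("nickname", "corridor_robot")  # ad-hoc agent-specific info
    print(agent.all_actions)  # available because the policy implements get_all_actions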

class pomdp_py.framework.basics.BlackboxModel¶
Bases: object
A BlackboxModel is the generative distribution \(G(s,a)\) which can generate samples where each is a tuple \((s',o,r)\).

argmax(self, state, action, **kwargs)¶
Returns the most likely \((s',o,r)\).

sample(self, state, action, **kwargs)¶
Sample \((s',o,r) \sim G(s,a)\).


class pomdp_py.framework.basics.Distribution¶
Bases: object
A Distribution is a probability function that maps from a variable value to a real value, that is, \(\Pr(X=x)\).

__getitem__(self, varval)¶
Probability evaluation. Returns the probability \(\Pr(X=varval)\).

__setitem__(self, varval, value)¶
Sets the probability of \(X=varval\) to be value.


class pomdp_py.framework.basics.Environment¶
Bases: object
An Environment maintains the true state of the world. For example, it could be the 2D gridworld rendered by pygame, or the 3D simulated world rendered by OpenGL. Therefore, when coding up an Environment, the developer should have in mind how to represent the state so that it can be used by a POMDP or OOPOMDP.
The Environment is passive. It never observes nor acts.

apply_transition(self, next_state)¶
Apply the transition, that is, assign the current state to be next_state.

cur_state¶
Current state of the environment.

provide_observation(self, observation_model, action, **kwargs)¶
Returns an observation sampled according to \(\Pr(o \mid s',a)\) where \(s'\) is the current environment state, \(a\) is the given action, and \(\Pr(o \mid s',a)\) is the observation_model.
Parameters
observation_model (ObservationModel) –
action (Action) –
Returns
an observation sampled from \(\Pr(o \mid s',a)\).
Return type
Observation

reward_model¶
The RewardModel underlying the environment.

state¶
Synonym for cur_state.

state_transition(self, action, execute=True, **kwargs)¶
Simulates a state transition given action. If execute is set to True, then the resulting state will be the new current state of the environment.
Parameters
action (Action) – action that triggers the state transition
execute (bool) – If True, the resulting state of the transition will become the current state.
Returns
reward as a result of the action and state transition, if execute is True; (next_state, reward) if execute is False.
Return type
float or tuple

transition_model¶
The TransitionModel underlying the environment.
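
A usage sketch with trivial deterministic models, reusing the hypothetical Cell and Move classes from the sketches above; DetTransitionModel and StepRewardModel are likewise made up for illustration:

    import pomdp_py

    class DetTransitionModel(pomdp_py.TransitionModel):
        """Hypothetical: Move("forward") deterministically advances one cell."""
        def sample(self, state, action, **kwargs):
            if action == Move("forward"):
                return Cell(min(state.index + 1, 4))
            return state

    class StepRewardModel(pomdp_py.RewardModel):
        """Hypothetical: -1 per step, +10 upon reaching the last cell."""
        def sample(self, state, action, next_state, **kwargs):
            return 10.0 if next_state == Cell(4) else -1.0

    env = pomdp_py.Environment(Cell(0), DetTransitionModel(), StepRewardModel())
    reward = env.state_transition(Move("forward"), execute=True)
    print(env.state, reward)   # Cell(1) -1.0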


class pomdp_py.framework.basics.GenerativeDistribution¶
Bases: pomdp_py.framework.basics.Distribution
A GenerativeDistribution is a Distribution that additionally exhibits generative properties. That is, it supports argmax() (or mpe()) and random() functions.

get_histogram(self)¶
Returns a dictionary from state to probability.

mpe(self, **kwargs)¶
Returns the value of the variable that has the highest probability.
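
The library provides implementations such as pomdp_py.Histogram; a minimal hand-rolled sketch of the same interface, assuming an already-normalized value-to-probability dict:

    import random
    import pomdp_py

    class SimpleHistogram(pomdp_py.GenerativeDistribution):
        """Sketch: a GenerativeDistribution backed by a value -> probability dict."""
        def __init__(self, histogram):
            self._histogram = dict(histogram)
        def __getitem__(self, varval):
            return self._histogram.get(varval, 0.0)   # Pr(X = varval)
        def __setitem__(self, varval, value):
            self._histogram[varval] = value
        def mpe(self, **kwargs):
            # Value with the highest probability
            return max(self._histogram, key=self._histogram.get)
        def random(self, **kwargs):
            values = list(self._histogram)
            weights = [self._histogram[v] for v in values]
            return random.choices(values, weights=weights, k=1)[0]
        def get_histogram(self):
            return dict(self._histogram)

    d = SimpleHistogram({"sunny": 0.7, "rainy": 0.3})
    print(d.mpe(), d["rainy"])   # sunny 0.3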


class pomdp_py.framework.basics.Observation¶
Bases: object
The Observation class. Observation must be hashable.

class pomdp_py.framework.basics.ObservationModel¶
Bases: object
An ObservationModel models the distribution \(O(s',a,o)=\Pr(o \mid s',a)\).

argmax(self, next_state, action, **kwargs)¶
Returns the most likely observation.

get_all_observations(self)¶
Returns a set of all possible observations, if feasible.

get_distribution(self, next_state, action, **kwargs)¶
Returns the underlying distribution of the model.

probability(self, observation, next_state, action, **kwargs)¶
Returns the probability \(\Pr(o \mid s',a)\).
Parameters
observation (Observation) – the observation \(o\)
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
Returns
the probability \(\Pr(o \mid s',a)\)
Return type
float

sample(self, next_state, action, **kwargs)¶
Returns an observation randomly sampled according to the distribution of this observation model.
Parameters
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
Returns
the observation \(o\)
Return type
Observation
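
A sketch of a noisy sensor for the hypothetical corridor example; CellObs and the 0.8 accuracy are illustrative assumptions, and Cell is from the Agent sketch above:

    import random
    import pomdp_py

    class CellObs(pomdp_py.Observation):
        """Hypothetical observation: a (possibly wrong) cell index reading."""
        def __init__(self, index):
            self.index = index
        def __hash__(self):
            return hash(self.index)
        def __eq__(self, other):
            return isinstance(other, CellObs) and self.index == other.index

    class NoisySensorModel(pomdp_py.ObservationModel):
        """Reports the true cell with probability 0.8, else a uniform wrong cell."""
        N = 5
        def probability(self, observation, next_state, action, **kwargs):
            if observation.index == next_state.index:
                return 0.8
            return 0.2 / (self.N - 1)
        def sample(self, next_state, action, **kwargs):
            if random.random() < 0.8:
                return CellObs(next_state.index)
            wrong = [i for i in range(self.N) if i != next_state.index]
            return CellObs(random.choice(wrong))
        def get_all_observations(self):
            return {CellObs(i) for i in range(self.N)}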


class pomdp_py.framework.basics.Option¶
Bases: pomdp_py.framework.basics.Action
An Option is a temporally abstracted action defined by a tuple \((I, \pi, B)\), where \(I\) is an initiation set, \(\pi\) is a policy, and \(B\) is a termination condition.
Described in Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning.

initiation(self, state, **kwargs)¶
Returns True if the given state satisfies the initiation set.

policy¶
Returns the policy model (PolicyModel) of this option.

sample(self, state, **kwargs)¶
Samples an action from this option's policy. Convenience function; can be overridden if you prefer not to write a PolicyModel class.

termination(self, state, **kwargs)¶
Returns a boolean of whether the state satisfies the termination condition; technically, returning a float between 0 and 1 is also allowed.
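
A sketch for the hypothetical corridor: an option that can initiate anywhere, always moves forward, and terminates at the last cell (Move and Cell are from the sketches above):

    from pomdp_py.framework.basics import Option

    class MoveToEndOption(Option):
        """Hypothetical option (I, pi, B) that drives the robot to Cell(4)."""
        def initiation(self, state, **kwargs):
            return True                      # I: may start in any state
        def sample(self, state, **kwargs):
            return Move("forward")           # pi: always move forward
        def termination(self, state, **kwargs):
            return state == Cell(4)          # B: stop at the end of the corridor
        def __hash__(self):
            return hash("move_to_end")       # options are Actions: must be hashable
        def __eq__(self, other):
            return isinstance(other, MoveToEndOption)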


class pomdp_py.framework.basics.POMDP¶
Bases: object
A POMDP instance = agent (Agent) + env (Environment).
__init__(self, agent, env, name="POMDP")

class pomdp_py.framework.basics.PolicyModel¶
Bases: object
A PolicyModel models the distribution \(\pi(a \mid s)\). It can also be treated as modeling \(\pi(a \mid h_t)\) by regarding the state parameter as history.
The reason to have a policy model is to accommodate problems with very large action spaces, where the available actions may vary depending on the state (that is, certain actions have probability 0).

argmax(self, state, **kwargs)¶
Returns the most likely action.

get_all_actions(self, *args, **kwargs)¶
Returns a set of all possible actions, if feasible.

get_distribution(self, state, **kwargs)¶
Returns the underlying distribution of the model.

probability(self, action, state, **kwargs)¶
Returns the probability \(\pi(a \mid s)\).

sample(self, state, **kwargs)¶
Returns an action randomly sampled according to the distribution of this policy model.

update(self, state, next_state, action, **kwargs)¶
The policy model may be updated given a \((s,a,s')\) pair.
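
A sketch that prunes actions by state in the hypothetical corridor (at the last cell, only "stay" remains available); all names are from the illustrative sketches above:

    import random
    import pomdp_py

    class CorridorPolicyModel(pomdp_py.PolicyModel):
        """Hypothetical pi(a|s): uniform over the actions valid in a state."""
        def get_all_actions(self, state=None, **kwargs):
            if state is not None and state.index == 4:
                return {Move("stay")}        # "forward" is invalid at the wall
            return {Move("forward"), Move("stay")}
        def probability(self, action, state, **kwargs):
            valid = self.get_all_actions(state=state)
            return 1.0 / len(valid) if action in valid else 0.0
        def sample(self, state, **kwargs):
            return random.choice(list(self.get_all_actions(state=state)))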


class pomdp_py.framework.basics.RewardModel¶
Bases: object
A RewardModel models the distribution \(\Pr(r \mid s,a,s')\) where \(r\in\mathbb{R}\), with argmax denoted as \(R(s,a,s')\).

argmax(self, state, action, next_state, **kwargs)¶
Returns the most likely reward.

get_distribution(self, state, action, next_state, **kwargs)¶
Returns the underlying distribution of the model.

probability(self, reward, state, action, next_state, **kwargs)¶
Returns the probability \(\Pr(r \mid s,a,s')\).

sample(self, state, action, next_state, **kwargs)¶
Returns a reward randomly sampled according to the distribution of this reward model.
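
A sketch of a deterministic reward model for the hypothetical corridor, where all probability mass sits on \(R(s,a,s')\):

    import pomdp_py

    class GoalRewardModel(pomdp_py.RewardModel):
        """Hypothetical: +10 for reaching Cell(4), -1 per step otherwise."""
        def argmax(self, state, action, next_state, **kwargs):
            return 10.0 if next_state == Cell(4) else -1.0
        def sample(self, state, action, next_state, **kwargs):
            # Deterministic reward: sampling coincides with argmax
            return self.argmax(state, action, next_state)
        def probability(self, reward, state, action, next_state, **kwargs):
            return 1.0 if reward == self.argmax(state, action, next_state) else 0.0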


class pomdp_py.framework.basics.State¶
Bases: object
The State class. State must be hashable.

class pomdp_py.framework.basics.TransitionModel¶
Bases: object
A TransitionModel models the distribution \(T(s,a,s')=\Pr(s' \mid s,a)\).

argmax(self, state, action, **kwargs)¶
Returns the most likely next state.

get_all_states(self)¶
Returns a set of all possible states, if feasible.

get_distribution(self, state, action, **kwargs)¶
Returns the underlying distribution of the model.

probability(self, next_state, state, action, **kwargs)¶
Returns the probability \(\Pr(s' \mid s,a)\).
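
A sketch of a stochastic transition model for the hypothetical corridor (the 0.9 success probability is an illustrative assumption):

    import random
    import pomdp_py

    class NoisyTransitionModel(pomdp_py.TransitionModel):
        """Hypothetical: Move("forward") advances one cell w.p. 0.9, else stays."""
        def probability(self, next_state, state, action, **kwargs):
            if action == Move("forward") and state.index < 4:
                if next_state.index == state.index + 1:
                    return 0.9
                if next_state == state:
                    return 0.1
                return 0.0
            return 1.0 if next_state == state else 0.0
        def sample(self, state, action, **kwargs):
            if action == Move("forward") and state.index < 4 and random.random() < 0.9:
                return Cell(state.index + 1)
            return state
        def get_all_states(self):
            return {Cell(i) for i in range(5)}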


pomdp_py.framework.basics.sample_explict_models(TransitionModel T, ObservationModel O, RewardModel R, State state, Action action, float discount_factor=1.0)¶

pomdp_py.framework.basics.sample_generative_model(Agent agent, State state, Action action, float discount_factor=1.0)¶
\((s', o, r) \sim G(s, a)\)
If the agent has transition/observation models, a black box will be created based on these models (i.e. \(s'\) and \(o\) will be sampled according to these models).
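
Putting the hypothetical corridor sketches from the class entries above together: an agent with explicit models, an environment, a POMDP instance, and one generative sample:

    import pomdp_py
    from pomdp_py.framework.basics import sample_generative_model

    init_belief = pomdp_py.Histogram({Cell(i): 1.0 / 5 for i in range(5)})
    agent = pomdp_py.Agent(init_belief, CorridorPolicyModel(),
                           transition_model=NoisyTransitionModel(),
                           observation_model=NoisySensorModel(),
                           reward_model=GoalRewardModel())
    env = pomdp_py.Environment(Cell(0), NoisyTransitionModel(), GoalRewardModel())
    problem = pomdp_py.POMDP(agent, env, name="Corridor")

    # Draw (s', o, r) ~ G(s, a) through the agent's explicit models
    result = sample_generative_model(agent, env.state, Move("forward"))
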
pomdp_py.framework.oopomdp module¶
This module describes components of the OOPOMDP interface in pomdp_py.
An OOPOMDP is a specific type of POMDP where the state and observation spaces are factored by objects. As a result, the transition, observation, and belief distributions are all factored by objects. A main benefit of using OOPOMDP is that the object factoring reduces the scaling of belief space from exponential to linear as the number of objects increases. See [1].

class pomdp_py.framework.oopomdp.OOBelief¶
Bases: pomdp_py.framework.basics.GenerativeDistribution
Belief factored by objects.

__getitem__(self, state)¶
Returns the belief probability of the given state.

__setitem__(self, oostate, value)¶
Sets the probability of a given oostate to value. Note: this is not always feasible.

mpe(self, **kwargs)¶
Returns the most likely state.

object_belief(self, objid)¶
Returns the belief (GenerativeDistribution) for the given object.

object_beliefs¶

random(self, **kwargs)¶
Returns a random state.

set_object_belief(self, objid, belief)¶
Sets the belief of the object to be the given belief (GenerativeDistribution).


class pomdp_py.framework.oopomdp.OOObservation¶
Bases: pomdp_py.framework.basics.Observation

factor(self, next_state, action, **kwargs)¶
Factors the observation by objects. That is, \(z \mapsto z_1,\cdots,z_n\).

merge(cls, object_observations, next_state, action, **kwargs)¶
Merges the factored object_observations into a single OOObservation.
Parameters
object_observations – the factored observations to merge
Returns
the merged observation.
Return type
OOObservation


class pomdp_py.framework.oopomdp.OOObservationModel¶
Bases: pomdp_py.framework.basics.ObservationModel
\(O(z \mid s', a) = \prod_i O(z_i' \mid s', a)\)
__init__(self, observation_models)
Parameters
observation_models (dict) –

__getitem__(self, objid)¶
Returns the observation model for the given object.

argmax(self, next_state, action, **kwargs)¶
Returns the most likely observation.

observation_models¶

probability(self, observation, next_state, action, **kwargs)¶
Returns \(O(z \mid s', a)\).

sample(self, next_state, action, argmax=False, **kwargs)¶
Returns a random observation.


class pomdp_py.framework.oopomdp.OOPOMDP¶
Bases: pomdp_py.framework.basics.POMDP
An OOPOMDP is again defined by an agent and an environment.
__init__(self, agent, env, name="OOPOMDP")

class pomdp_py.framework.oopomdp.OOState¶
Bases: pomdp_py.framework.basics.State
A state that can be factored by objects, that is, into ObjectState(s).
__init__(self, object_states)

__getitem__()¶
Return self[key].

copy(self)¶
Copies the state.

get_object_attribute(self, objid, attr)¶
Returns the given attribute of the requested object.

get_object_class(self, objid)¶
Returns the class of the requested object.

get_object_state(self, objid)¶
Returns the ObjectState for the given object.

set_object_state(self, objid, object_state)¶
Sets the state of the given object to be the given object state (ObjectState).

situation¶
A frozenset which can be used to identify the situation of this state, since it supports hashing.


class pomdp_py.framework.oopomdp.OOTransitionModel¶
Bases: pomdp_py.framework.basics.TransitionModel
\(T(s' \mid s, a) = \prod_i T(s_i' \mid s, a)\)
__init__(self, transition_models)
Parameters
transition_models (dict) –

__getitem__(self, objid)¶
Returns the transition model for the given object.

argmax(self, state, action, **kwargs)¶
Returns the most likely next state.

sample(self, state, action, argmax=False, **kwargs)¶
Returns a random next state.

transition_models¶
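
A factoring sketch with hypothetical object ids 1 and 2 whose states never change; StaticObjectTransitionModel is made up for illustration, and OOState is assumed to be the state representation as documented above:

    import pomdp_py

    class StaticObjectTransitionModel(pomdp_py.TransitionModel):
        """Hypothetical per-object model: object `objid` keeps its state."""
        def __init__(self, objid):
            self.objid = objid
        def sample(self, state, action, **kwargs):
            # `state` is assumed to be the full OOState; return this
            # object's next ObjectState
            return state.get_object_state(self.objid)
        def probability(self, next_object_state, state, action, **kwargs):
            return 1.0 if next_object_state == state.get_object_state(self.objid) else 0.0

    oo_t = pomdp_py.OOTransitionModel({1: StaticObjectTransitionModel(1),
                                       2: StaticObjectTransitionModel(2)})
    model_for_obj1 = oo_t[1]   # __getitem__: per-object model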


class pomdp_py.framework.oopomdp.ObjectState¶
Bases: pomdp_py.framework.basics.State
This is the result of OOState factoring; a state in an OOPOMDP is made up of ObjectState(s), each with an object class (str) and a set of attributes (dict).

__getitem__(self, attr)¶
Returns the given attribute.

__setitem__(self, attr, value)¶
Sets the attribute attr to the given value.

copy(self)¶
Copies this ObjectState.
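
A construction sketch; the object ids, classes, and attributes are illustrative:

    import pomdp_py

    robot = pomdp_py.ObjectState("robot", {"pose": (0, 0)})
    target = pomdp_py.ObjectState("target", {"pose": (3, 4), "found": False})
    state = pomdp_py.OOState({1: robot, 2: target})

    print(state.get_object_class(2))              # "target"
    print(state.get_object_attribute(2, "pose"))  # (3, 4)
    print(robot["pose"])                          # __getitem__ on the attribute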

pomdp_py.framework.planner module¶

class pomdp_py.framework.planner.Planner¶
Bases: object
A Planner can plan() the next action to take for a given agent (online planning). The planner can be updated (which may also update the agent's belief) once an action is executed and an observation received.

plan(self, Agent agent)¶
The agent carries the information \(B_t, h_t, O, T, R/G, \pi\) necessary for planning.

update(self, Agent agent, Action real_action, Observation real_observation)¶
Updates the planner based on the real action and observation. Updates the agent accordingly if necessary. If the agent's belief is also updated here, the updates_agent_belief attribute should be set to True. By default, does nothing.

updates_agent_belief()¶
True if the planner's update function also updates the agent's belief.
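
A sketch of a typical online plan/execute/update loop using POUCT, a Monte Carlo tree search planner shipped with pomdp_py, assuming the hypothetical corridor agent and env assembled in the sketch earlier on this page:

    import pomdp_py

    planner = pomdp_py.POUCT(max_depth=10, discount_factor=0.95,
                             num_sims=500, exploration_const=50,
                             rollout_policy=agent.policy_model)

    for step in range(10):
        action = planner.plan(agent)                 # Planner.plan
        reward = env.state_transition(action, execute=True)
        observation = env.provide_observation(agent.observation_model, action)
        agent.update_history(action, observation)
        planner.update(agent, action, observation)   # Planner.update

Whether planner.update also refreshes the agent's belief depends on the planner (see updates_agent_belief); with POUCT the belief is typically updated separately.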
