pomdp_py.framework package¶
Important classes in pomdp_py.framework.basics:
- Distribution: A probability function that maps from a variable value to a real value, that is, \(\Pr(X=x)\).
- GenerativeDistribution: A Distribution that additionally exhibits generative properties.
- TransitionModel: Models the distribution \(T(s,a,s')=\Pr(s'|s,a)\).
- ObservationModel: Models the distribution \(O(s',a,o)=\Pr(o|s',a)\).
- BlackboxModel: The generative distribution \(G(s,a)\) which can generate samples where each is a tuple \((s',o,r)\).
- RewardModel: Models the distribution \(\Pr(r|s,a,s')\) where \(r\in\mathbb{R}\), with argmax denoted as \(R(s,a,s')\).
- PolicyModel: Models the distribution \(\pi(a|s)\).
- Agent: Operates in an environment by taking actions, receiving observations, and updating its belief.
- Environment: Maintains the true state of the world.
- POMDP: A POMDP instance = agent (Agent) + env (Environment).
- State: The State class.
- Action: The Action class.
- Observation: The Observation class.
- Option: A temporally abstracted action defined by (I, pi, B), where I is an initiation set, pi is a policy, and B is a termination condition.
pomdp_py.framework.basics module¶
- class pomdp_py.framework.basics.Action¶
Bases:
object
The Action class. Action must be hashable.
- class pomdp_py.framework.basics.Agent¶
Bases:
object
An Agent operates in an environment by taking actions, receiving observations, and updating its belief. Deciding what action to take is the job of a planner (Planner), and the belief update is usually done outside of the agent, taken care of, e.g., by the belief representation or by the planner. The Agent supplies its own versions of the TransitionModel, ObservationModel, RewardModel, or BlackboxModel to the planner or the belief update algorithm.
- __init__(self, init_belief, policy_model=None, transition_model=None, observation_model=None, reward_model=None, blackbox_model=None, name=None)
- add_attr(self, attr_name, attr_value)¶
A function that allows adding attributes to the agent. Sometimes useful for planners to store agent-specific information.
- all_actions¶
Only available if the policy model implements get_all_actions.
- all_observations¶
Only available if the observation model implements get_all_observations.
- all_states¶
Only available if the transition model implements get_all_states.
- belief¶
Current belief distribution.
- history¶
Current history.
- init_belief¶
Initial belief distribution.
- set_belief(self, belief, prior=False)¶
- set_models(transition_model=None, observation_model=None, reward_model=None, blackbox_model=None, policy_model=None)¶
Re-assign the models to be the ones given.
- set_name(self, str name)¶
Gives this agent a name.
- update(self, real_action, real_observation)¶
Updates the history and performs the belief update.
- update_history(self, real_action, real_observation)¶
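A minimal usage sketch of constructing an Agent from the documented constructor. The MyState class, the "left"/"right" state names, and the agent name are made up for illustration; pomdp_py.Histogram is assumed available as the library's histogram belief representation.

```python
import pomdp_py

class MyState(pomdp_py.State):
    """A minimal hashable state, as required by the State interface."""
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return hash(self.name)
    def __eq__(self, other):
        return isinstance(other, MyState) and self.name == other.name

# Initial belief: a histogram over two states.
init_belief = pomdp_py.Histogram({MyState("left"): 0.5,
                                  MyState("right"): 0.5})

# Models may be omitted at construction time and attached later via set_models.
agent = pomdp_py.Agent(init_belief, name="my_agent")

print(agent.belief)    # current belief (initially the initial belief)
print(agent.history)   # history of (action, observation) pairs, empty so far
```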
- class pomdp_py.framework.basics.BlackboxModel¶
Bases:
object
A BlackboxModel is the generative distribution \(G(s,a)\) which can generate samples where each is a tuple \((s',o,r)\).
- argmax(self, state, action)¶
Returns the most likely (s’,o,r)
- sample(self, state, action)¶
Sample (s’,o,r) ~ G(s,a)
- class pomdp_py.framework.basics.Distribution¶
Bases:
object
A Distribution is a probability function that maps from variable value to a real value, that is, \(Pr(X=x)\).
- __getitem__(self, varval)¶
Probability evaluation. Returns the probability \(\Pr(X=varval)\).
- __setitem__(self, varval, value)¶
Sets the probability of \(X=varval\) to be value.
- class pomdp_py.framework.basics.Environment¶
Bases:
object
An Environment maintains the true state of the world. For example, it is the 2D gridworld, rendered by pygame. Or it could be the 3D simulated world rendered by OpenGL. Therefore, when coding up an Environment, the developer should have in mind how to represent the state so that it can be used by a POMDP or OOPOMDP.
The Environment is passive. It never observes nor acts.
- __init__(self, init_state, transition_model=None, reward_model=None, blackbox_model=None)
- apply_transition(self, next_state)¶
Apply the transition, that is, assign current state to be the next_state.
- blackbox_model¶
The BlackboxModel underlying the environment
- cur_state¶
Current state of the environment
- provide_observation(self, observation_model, action)¶
Returns an observation sampled according to \(\Pr(o|s',a)\) where \(s'\) is the current environment state, \(a\) is the given action, and \(\Pr(o|s',a)\) is the observation_model.
- Parameters:
observation_model (ObservationModel) –
action (Action) –
- Returns:
an observation sampled from \(\Pr(o|s',a)\).
- Return type:
Observation
- reward_model¶
The RewardModel underlying the environment
- set_models(transition_model=None, reward_model=None, blackbox_model=None)¶
Re-assign the models to be the ones given.
- state¶
Synonym for cur_state.
- state_transition(self, action, execute=True)¶
Simulates a state transition given action. If execute is set to True, then the resulting state will be the new current state of the environment.
- Parameters:
action (Action) – action that triggers the state transition
execute (bool) – If True, the resulting state of the transition will become the current state.
discount_factor (float) – Only necessary if action is an Option. It is the discount factor when executing actions following an option’s policy until reaching terminal condition.
- Returns:
the reward resulting from the action and state transition if execute is True; (next_state, reward) if execute is False.
- Return type:
float or tuple
- transition_model¶
The TransitionModel underlying the environment
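A self-contained toy sketch of the Environment API on a 1-D walk, using the documented constructor and state_transition behavior. All class names below are made up for illustration.

```python
import pomdp_py

class WalkState(pomdp_py.State):
    """Position on a line; states must be hashable."""
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return hash(self.x)
    def __eq__(self, other):
        return isinstance(other, WalkState) and self.x == other.x

class WalkAction(pomdp_py.Action):
    """Move by a fixed step; actions must be hashable."""
    def __init__(self, step):
        self.step = step
    def __hash__(self):
        return hash(self.step)
    def __eq__(self, other):
        return isinstance(other, WalkAction) and self.step == other.step

class WalkTransitionModel(pomdp_py.TransitionModel):
    """Deterministic transition: move by the action's step."""
    def sample(self, state, action):
        return WalkState(state.x + action.step)

class WalkRewardModel(pomdp_py.RewardModel):
    """Reward is 1 when the walker reaches position 0."""
    def sample(self, state, action, next_state):
        return 1.0 if next_state.x == 0 else 0.0

env = pomdp_py.Environment(WalkState(3),
                           transition_model=WalkTransitionModel(),
                           reward_model=WalkRewardModel())

# Simulate a transition without committing to it ...
next_state, reward = env.state_transition(WalkAction(-1), execute=False)
# ... or execute it, which makes next_state the current state.
reward = env.state_transition(WalkAction(-1), execute=True)
print(env.cur_state.x)   # 2
```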
- class pomdp_py.framework.basics.GenerativeDistribution¶
Bases:
Distribution
A GenerativeDistribution is a Distribution that additionally exhibits generative properties. That is, it supports argmax() (or mpe()) and random() functions.
- get_histogram(self)¶
Returns a dictionary from state to probability
- mpe(self)¶
Returns the value of the variable that has the highest probability.
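For instance, a toy GenerativeDistribution subclass (made up for illustration) that is uniform over a fixed set of values and implements the evaluation and generative interfaces:

```python
import random
import pomdp_py

class UniformValues(pomdp_py.GenerativeDistribution):
    """Uniform distribution over a fixed, finite set of values."""
    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, varval):
        # Pr(X = varval)
        return 1.0 / len(self._values) if varval in self._values else 0.0

    def __setitem__(self, varval, value):
        raise NotImplementedError("read-only toy distribution")

    def random(self):
        return random.choice(self._values)

    def mpe(self):
        # All values are equally likely; any of them is a most probable value.
        return self._values[0]

    def get_histogram(self):
        p = 1.0 / len(self._values)
        return {v: p for v in self._values}

d = UniformValues(["a", "b", "c"])
print(d["a"], d["z"])      # 0.333... 0.0
print(d.random(), d.mpe())
```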
- class pomdp_py.framework.basics.Observation¶
Bases:
object
The Observation class. Observation must be hashable.
- class pomdp_py.framework.basics.ObservationModel¶
Bases:
object
An ObservationModel models the distribution \(O(s',a,o)=\Pr(o|s',a)\).
- argmax(self, next_state, action)¶
Returns the most likely observation
- get_all_observations(self)¶
Returns a set of all possible observations, if feasible.
- get_distribution(self, next_state, action)¶
Returns the underlying distribution of the model
- probability(self, observation, next_state, action)¶
Returns the probability of \(\Pr(o|s',a)\).
- Parameters:
observation (Observation) – the observation \(o\)
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
- Returns:
the probability \(\Pr(o|s',a)\)
- Return type:
float
- sample(self, next_state, action)¶
Returns observation randomly sampled according to the distribution of this observation model.
- Parameters:
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
- Returns:
the observation \(o\)
- Return type:
Observation
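A sketch of a concrete ObservationModel: a noisy binary sensor. The observation class, the "left"/"right" labels, and the assumption that states carry a `name` attribute are all made up for illustration.

```python
import random
import pomdp_py

class SensorObservation(pomdp_py.Observation):
    """Observations must be hashable."""
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return hash(self.name)
    def __eq__(self, other):
        return isinstance(other, SensorObservation) and self.name == other.name

class NoisySensorModel(pomdp_py.ObservationModel):
    """Reports the state's (hypothetical) `name` attribute correctly with
    probability 1 - noise, and the other label otherwise."""
    LABELS = ("left", "right")

    def __init__(self, noise=0.15):
        self.noise = noise

    def probability(self, observation, next_state, action):
        correct = (observation.name == next_state.name)
        return 1.0 - self.noise if correct else self.noise

    def sample(self, next_state, action):
        if random.random() > self.noise:
            return SensorObservation(next_state.name)   # correct reading
        other = [l for l in self.LABELS if l != next_state.name][0]
        return SensorObservation(other)                 # noisy reading

    def get_all_observations(self):
        # Enables agent.all_observations.
        return [SensorObservation(l) for l in self.LABELS]
```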
- class pomdp_py.framework.basics.Option¶
Bases:
Action
An option is a temporally abstracted action defined by (I, pi, B), where I is an initiation set, pi is a policy, and B is a termination condition.
Described in Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- initiate(state)¶
initiation(self, state): Returns True if the given state satisfies the initiation set.
- policy¶
Returns the policy model (PolicyModel) of this option.
- sample(self, state)¶
Samples an action from this option’s policy. Convenience function; can be overridden if you don’t want to write a PolicyModel class.
- terminate(state)¶
termination(self, state): Returns a boolean indicating whether the state satisfies the termination condition; technically, returning a float between 0 and 1 is also allowed.
- class pomdp_py.framework.basics.POMDP¶
Bases:
object
A POMDP instance = agent (Agent) + env (Environment).
__init__(self, agent, env, name="POMDP")
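Assembling a POMDP instance follows the documented constructor; here `agent` and `env` stand for a previously constructed Agent and Environment, and the problem name is illustrative.

```python
import pomdp_py

# agent: a pomdp_py.Agent, env: a pomdp_py.Environment (constructed elsewhere)
problem = pomdp_py.POMDP(agent, env, name="MyProblem")
```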
- class pomdp_py.framework.basics.PolicyModel¶
Bases:
object
PolicyModel models the distribution \(\pi(a|s)\). It can also be treated as modeling \(\pi(a|h_t)\) by regarding state parameters as history.
The reason to have a policy model is to accommodate problems with very large action spaces, where the available actions may vary depending on the state (that is, certain actions have probability 0).
- argmax(self, state)¶
Returns the most likely action
- get_all_actions(self, *args)¶
Returns a set of all possible actions, if feasible.
- get_distribution(self, state)¶
Returns the underlying distribution of the model
- probability(self, action, state)¶
Returns the probability of \(\pi(a|s)\).
- sample(self, state)¶
Returns action randomly sampled according to the distribution of this policy model.
- update(self, state, next_state, action)¶
Policy model may be updated given a (s,a,s’) pair.
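A sketch of a concrete PolicyModel that is uniform over a fixed action set regardless of state; the action set passed in is a made-up illustration.

```python
import random
import pomdp_py

class UniformPolicyModel(pomdp_py.PolicyModel):
    """pi(a|s): uniform over a fixed set of actions, independent of state."""
    def __init__(self, actions):
        self._actions = list(actions)

    def probability(self, action, state):
        return 1.0 / len(self._actions) if action in self._actions else 0.0

    def sample(self, state):
        return random.choice(self._actions)

    def get_all_actions(self, *args):
        # Enables agent.all_actions and is used by sampling-based planners.
        return self._actions
```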
- class pomdp_py.framework.basics.RewardModel¶
Bases:
object
A RewardModel models the distribution \(\Pr(r|s,a,s')\) where \(r\in\mathbb{R}\), with argmax denoted as \(R(s,a,s')\).
- argmax(self, state, action, next_state)¶
Returns the most likely reward. This is optional.
- get_distribution(self, state, action, next_state)¶
Returns the underlying distribution of the model
- probability(self, reward, state, action, next_state)¶
Returns the probability of \(\Pr(r|s,a,s')\).
- sample(self, state, action, next_state)¶
Returns reward randomly sampled according to the distribution of this reward model. This is required, i.e. assumed to be implemented for a reward model.
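A sketch of a deterministic RewardModel, where \(\Pr(r|s,a,s')\) puts all mass on \(R(s,a,s')\). The goal name and the assumption that states carry a `name` attribute are made up for illustration.

```python
import pomdp_py

class GoalRewardModel(pomdp_py.RewardModel):
    """Deterministic reward: reach the goal state for a bonus, pay a step cost otherwise."""
    def __init__(self, goal_name, goal_reward=10.0, step_cost=-1.0):
        self.goal_name = goal_name
        self.goal_reward = goal_reward
        self.step_cost = step_cost

    def sample(self, state, action, next_state):
        # Deterministic, so sampling just returns the argmax reward.
        return self.argmax(state, action, next_state)

    def argmax(self, state, action, next_state):
        # Assumes states carry a hypothetical `name` attribute.
        if next_state.name == self.goal_name:
            return self.goal_reward
        return self.step_cost
```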
- class pomdp_py.framework.basics.State¶
Bases:
object
The State class. State must be hashable.
- class pomdp_py.framework.basics.TransitionModel¶
Bases:
object
A TransitionModel models the distribution \(T(s,a,s')=\Pr(s'|s,a)\).
- argmax(self, state, action)¶
Returns the most likely next state
- get_all_states(self)¶
Returns a set of all possible states, if feasible.
- get_distribution(self, state, action)¶
Returns the underlying distribution of the model
- probability(self, next_state, state, action)¶
Returns the probability of \(\Pr(s'|s,a)\).
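A sketch of a concrete TransitionModel for a toy two-state domain: the state persists with probability `stay_prob` and switches otherwise. The two State instances passed in are placeholders for any hashable State subclass.

```python
import random
import pomdp_py

class StickyTransitionModel(pomdp_py.TransitionModel):
    """T(s,a,s') for a two-state domain: stay with probability stay_prob,
    otherwise switch to the other state."""
    def __init__(self, left_state, right_state, stay_prob=0.9):
        self._states = (left_state, right_state)
        self.stay_prob = stay_prob

    def probability(self, next_state, state, action):
        return self.stay_prob if next_state == state else 1.0 - self.stay_prob

    def sample(self, state, action):
        if random.random() < self.stay_prob:
            return state
        return self._states[1] if state == self._states[0] else self._states[0]

    def get_all_states(self):
        # Enables agent.all_states, used e.g. for exact belief updates.
        return list(self._states)
```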
- pomdp_py.framework.basics.sample_explict_models(TransitionModel T, ObservationModel O, RewardModel R, State state, Action action, float discount_factor=1.0)¶
- pomdp_py.framework.basics.sample_generative_model(Agent agent, State state, Action action, float discount_factor=1.0)¶
\((s', o, r) \sim G(s, a)\)
If the agent has transition/observation models, a black box will be created based on these models (i.e. \(s'\) and \(o\) will be sampled according to these models).
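A hedged usage sketch: `agent`, `state`, and `action` stand for a previously constructed Agent (with transition, observation, and reward models attached), a State, and an Action; the star-unpacking is used because only \((s', o, r)\) is documented here and any additional bookkeeping values are not assumed.

```python
from pomdp_py.framework.basics import sample_generative_model

# agent, state, action constructed elsewhere (placeholders for illustration).
s_next, o, r, *rest = sample_generative_model(agent, state, action)
# s_next, o, r correspond to (s', o, r) ~ G(s, a); any extra values the
# function may return (if present) are collected in `rest`.
```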
pomdp_py.framework.oopomdp module¶
This module describes components of the OO-POMDP interface in pomdp_py.
An OO-POMDP is a specific type of POMDP where the state and observation spaces are factored by objects. As a result, the transition, observation, and belief distributions are all factored by objects. A main benefit of using OO-POMDP is that the object factoring reduces the scaling of belief space from exponential to linear as the number of objects increases. See [1].
- class pomdp_py.framework.oopomdp.DictState¶
Bases:
ObjectState
This is synonymous with ObjectState, but does not convey the ‘objectness’ of the information being described.
- class pomdp_py.framework.oopomdp.OOBelief¶
Bases:
GenerativeDistribution
Belief factored by objects.
- __getitem__(self, state)¶
Returns belief probability of given state
- __setitem__(self, oostate, value)¶
Sets the probability of a given oostate to value. Note that this may not always be feasible.
- b(objid)¶
Convenient alias for object_belief.
- mpe(self, return_oostate=False, **kwargs)¶
Returns most likely state.
- object_belief(self, objid)¶
Returns the belief (GenerativeDistribution) for the given object.
- object_beliefs¶
- random(self, return_oostate=False, **kwargs)¶
Returns a random state
- set_object_belief(self, objid, belief)¶
Sets the belief of object to be the given belief (GenerativeDistribution)
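A hedged sketch of OOBelief usage. The constructor is assumed to take a dict mapping object id to that object's belief; `robot_belief` and `target_belief` stand for per-object GenerativeDistributions (e.g., pomdp_py.Histogram over each object's ObjectState), and the object ids 1 and 2 are illustrative.

```python
import pomdp_py

# robot_belief, target_belief: per-object GenerativeDistributions (constructed
# elsewhere); 1 and 2 are the object ids -- all placeholders for illustration.
oobelief = pomdp_py.OOBelief({1: robot_belief, 2: target_belief})

print(oobelief.object_belief(1))   # belief over object 1 only
print(oobelief.b(2))               # alias for object_belief(2)
most_likely = oobelief.mpe()       # most likely state, factored by objects
sampled = oobelief.random()        # random state, factored by objects
```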
- class pomdp_py.framework.oopomdp.OOObservation¶
Bases:
Observation
- factor(self, next_state, action, **kwargs)¶
Factors the observation by objects. That is, \(z\mapsto z_1,\cdots,z_n\)
- classmethod merge(cls, object_observations, next_state, action, **kwargs)¶
Merges the factored object_observations into a single OOObservation.
- Parameters:
object_observations (dict) – map from object id to that object’s observation
next_state (State) –
action (Action) –
- Returns:
the merged observation.
- Return type:
OOObservation
- class pomdp_py.framework.oopomdp.OOObservationModel¶
Bases:
ObservationModel
\(O(z | s', a) = \prod_i O(z_i | s', a)\)
__init__(self, observation_models)
- Parameters:
observation_models (dict) –
- __getitem__(self, objid)¶
Returns observation model for given object
- argmax(self, next_state, action, **kwargs)¶
Returns most likely observation
- observation_models¶
- probability(self, observation, next_state, action, **kwargs)¶
Returns \(O(z | s', a)\).
- sample(self, next_state, action, argmax=False, **kwargs)¶
Returns random observation
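Per the documented constructor, an OOObservationModel is built from a dict mapping object id to that object's ObservationModel. The per-object models and object ids below are illustrative placeholders.

```python
import pomdp_py

# robot_obs_model, target_obs_model: ObservationModel subclasses for the
# individual objects (constructed elsewhere); 1 and 2 are object ids.
oo_obs_model = pomdp_py.OOObservationModel({1: robot_obs_model,
                                            2: target_obs_model})

per_object = oo_obs_model[1]   # the ObservationModel for object 1
# probability(...) and sample(...) then operate on the product over objects.
```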
- class pomdp_py.framework.oopomdp.OOPOMDP¶
Bases:
POMDP
An OO-POMDP is again defined by an agent and an environment.
__init__(self, agent, env, name="OOPOMDP")
- class pomdp_py.framework.oopomdp.OOState¶
Bases:
State
State that can be factored by objects, that is, to ObjectState(s).
Note: to change the state of an object, you can use set_object_state. Do not directly assign the object state by e.g. oostate.object_states[objid] = object_state, because it will cause the hashcode to be incorrect in the oostate after the change.
__init__(self, object_states)
- __getitem__(key, /)¶
Return self[key].
- copy(self)¶
Copies the state.
- get_object_attribute(self, objid, attr)¶
Returns the attributes of requested object
- get_object_class(self, objid)¶
Returns the class of requested object
- get_object_state(self, objid)¶
Returns the ObjectState for given object.
- s(objid)¶
Convenient alias for get_object_state.
- set_object_state(self, objid, object_state)¶
Sets the state of the given object to be the given object state (ObjectState)
- situation¶
This is a frozenset which can be used to identify the situation of this state since it supports hashing.
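A hedged sketch of building and modifying an OOState. It assumes ObjectState is constructed from an object class (str) and an attribute dict, and OOState from a dict mapping object id to ObjectState, per the documented __init__; the object ids, classes, and attributes are illustrative.

```python
import pomdp_py

# Object classes, ids, and attribute values below are made up for illustration.
robot = pomdp_py.ObjectState("robot", {"pose": (0, 0)})
target = pomdp_py.ObjectState("target", {"pose": (3, 4)})
oostate = pomdp_py.OOState({1: robot, 2: target})

print(oostate.get_object_state(1))              # the robot's ObjectState
print(oostate.get_object_attribute(2, "pose"))  # (3, 4)

# Change an object's state via set_object_state; do not assign
# oostate.object_states[objid] directly, which would break hashing.
oostate.set_object_state(1, pomdp_py.ObjectState("robot", {"pose": (1, 0)}))
```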
- class pomdp_py.framework.oopomdp.OOTransitionModel¶
Bases:
TransitionModel
\(T(s' | s, a) = \prod_i T(s_i' | s, a)\)
__init__(self, transition_models)
- Parameters:
transition_models (dict) –
- __getitem__(self, objid)¶
Returns transition model for given object
- argmax(self, state, action, **kwargs)¶
Returns the most likely next state
- sample(self, state, action, argmax=False, **kwargs)¶
Returns random next_state
- transition_models¶
- class pomdp_py.framework.oopomdp.ObjectState¶
Bases:
State
This is the result of OOState factoring; a state in an OO-POMDP is made up of ObjectState(s), each with an object class (str) and a set of attributes (dict).
- __getitem__(self, attr)¶
Returns the attribute
- __setitem__(self, attr, value)¶
Sets the attribute attr to the given value.
- copy(self)¶
Copies this ObjectState.
pomdp_py.framework.planner module¶
- class pomdp_py.framework.planner.Planner¶
Bases:
object
A Planner can plan() the next action to take for a given agent (online planning). The planner can be updated (which may also update the agent's belief) once an action is executed and an observation is received.
A Planner may be a pure planning algorithm, or it could use a learned model for planning under the hood. Its job is to output an action to take for a given agent.
You can implement a Planner that is specific to an agent, or not. If specific, then when calling plan(), the agent passed in is expected to always be the same one.
- plan(self, Agent agent)¶
The agent carries the information necessary for planning: \(B_t\), \(h_t\), \(O\), \(T\), \(R\) (or \(G\)), and \(\pi\).
- update(self, Agent agent, Action real_action, Observation real_observation)¶
Updates the planner based on real action and observation. Updates the agent accordingly if necessary. If the agent’s belief is also updated here, the update_agent_belief attribute should be set to True. By default, does nothing.
- updates_agent_belief()¶
True if planner’s update function also updates agent’s belief.
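Putting the pieces together, a typical online plan-execute-update loop looks like the sketch below. Here `problem` stands for a pomdp_py.POMDP (agent + env) and `planner` for any concrete Planner instance, both constructed elsewhere; the step count is arbitrary.

```python
# problem: a pomdp_py.POMDP; planner: a concrete Planner (placeholders).
for _ in range(10):
    # Plan the next action from the agent's current belief/history.
    action = planner.plan(problem.agent)

    # Execute the action in the (passive) environment and observe the result.
    reward = problem.env.state_transition(action, execute=True)
    observation = problem.env.provide_observation(
        problem.agent.observation_model, action)

    # Record the interaction; the planner update may also update the belief.
    problem.agent.update_history(action, observation)
    planner.update(problem.agent, action, observation)
```

Depending on the planner and the belief representation, a separate belief update (e.g., via agent.set_belief) may also be needed after each step when the planner does not update the agent's belief itself.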