pomdp_py.utils package


pomdp_py.utils.interfaces

Utilities for interfacing with external libraries

pomdp_py.utils.debugging

This module contains utility functions making it easier to debug POMDP planning.

pomdp_py.utils.templates

Some particular implementations of the interface for convenience

pomdp_py.utils.cython_utils

Utility functions for Cython code.

pomdp_py.utils.typ

Utilities for typography, i.e. dealing with strings for the purpose of displaying them.

pomdp_py.utils.math

Assorted utilities for math

pomdp_py.utils.misc

Misc Python utilities

pomdp_py.utils.colors

Utilities for dealing with colors

Subpackages

Submodules

pomdp_py.utils.colors module

Utilities for dealing with colors

pomdp_py.utils.colors.lighter(color, percent)[source]

Assumes color is an RGB tuple with components between (0, 0, 0) and (255, 255, 255).

pomdp_py.utils.colors.rgb_to_hex(rgb)[source]
pomdp_py.utils.colors.hex_to_rgb(hx)[source]

hx is a string that begins with '#'; assumes len(hx) == 7.

pomdp_py.utils.colors.inverse_color_rgb(rgb)[source]
pomdp_py.utils.colors.inverse_color_hex(hx)[source]

hx is a string that begins with '#'; assumes len(hx) == 7.

pomdp_py.utils.colors.random_unique_color(colors, ctype=1)[source]

  • ctype=1: completely random

  • ctype=2: red random

  • ctype=3: blue random

  • ctype=4: green random

  • ctype=5: yellow random

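A minimal usage sketch for these helpers; the specific color values and the percent argument 0.3 are illustrative assumptions, while the "#rrggbb" string and (r, g, b) tuple formats follow the docstrings above:

from pomdp_py.utils.colors import hex_to_rgb, rgb_to_hex, lighter, inverse_color_rgb

rgb = hex_to_rgb("#1f77b4")      # a 7-character "#rrggbb" string, as assumed by hex_to_rgb
print(rgb_to_hex(rgb))           # back to a hex string
print(inverse_color_rgb(rgb))    # the inverse RGB color
print(lighter(rgb, 0.3))         # a lighter color; the percent value 0.3 is an assumption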

pomdp_py.utils.cython_utils module

Utility functions for Cython code.

pomdp_py.utils.cython_utils.det_dict_hash(dct, keep=9)

Deterministic hash of a dictionary without sorting.
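
A brief sketch of the intended use; the dictionary contents are hypothetical, and the point is only that the hash is deterministic (the same dictionary always hashes to the same value):

from pomdp_py.utils.cython_utils import det_dict_hash

# Hypothetical belief-like dictionary; det_dict_hash returns the same
# value every time it is called on the same contents.
weights = {"tiger-left": 0.85, "tiger-right": 0.15}
print(det_dict_hash(weights))
print(det_dict_hash(weights, keep=9) == det_dict_hash(weights))  # True (keep defaults to 9)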

pomdp_py.utils.debugging module

This module contains utility functions making it easier to debug POMDP planning.

TreeDebugger

The core debugging functionality for POMCP/POUCT search trees is incorporated into the TreeDebugger. It is designed for ease of use during a pdb or ipdb debugging session. Here is a minimal example usage:

import pomdp_py
from pomdp_py.utils import TreeDebugger
from pomdp_problems.tiger import TigerProblem

# pomdp_py.Agent
agent = TigerProblem.create("tiger-left", 0.5, 0.15).agent

# suppose pouct is a pomdp_py.POUCT object (POMCP works too)
pouct = pomdp_py.POUCT(max_depth=4, discount_factor=0.95,
                       num_sims=4096, exploration_const=200,
                       rollout_policy=agent.policy_model)

action = pouct.plan(agent)
dd = TreeDebugger(agent.tree)
import pdb; pdb.set_trace()

When the program executes, you enter the pdb debugger, and you can:

(Pdb) dd.pp
_VNodePP(n=4095, v=-19.529)(depth=0)
├─── ₀listen⟶_QNodePP(n=4059, v=-19.529)
│    ├─── ₀tiger-left⟶_VNodePP(n=2013, v=-16.586)(depth=1)
│    │    ├─── ₀listen⟶_QNodePP(n=1883, v=-16.586)
│    │    │    ├─── ₀tiger-left⟶_VNodePP(n=1441, v=-8.300)(depth=2)
... # prints out the entire tree; Colored in terminal.

(Pdb) dd.p(1)
_VNodePP(n=4095, v=-19.529)(depth=0)
├─── ₀listen⟶_QNodePP(n=4059, v=-19.529)
│    ├─── ₀tiger-left⟶_VNodePP(n=2013, v=-16.586)(depth=1)
│    │    ├─── ₀listen⟶_QNodePP(n=1883, v=-16.586)
│    │    ├─── ₁open-left⟶_QNodePP(n=18, v=-139.847)
│    │    └─── ₂open-right⟶_QNodePP(n=112, v=-57.191)
... # prints up to depth 1

Note that the printed texts are colored in the terminal.

You can retrieve the subtree through indexing:

(Pdb) dd[0]
listen⟶_QNodePP(n=4059, v=-19.529)
    - [0] tiger-left: VNode(n=2013, v=-16.586)
    - [1] tiger-right: VNode(n=2044, v=-16.160)

(Pdb) dd[0][1][2]
open-right⟶_QNodePP(n=15, v=-148.634)
    - [0] tiger-left: VNode(n=7, v=-20.237)
    - [1] tiger-right: VNode(n=6, v=8.500)

You can obtain the currently preferred action sequence by:

(Pdb) dd.mbp
   listen  []
   listen  []
   listen  []
   listen  []
   open-left  []
 _VNodePP(n=4095, v=-19.529)(depth=0)
 ├─── ₀listen⟶_QNodePP(n=4059, v=-19.529)
 │    └─── ₁tiger-right⟶_VNodePP(n=2044, v=-16.160)(depth=1)
 │         ├─── ₀listen⟶_QNodePP(n=1955, v=-16.160)
 │         │    └─── ₁tiger-right⟶_VNodePP(n=1441, v=-8.300)(depth=2)
 │         │         ├─── ₀listen⟶_QNodePP(n=947, v=-8.300)
 │         │         │    └─── ₁tiger-right⟶_VNodePP(n=768, v=0.022)(depth=3)
 │         │         │         ├─── ₀listen⟶_QNodePP(n=462, v=0.022)
 │         │         │         │    └─── ₁tiger-right⟶_VNodePP(n=395, v=10.000)(depth=4)
 │         │         │         │         ├─── ₁open-left⟶_QNodePP(n=247, v=10.000)

mbp stands for “mark best plan”.

To explore more features, browse the list of methods in the documentation.

class pomdp_py.utils.debugging.TreeDebugger(tree)[source]

Bases: object

Helps you debug the search tree. A search tree contains a subset of future histories, organized into QNodes (the value represents Q(b,a); children are observations) and VNodes (the value represents V(b); children are actions).

num_nodes(kind='all')[source]

Returns the total number of nodes in the tree rooted at “current”

property depth

Tree depth starts from 0 (root node only). It is the largest number of edges on a path from root to leaf.

property d

alias for depth

property num_layers

Returns the number of layers of nodes, which equals depth + 1.

property nl

alias for num_layers

property nn

Returns the total number of nodes in the tree

property nq

Returns the total number of QNodes in the tree

property nv

Returns the total number of VNodes in the tree

l(depth, as_debuggers=True)[source]

alias for layer

layer(depth, as_debuggers=True)[source]

Returns a list of nodes at the given depth. Will only return VNodes. Warning: If depth is high, there will likely be a huge number of nodes.

Parameters:
  • depth (int) – Depth of the tree

  • as_debuggers (bool) – If True, returns a list of TreeDebugger objects, one for each subtree rooted at a node on that layer.

property leaf
step(key)[source]

Updates current interaction node to follow the edge along key

s(key)[source]

alias for step

back()[source]

move current node of interaction back to parent

property b

alias for back

property root

The root node when first creating this TreeDebugger

property r

alias for root

property c

Current node of interaction

p(*args, **kwargs)[source]

print tree

property pp

print tree, with preset options

property mbp

Mark Best and Print. Mark the best sequence, and then print with only the marked nodes

property pm

Print marked only

mark_sequence(seq, color='blue')[source]

Given a list of keys (understandable by __getitem__ in _NodePP), mark nodes (both QNode and VNode) along the path in the tree. Note this sequence starts from self.current; so self.current will also be marked.

mark(seq, **kwargs)[source]

alias for mark_sequence

mark_path(dest, **kwargs)[source]

Marks the path from the current node to the dest node.

markp(dest, **kwargs)[source]

alias for mark_path

property clear

Clear the marks

property bestseq

Returns the sequence of actions and observations that has the highest value at each step; such a sequence is “preferred”.

Also prints out the list of preferred actions for each step into the future.

bestseqd(max_depth)[source]

alias for bestseq, except up to the given max_depth

static single_node_str(node, parent_edge=None, indent=1, include_children=True)[source]

Returns a string for printing given a single vnode.

static preferred_actions(root, max_depth=None)[source]

Print out the currently preferred actions up to given max_depth

path(dest)[source]

alias for path_to; Example usage:

marking path from root to the first node on the second layer:

dd.mark(dd.path(dd.layer(2)[0]))

path_to(dest)[source]

Returns a list of keys (actions / observations) that represents the path from self.current to the given node dest. Returns None if the path does not exist. Uses DFS. Can be useful for marking path to a node to a specific layer. Note that the returned path is a list of keys (i.e. edges), not nodes.

static tree_stats(root, max_depth=None)[source]

Gather statistics about the tree

pomdp_py.utils.debugging.sorted_by_str(enumerable)[source]
pomdp_py.utils.debugging.interpret_color(colorstr)[source]

pomdp_py.utils.math module

Assorted utilities for math

pomdp_py.utils.math.vec(p1, p2)[source]

vector from p1 to p2

pomdp_py.utils.math.proj(vec1, vec2, scalar=False)[source]
pomdp_py.utils.math.R_x(th)[source]
pomdp_py.utils.math.R_y(th)[source]
pomdp_py.utils.math.R_z(th)[source]
pomdp_py.utils.math.T(dx, dy, dz)[source]
pomdp_py.utils.math.to_radians(th)[source]
pomdp_py.utils.math.R_between(v1, v2)[source]
pomdp_py.utils.math.approx_equal(v1, v2, epsilon=1e-06)[source]
pomdp_py.utils.math.euclidean_dist(p1, p2)[source]
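
A small sketch of typical calls; the assumptions that vec returns the displacement p2 - p1, to_radians converts degrees to radians, and R_z takes an angle in radians follow from the names and docstrings rather than from this page:

from pomdp_py.utils.math import vec, euclidean_dist, to_radians, R_z

p1, p2 = (0, 0, 0), (3, 4, 0)
print(euclidean_dist(p1, p2))    # 5.0
print(vec(p1, p2))               # displacement from p1 to p2
print(to_radians(90))            # ~1.5708, assuming degrees in, radians out
print(R_z(to_radians(90)))       # rotation matrix about the z-axis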

pomdp_py.utils.misc module

Misc Python utilities

pomdp_py.utils.misc.remap(oldvalue, oldmin, oldmax, newmin, newmax)[source]
pomdp_py.utils.misc.json_safe(obj)[source]
pomdp_py.utils.misc.safe_slice(arr, start, end)[source]
pomdp_py.utils.misc.similar(a, b)[source]
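
A small sketch, assuming remap performs the usual linear rescaling from [oldmin, oldmax] to [newmin, newmax], similar returns a string-similarity ratio in [0, 1], and safe_slice clamps out-of-range indices:

from pomdp_py.utils.misc import remap, similar, safe_slice

print(remap(0.25, 0, 1, 0, 100))             # 25.0 under a linear remap
print(similar("tiger-left", "tiger-right"))  # a ratio below 1.0 for similar strings
print(safe_slice([1, 2, 3], 0, 10))          # the slice clipped to the list bounds
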
class pomdp_py.utils.misc.special_char[source]

Bases: object

left = '←'
up = '↑'
right = '→'
down = '↓'
longleft = '⟵'
longright = '⟶'
hline = '─'
vline = '│'
bottomleft = '└'
longbottomleft = '└─'
topleft = '┌'
longtopleft = '┌─'
topright = '┐'
longtopright = '─┐'
bottomright = '┘'
longbottomright = '─┘'
intersect = '┼'
topt = '┬'
leftt = '├'
rightt = '┤'
bottomt = '┴'
shadebar = '▒'
SUBSCRIPT = {48: 8320, 49: 8321, 50: 8322, 51: 8323, 52: 8324, 53: 8325, 54: 8326, 55: 8327, 56: 8328, 57: 8329}
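
SUBSCRIPT maps the code points of the digits '0'–'9' (48–57) to their Unicode subscript forms (8320–8329), so it can be used directly as a str.translate table; this is presumably how edge indices such as ₀listen in the tree printouts are rendered:

from pomdp_py.utils.misc import special_char

print("0listen 1open-left".translate(special_char.SUBSCRIPT))
# ₀listen ₁open-left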

pomdp_py.utils.plotting module

pomdp_py.utils.templates module

Some particular implementations of the interface for convenience

class pomdp_py.utils.templates.SimpleState(data)[source]

Bases: State

A SimpleState is a state that stores one piece of hashable data; the equality of two states of this kind depends only on this data.

class pomdp_py.utils.templates.SimpleAction(name)[source]

Bases: Action

A SimpleAction is an action defined by a string name

class pomdp_py.utils.templates.SimpleObservation(data)[source]

Bases: Observation

A SimpleObservation is an observation with a piece of hashable data that defines the equality.
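
A minimal sketch using the three template classes; the tiger-style names are illustrative:

from pomdp_py.utils.templates import SimpleState, SimpleAction, SimpleObservation

s = SimpleState("tiger-left")
a = SimpleAction("listen")
o = SimpleObservation("growl-left")

# Equality follows the stored data / name, per the class descriptions above.
assert s == SimpleState("tiger-left")
assert a == SimpleAction("listen")
assert o != SimpleObservation("growl-right")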

class pomdp_py.utils.templates.DetTransitionModel(epsilon=1e-12)[source]

Bases: TransitionModel

A DetTransitionModel is a deterministic transition model. A probability of 1 - epsilon is given for the correct transition, and epsilon for incorrect transitions.

probability(self, next_state, state, action)[source]

Returns the probability of \(\Pr(s'|s,a)\).

Parameters:
  • state (State) – the state \(s\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(s'|s,a)\)

Return type:

float

sample(self, state, action)[source]

Returns next state randomly sampled according to the distribution of this transition model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

Returns:

the next state \(s'\)

Return type:

State

class pomdp_py.utils.templates.DetObservationModel(epsilon=1e-12)[source]

Bases: ObservationModel

A DetObservationModel is a deterministic observation model. A probability of 1 - epsilon is given for the correct observation, and epsilon for incorrect observations.

probability(self, observation, next_state, action)[source]

Returns the probability of \(\Pr(o|s',a)\).

Parameters:
  • observation (Observation) – the observation \(o\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(o|s',a)\)

Return type:

float

sample(self, next_state, action)[source]

Returns observation randomly sampled according to the distribution of this observation model.

Parameters:
  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the observation \(o\)

Return type:

Observation

class pomdp_py.utils.templates.DetRewardModel[source]

Bases: RewardModel

A DetRewardModel is a deterministic reward model (the most typical kind).

reward_func(state, action, next_state)[source]
sample(self, state, action, next_state)[source]

Returns reward randomly sampled according to the distribution of this reward model. This is required, i.e. assumed to be implemented for a reward model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the reward \(r\)

Return type:

float

argmax(self, state, action, next_state)[source]

Returns the most likely reward. This is optional.
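
A minimal sketch of subclassing DetRewardModel: override reward_func with the deterministic reward; sample and argmax are assumed to delegate to it, consistent with the model being deterministic. The reward values below are hypothetical:

from pomdp_py.utils.templates import DetRewardModel, SimpleAction, SimpleState

class TigerLikeRewardModel(DetRewardModel):
    # Hypothetical reward: listening costs 1, any other action pays 10.
    def reward_func(self, state, action, next_state):
        return -1 if action == SimpleAction("listen") else 10

R = TigerLikeRewardModel()
s = SimpleState("tiger-left")
print(R.sample(s, SimpleAction("listen"), s))      # -1
print(R.argmax(s, SimpleAction("open-right"), s))  # 10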

class pomdp_py.utils.templates.UniformPolicyModel(actions)[source]

Bases: RolloutPolicy

sample(self, state)[source]

Returns action randomly sampled according to the distribution of this policy model.

Parameters:

state (State) – the state \(s\)

Returns:

the action \(a\)

Return type:

Action

get_all_actions(self, *args)[source]

Returns a set of all possible actions, if feasible.

rollout(self, State state, tuple history=None)[source]
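
A small sketch of a uniform rollout policy over an illustrative action set; the state argument to sample is presumably ignored by a uniform policy, so any state works here:

from pomdp_py.utils.templates import SimpleAction, SimpleState, UniformPolicyModel

actions = [SimpleAction(name) for name in ("listen", "open-left", "open-right")]
policy = UniformPolicyModel(actions)

print(policy.sample(SimpleState("tiger-left")))   # one of the three actions, uniformly
print(policy.get_all_actions())                   # all three actions
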
class pomdp_py.utils.templates.TabularTransitionModel(weights)[source]

Bases: TransitionModel

This tabular transition model is built given a dictionary that maps a tuple (state, action, next_state) to a probability. This model assumes that the given weights are complete, that is, they specify the probability of every (state, action, next_state) combination.

probability(self, next_state, state, action)[source]

Returns the probability of \(\Pr(s'|s,a)\).

Parameters:
  • state (State) – the state \(s\)

  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the probability \(\Pr(s'|s,a)\)

Return type:

float

sample(self, state, action)[source]

Returns next state randomly sampled according to the distribution of this transition model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

Returns:

the next state \(s'\)

Return type:

State

get_all_states(self)[source]

Returns a set of all possible states, if feasible.
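
A small sketch of constructing the tabular model; the two-state chain and its probabilities are illustrative, but the weights dictionary covers every (state, action, next_state) combination, as required:

from pomdp_py.utils.templates import SimpleState, SimpleAction, TabularTransitionModel

s0, s1 = SimpleState("s0"), SimpleState("s1")
move = SimpleAction("move")
weights = {
    (s0, move, s0): 0.1, (s0, move, s1): 0.9,
    (s1, move, s0): 0.0, (s1, move, s1): 1.0,
}
T = TabularTransitionModel(weights)

print(T.probability(s1, s0, move))   # 0.9
print(T.sample(s0, move))            # s1 with probability 0.9
print(T.get_all_states())            # s0 and s1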

class pomdp_py.utils.templates.TabularObservationModel(weights)[source]

Bases: ObservationModel

This tabular observation model is built given a dictionary that maps a tuple (next_state, action, observation) to a probability. This model assumes that the given weights are complete.

probability(observation, next_state, action)[source]

Returns the probability \(\Pr(o|s',a)\) that observation is emitted from next_state under action.

sample(self, next_state, action)[source]

Returns observation randomly sampled according to the distribution of this observation model.

Parameters:
  • next_state (State) – the next state \(s'\)

  • action (Action) – the action \(a\)

Returns:

the observation \(o\)

Return type:

Observation

get_all_observations(self)[source]

Returns a set of all possible observations, if feasible.

class pomdp_py.utils.templates.TabularRewardModel(rewards)[source]

Bases: RewardModel

This tabular reward model is built given a dictionary that maps a state, a tuple (state, action), or a tuple (state, action, next_state) to a reward. This model assumes that the given rewards are complete.

sample(self, state, action, next_state)[source]

Returns reward randomly sampled according to the distribution of this reward model. This is required, i.e. assumed to be implemented for a reward model.

Parameters:
  • state (State) – the state \(s\)

  • action (Action) – the action \(a\)

  • next_state (State) – the next state \(s'\)

Returns:

the reward \(r\)

Return type:

float

pomdp_py.utils.test_utils module

pomdp_py.utils.typ module

Utilities for typography, i.e. dealing with strings for the purpose of displaying them.

class pomdp_py.utils.typ.bcolors[source]

Bases: object

WHITE = '\x1b[97m'
CYAN = '\x1b[96m'
MAGENTA = '\x1b[95m'
BLUE = '\x1b[94m'
GREEN = '\x1b[92m'
YELLOW = '\x1b[93m'
RED = '\x1b[91m'
BOLD = '\x1b[1m'
ENDC = '\x1b[0m'
static disable()[source]
static s(color, content)[source]

Returns a string with color when shown on terminal. color is a constant in bcolors class.

pomdp_py.utils.typ.info(content)[source]
pomdp_py.utils.typ.note(content)[source]
pomdp_py.utils.typ.error(content)[source]
pomdp_py.utils.typ.warning(content)[source]
pomdp_py.utils.typ.success(content)[source]
pomdp_py.utils.typ.bold(content)[source]
pomdp_py.utils.typ.cyan(content)[source]
pomdp_py.utils.typ.magenta(content)[source]
pomdp_py.utils.typ.blue(content)[source]
pomdp_py.utils.typ.green(content)[source]
pomdp_py.utils.typ.yellow(content)[source]
pomdp_py.utils.typ.red(content)[source]
pomdp_py.utils.typ.white(content)[source]
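
A brief sketch of the typical use: each helper presumably wraps its argument in the corresponding ANSI escape code and resets with ENDC, so the string renders colored when printed to a terminal:

from pomdp_py.utils import typ

print(typ.info("planning started"))
print(typ.warning("belief update took unusually long"))
print(typ.success("goal reached"))
print(typ.bcolors.s(typ.bcolors.CYAN, "a cyan string"))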

Module contents