pomdp_py.utils package¶
Utilities for interfacing with external libraries
This module contains utility functions making it easier to debug POMDP planning.
Some particular implementations of the interface for convenience
Utility functions for Cython code.
Utilities for typography, i.e. dealing with strings for the purpose of displaying them.
Assorted utilities for math
Misc Python utilities
Utilities for dealing with colors
Subpackages¶
Submodules¶
pomdp_py.utils.colors module¶
Utilities for dealing with colors
- pomdp_py.utils.colors.lighter(color, percent)[source]¶
Assumes color is an RGB triple between (0, 0, 0) and (255, 255, 255).
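The docstring does not say how percent is interpreted. As a rough sketch only, assuming percent is a fraction in [0, 1] that moves each channel toward white (an assumption, not something the text above confirms), a function of this kind could look like the following. The name lighter_sketch is hypothetical and this is not the library's code.

```python
def lighter_sketch(color, percent):
    """Blend an RGB color toward white.

    Hypothetical helper, not the library's implementation. Assumes `color`
    is an (r, g, b) tuple with channels in [0, 255] and `percent` is a
    fraction in [0, 1]: 0 leaves the color unchanged, 1 returns white.
    """
    return tuple(c + (255 - c) * percent for c in color)


print(lighter_sketch((100, 150, 200), 0.5))  # (177.5, 202.5, 227.5)
```

Under this reading, each channel moves the same fraction of its remaining distance to 255, so the hue is roughly preserved while the color gets paler.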
pomdp_py.utils.cython_utils module¶
Utility functions for Cython code.
- pomdp_py.utils.cython_utils.det_dict_hash(dct, keep=9)¶
Deterministic hash of a dictionary without sorting.
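Python's built-in hash() of strings is salted per process, so it is not stable across runs. One possible way to get a deterministic hash without sorting the items is to digest each item separately and combine the digests with a commutative operation. The sketch below does that. It assumes that every key and value has a stable repr() and that keep limits the result to its leading decimal digits; both are assumptions, and the library's actual implementation may differ. The name det_dict_hash_sketch is hypothetical.

```python
import hashlib


def det_dict_hash_sketch(dct, keep=9):
    """Order-independent, run-to-run stable hash of a dictionary.

    Hypothetical illustration, not the library's implementation.
    Assumes every key and value has a stable repr().
    """
    total = 0
    for key, value in dct.items():
        item = repr((key, value)).encode("utf-8")
        # SHA-1 serves only as a fixed, unsalted digest, not for security.
        total += int(hashlib.sha1(item).hexdigest(), 16)
    # Addition is commutative, so the iteration order does not matter.
    # Keep only the leading `keep` decimal digits (assumed meaning of `keep`).
    return int(str(total)[:keep])


a = det_dict_hash_sketch({"x": 1, "y": 2})
b = det_dict_hash_sketch({"y": 2, "x": 1})
print(a == b)  # True
```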
pomdp_py.utils.debugging module¶
This module contains utility functions making it easier to debug POMDP planning.
TreeDebugger¶
The core debugging functionality for POMCP/POUCT search trees is incorporated
into the TreeDebugger. It is designed for ease of use during a pdb or ipdb
debugging session. Here is a minimal usage example:
import pomdp_py
from pomdp_py.utils import TreeDebugger
from pomdp_problems.tiger import TigerProblem
# Build a pomdp_py.Agent for the Tiger problem
agent = TigerProblem.create("tiger-left", 0.5, 0.15).agent
# Plan with a pomdp_py.POUCT planner (POMCP works too)
pouct = pomdp_py.POUCT(max_depth=4, discount_factor=0.95,
                       num_sims=4096, exploration_const=200,
                       rollout_policy=agent.policy_model)
action = pouct.plan(agent)
# Wrap the search tree built by the planner, then drop into the debugger
dd = TreeDebugger(agent.tree)
import pdb; pdb.set_trace()
When the program runs, it stops in the pdb debugger, where you can print the search tree:
(Pdb) dd.pp
_VNodePP(n=4095, v=-19.529)(depth=0)
├─── ₀listen⟶_QNodePP(n=4059, v=-19.529)
│ ├─── ₀tiger-left⟶_VNodePP(n=2013, v=-16.586)(depth=1)
│ │ ├─── ₀listen⟶_QNodePP(n=1883, v=-16.586)
│ │ │ ├─── ₀tiger-left⟶_VNodePP(n=1441, v=-8.300)(depth=2)
... # prints out the entire tree; Colored in terminal.
(Pdb) dd.p(1)
_VNodePP(n=4095, v=-19.529)(depth=0)
├─── ₀listen⟶_QNodePP(n=4059, v=-19.529)
│ ├─── ₀tiger-left⟶_VNodePP(n=2013, v=-16.586)(depth=1)
│ │ ├─── ₀listen⟶_QNodePP(n=1883, v=-16.586)
│ │ ├─── ₁open-left⟶_QNodePP(n=18, v=-139.847)
│ │ └─── ₂open-right⟶_QNodePP(n=112, v=-57.191)
... # prints up to depth 1
Note that the printed text is colored in the terminal.
You can retrieve a subtree through indexing:
(Pdb) dd[0]
listen⟶_QNodePP(n=4059, v=-19.529)
- [0] tiger-left: VNode(n=2013, v=-16.586)
- [1] tiger-right: VNode(n=2044, v=-16.160)
(Pdb) dd[0][1][2]
open-right⟶_QNodePP(n=15, v=-148.634)
- [0] tiger-left: VNode(n=7, v=-20.237)
- [1] tiger-right: VNode(n=6, v=8.500)
You can obtain the currently preferred action sequence with:
(Pdb) dd.mbp
listen []
listen []
listen []
listen []
open-left []
_VNodePP(n=4095, v=-19.529)(depth=0)
├─── ₀listen⟶_QNodePP(n=4059, v=-19.529)
│ └─── ₁tiger-right⟶_VNodePP(n=2044, v=-16.160)(depth=1)
│ ├─── ₀listen⟶_QNodePP(n=1955, v=-16.160)
│ │ └─── ₁tiger-right⟶_VNodePP(n=1441, v=-8.300)(depth=2)
│ │ ├─── ₀listen⟶_QNodePP(n=947, v=-8.300)
│ │ │ └─── ₁tiger-right⟶_VNodePP(n=768, v=0.022)(depth=3)
│ │ │ ├─── ₀listen⟶_QNodePP(n=462, v=0.022)
│ │ │ │ └─── ₁tiger-right⟶_VNodePP(n=395, v=10.000)(depth=4)
│ │ │ │ ├─── ₁open-left⟶_QNodePP(n=247, v=10.000)
mbp stands for “mark best plan”.
To explore more features, browse the list of methods in the documentation.
- class pomdp_py.utils.debugging.TreeDebugger(tree)[source]¶
Bases: object
Helps you debug the search tree. A search tree is a tree that contains a subset of future histories, organized into QNodes (value represents Q(b,a); children are observations) and VNodes (value represents V(b); children are actions).
- property depth¶
Tree depth starts from 0 (root node only). It is the largest number of edges on a path from root to leaf.
- property d¶
alias for depth
- property num_layers¶
Returns the number of layers of nodes in the tree, which equals depth + 1.
- property nl¶
alias for num_layers
- property nn¶
Returns the total number of nodes in the tree
- property nq¶
Returns the total number of QNodes in the tree
- property nv¶
Returns the total number of VNodes in the tree
- layer(depth, as_debuggers=True)[source]¶
Returns a list of nodes at the given depth. Will only return VNodes. Warning: If depth is high, there will likely be a huge number of nodes.
- Parameters:
depth (int) – Depth of the tree
as_debuggers (bool) – If True, returns a list of TreeDebugger objects, one for each subtree rooted at a node on the layer.
- property leaf¶
- property b¶
alias for back
- property root¶
The root node when first creating this TreeDebugger
- property r¶
alias for root
- property c¶
Current node of interaction
- property pp¶
Prints the tree with preset options.
- property mbp¶
Mark Best and Print. Marks the best sequence, then prints only the marked nodes.
- property pm¶
Prints only the marked nodes.
- mark_sequence(seq, color='blue')[source]¶
Given a list of keys (understandable by __getitem__ in _NodePP), marks the nodes (both QNodes and VNodes) along the path in the tree. Note that this sequence starts from self.current, so self.current will also be marked.
- property clear¶
Clear the marks
- property bestseq¶
Returns the sequence of actions and observations that have the highest value at each step. Such a sequence is “preferred”.
Also prints the list of preferred actions for each step into the future.
- static single_node_str(node, parent_edge=None, indent=1, include_children=True)[source]¶
Returns a string for printing given a single vnode.
- static preferred_actions(root, max_depth=None)[source]¶
Print out the currently preferred actions up to given max_depth
- path(dest)[source]¶
Alias for path_to. Example usage, marking the path from the root to the first node on the second layer:
dd.mark(dd.path(dd.layer(2)[0]))
- path_to(dest)[source]¶
Returns a list of keys (actions / observations) that represents the path from self.current to the given node dest. Returns None if the path does not exist. Uses DFS. Can be useful for marking the path to a node on a specific layer. Note that the returned path is a list of keys (i.e. edges), not nodes.
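The definitions of depth, num_layers, and path_to above can be illustrated on a toy tree. The sketch below uses plain nested dictionaries as a stand-in for the search tree. It does not use pomdp_py's node classes and ignores the QNode/VNode distinction, and the function names simply mirror the documented ones for readability. It is an illustration of the stated definitions, not the library's implementation.

```python
def depth(node):
    """Largest number of edges on any path from `node` down to a leaf."""
    children = node["children"]
    if not children:
        return 0
    return 1 + max(depth(child) for child in children.values())


def path_to(node, dest):
    """Depth-first search for `dest`; returns the edge keys leading to it.

    Returns [] if `node` is `dest`, and None if `dest` is not in the subtree.
    """
    if node is dest:
        return []
    for key, child in node["children"].items():
        sub = path_to(child, dest)
        if sub is not None:
            return [key] + sub
    return None


# Toy tree: each node is {"children": {edge_key: child_node}}.
leaf = {"children": {}}
other = {"children": {}}
root = {
    "children": {
        "listen": {"children": {"tiger-left": leaf}},
        "open-left": other,
    }
}

print(depth(root))          # 2
print(depth(root) + 1)      # 3 layers of nodes
print(path_to(root, leaf))  # ['listen', 'tiger-left']
print(path_to(leaf, root))  # None
```

The returned path contains edge keys only, which is why it can be passed directly to an indexing or marking routine that walks the tree from the starting node.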
pomdp_py.utils.math module¶
Assorted utilities for math
pomdp_py.utils.misc module¶
Misc Python utilities
- class pomdp_py.utils.misc.special_char[source]¶
Bases: object
- left = '←'¶
- up = '↑'¶
- right = '→'¶
- down = '↓'¶
- longleft = '⟵'¶
- longright = '⟶'¶
- hline = '─'¶
- vline = '│'¶
- bottomleft = '└'¶
- longbottomleft = '└─'¶
- topleft = '┌'¶
- longtopleft = '┌─'¶
- topright = '┐'¶
- longtopright = '─┐'¶
- bottomright = '┘'¶
- longbottomright = '─┘'¶
- intersect = '┼'¶
- topt = '┬'¶
- leftt = '├'¶
- rightt = '┤'¶
- bottomt = '┴'¶
- shadebar = '▒'¶
- SUBSCRIPT = {48: 8320, 49: 8321, 50: 8322, 51: 8323, 52: 8324, 53: 8325, 54: 8326, 55: 8327, 56: 8328, 57: 8329}¶
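The SUBSCRIPT dictionary maps the code points of the ASCII digits '0' to '9' (48 to 57) to the Unicode subscript digits '₀' to '₉' (8320 to 8329), which is the table format that str.translate expects. It is presumably what produces subscript indices such as ₀listen in the tree printouts, although that link is an inference. The sketch below rebuilds an equivalent table with str.maketrans and applies it; the variable names are illustrative.

```python
# Build a digit-to-subscript translation table equivalent to SUBSCRIPT.
subscript = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

# Same mapping as the documented value: code points 48..57 -> 8320..8329.
assert subscript == {48 + i: 8320 + i for i in range(10)}

label = "0listen".translate(subscript)
print(label)  # ₀listen
```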
pomdp_py.utils.plotting module¶
pomdp_py.utils.templates module¶
Some particular implementations of the interface for convenience
- class pomdp_py.utils.templates.SimpleState(data)[source]¶
Bases: State
A SimpleState is a state that stores one piece of hashable data. The equality of two states of this kind depends only on this data.
- class pomdp_py.utils.templates.SimpleAction(name)[source]¶
Bases: Action
A SimpleAction is an action defined by a string name.
- class pomdp_py.utils.templates.SimpleObservation(data)[source]¶
Bases: Observation
A SimpleObservation is an observation with a piece of hashable data that defines the equality.
- class pomdp_py.utils.templates.DetTransitionModel(epsilon=1e-12)[source]¶
Bases: TransitionModel
A DetTransitionModel is a deterministic transition model. A probability of 1 - epsilon is given for the correct transition, and epsilon is given for an incorrect transition.
- class pomdp_py.utils.templates.DetObservationModel(epsilon=1e-12)[source]¶
Bases: ObservationModel
A DetObservationModel is a deterministic observation model. A probability of 1 - epsilon is given for the correct observation, and epsilon is given for an incorrect observation.
- probability(self, observation, next_state, action)[source]¶
Returns the probability of \(\Pr(o|s',a)\).
- Parameters:
observation (Observation) – the observation \(o\)
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
- Returns:
the probability \(\Pr(o|s',a)\)
- Return type:
float
- class pomdp_py.utils.templates.DetRewardModel[source]¶
Bases: RewardModel
A DetRewardModel is a deterministic reward model (the most typical kind).
- class pomdp_py.utils.templates.UniformPolicyModel(actions)[source]¶
Bases: RolloutPolicy
- class pomdp_py.utils.templates.TabularTransitionModel(weights)[source]¶
Bases: TransitionModel
This tabular transition model is built from a dictionary that maps a tuple (state, action, next_state) to a probability. The model assumes that the given weights are complete, that is, they specify the probability of every (state, action, next_state) combination.
- class pomdp_py.utils.templates.TabularObservationModel(weights)[source]¶
Bases: ObservationModel
This tabular observation model is built from a dictionary that maps a tuple (next_state, action, observation) to a probability. The model assumes that the given weights are complete.
- class pomdp_py.utils.templates.TabularRewardModel(rewards)[source]¶
Bases: RewardModel
This tabular reward model is built from a dictionary that maps a state, a tuple (state, action), or a tuple (state, action, next_state) to a reward. The model assumes that the given rewards are complete.
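To make the dictionary layout expected by TabularTransitionModel concrete, the sketch below builds a small weights table keyed by (state, action, next_state) and checks two things: that every combination has an entry (the completeness assumption stated above) and that the next-state probabilities sum to 1 for each (state, action) pair. It uses plain strings rather than pomdp_py State and Action objects. The dynamics and the helpers is_complete and is_normalized are hypothetical and not part of the library.

```python
from itertools import product

states = ["tiger-left", "tiger-right"]
actions = ["listen", "open-left"]

# Transition weights keyed by (state, action, next_state).
# Illustrative dynamics: listening keeps the state; opening a door resets
# the tiger uniformly at random.
weights = {}
for s, a, sp in product(states, actions, states):
    if a == "listen":
        weights[(s, a, sp)] = 1.0 if s == sp else 0.0
    else:
        weights[(s, a, sp)] = 0.5


def is_complete(weights, states, actions):
    """True if every (state, action, next_state) combination has an entry."""
    return all(key in weights for key in product(states, actions, states))


def is_normalized(weights, states, actions, tol=1e-9):
    """True if next-state probabilities sum to 1 for each (state, action)."""
    return all(
        abs(sum(weights[(s, a, sp)] for sp in states) - 1.0) <= tol
        for s, a in product(states, actions)
    )


print(is_complete(weights, states, actions))    # True
print(is_normalized(weights, states, actions))  # True
```

Running checks like these before constructing a tabular model is one way to catch a missing entry early, since the model itself assumes the table is complete.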
pomdp_py.utils.test_utils module¶
pomdp_py.utils.typ module¶
Utilities for typography, i.e. dealing with strings for the purpose of displaying them.