pomdp_py.problems.tiger package¶
Tiger¶
Classic problem from Planning and acting in partially observable stochastic domains
Refer to examples.tiger for more details.
Subpackages¶
- pomdp_py.problems.tiger.cythonize package
- Submodules
- pomdp_py.problems.tiger.cythonize.run_tiger module
- pomdp_py.problems.tiger.cythonize.tiger_problem module
- Module contents
Submodules¶
pomdp_py.problems.tiger.cythonize module¶
pomdp_py.problems.tiger.tiger_problem module¶
The classic Tiger problem.
This is a complete POMDP problem definition: it specifies the POMDP spaces (state, action, and observation spaces) as well as the transition, observation, and reward (T/O/R) models for both the agent and the environment.
The tiger problem is described as follows (quoted from POMDP: Introduction to Partially Observable Markov Decision Processes by Kamalzadeh and Hahsler):
A tiger is put with equal probability behind one of two doors, while treasure is put behind the other one. You are standing in front of the two closed doors and need to decide which one to open. If you open the door with the tiger, you will get hurt (negative reward). But if you open the door with treasure, you receive a positive reward. Instead of opening a door right away, you also have the option to wait and listen for tiger noises. But listening is neither free nor entirely accurate. You might hear the tiger behind the left door while it is actually behind the right door and vice versa.
States: tiger-left, tiger-right
Actions: open-left, open-right, listen
Rewards: +10 for opening the treasure door, -100 for opening the tiger door, -1 for listening.
Observations: you can hear either “tiger-left” or “tiger-right”.
Note that in this example, the TigerProblem is a POMDP that also contains the agent and the environment as its fields. In general this doesn’t need to be the case. (Refer to more complicated examples.)
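For illustration, the following is a minimal sketch of how the state space above can be represented in pomdp_py; the class body is an assumption about a straightforward implementation, not this module's exact code. Actions and observations follow the same pattern.

```python
import pomdp_py

class TigerState(pomdp_py.State):
    """A state is simply named "tiger-left" or "tiger-right"."""
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return hash(self.name)
    def __eq__(self, other):
        return isinstance(other, TigerState) and self.name == other.name
    def __repr__(self):
        return "TigerState(%s)" % self.name

# TigerAction ("open-left", "open-right", "listen") and TigerObservation
# ("tiger-left", "tiger-right") can be defined analogously by subclassing
# pomdp_py.Action and pomdp_py.Observation.
```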
- class pomdp_py.problems.tiger.tiger_problem.TigerObservation(name)[source]¶
Bases:
Observation
- class pomdp_py.problems.tiger.tiger_problem.ObservationModel(noise=0.15)[source]¶
Bases:
ObservationModel
- probability(observation, next_state, action)[source]¶
Returns the probability of \(\Pr(o|s',a)\).
- Parameters:
observation (Observation) – the observation \(o\)
next_state (State) – the next state \(s'\)
action (Action) – the action \(a\)
- Returns:
the probability \(\Pr(o|s',a)\)
- Return type:
float
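A minimal sketch of how this probability can be computed for the Tiger domain, assuming the illustrative state/observation classes above; the uniform 0.5 fallback for open actions is an assumption consistent with the problem description, in which opening a door resets the world and yields no information:

```python
class ObservationModel(pomdp_py.ObservationModel):
    def __init__(self, noise=0.15):
        self.noise = noise

    def probability(self, observation, next_state, action):
        if action.name == "listen":
            # Hearing the side the tiger is actually behind has
            # probability 1 - noise; the wrong side has probability noise.
            if observation.name == next_state.name:
                return 1.0 - self.noise
            return self.noise
        # Opening a door gives no information about the (reset) state.
        return 0.5
```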
- class pomdp_py.problems.tiger.tiger_problem.TransitionModel[source]¶
Bases:
TransitionModel
- probability(next_state, state, action)[source]¶
According to the problem specification, the world resets once the action is open-left or open-right; otherwise, the state stays the same.
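A sketch of this transition rule, again assuming the illustrative state/action naming above:

```python
class TransitionModel(pomdp_py.TransitionModel):
    def probability(self, next_state, state, action):
        if action.name.startswith("open"):
            # Opening a door resets the world: the tiger is re-placed
            # behind either door with equal probability.
            return 0.5
        # Listening leaves the tiger where it is.
        return 1.0 if next_state == state else 0.0
```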
- class pomdp_py.problems.tiger.tiger_problem.RewardModel[source]¶
Bases:
RewardModel
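A sketch of the reward function described above (+10 for the treasure door, -100 for the tiger door, -1 for listening). The sample signature follows pomdp_py's RewardModel interface; the string parsing is an assumption tied to the naming used in the sketches above.

```python
class RewardModel(pomdp_py.RewardModel):
    def sample(self, state, action, next_state):
        if action.name == "listen":
            return -1
        # action is "open-left" or "open-right"; state is "tiger-left"
        # or "tiger-right". Opening the tiger's door is penalized.
        opened_side = action.name.split("-")[1]
        tiger_side = state.name.split("-")[1]
        return -100 if opened_side == tiger_side else 10
```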
- class pomdp_py.problems.tiger.tiger_problem.PolicyModel[source]¶
Bases:
RolloutPolicy
A simple policy model with a uniform prior over a small, finite action space.
- ACTIONS = [TigerAction(open-left), TigerAction(open-right), TigerAction(listen)]¶
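A sketch of such a rollout policy; the method names follow pomdp_py's RolloutPolicy interface, and TigerAction is the illustrative action class assumed above:

```python
import random

class PolicyModel(pomdp_py.RolloutPolicy):
    ACTIONS = [TigerAction(name)
               for name in ("open-left", "open-right", "listen")]

    def sample(self, state):
        # Uniform prior over the three actions.
        return random.choice(self.ACTIONS)

    def rollout(self, state, history=None):
        # Used by tree-search planners during simulation rollouts.
        return self.sample(state)

    def get_all_actions(self, state=None, history=None):
        return self.ACTIONS
```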
- class pomdp_py.problems.tiger.tiger_problem.TigerProblem(obs_noise, init_true_state, init_belief)[source]¶
Bases:
POMDP
Creating a TigerProblem class is entirely optional for simulating and solving POMDPs; this is just an example of how such a class can be created.
- static create(state='tiger-left', belief=0.5, obs_noise=0.15)[source]¶
- Parameters:
state (str) – the true state of the environment; either ‘tiger-left’ or ‘tiger-right’
belief (float) – the initial belief (probability) that the tiger is behind the left door; between 0 and 1
obs_noise (float) – noise for the observation model (default 0.15)
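For example, with the documented defaults (a brief usage sketch):

```python
tiger = TigerProblem.create(state="tiger-left", belief=0.5, obs_noise=0.15)
```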
- pomdp_py.problems.tiger.tiger_problem.test_planner(tiger_problem, planner, nsteps=3, debug_tree=False)[source]¶
Runs the action-feedback loop of the Tiger problem POMDP.
- Parameters:
tiger_problem (TigerProblem) – a problem instance
planner (Planner) – a planner
nsteps (int) – Maximum number of steps to run this loop.
debug_tree (bool) – if True, drop into pdb with a TreeDebugger available as the ‘dd’ variable.
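A typical end-to-end run might look like the following sketch; the POUCT constructor arguments shown are standard pomdp_py options, but the specific values here are illustrative assumptions:

```python
import pomdp_py

tiger = TigerProblem.create("tiger-left", belief=0.5, obs_noise=0.15)
planner = pomdp_py.POUCT(max_depth=3,
                         discount_factor=0.95,
                         num_sims=4096,
                         exploration_const=50,
                         rollout_policy=tiger.agent.policy_model)
test_planner(tiger, planner, nsteps=10)
```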