Preference-based Action Prior ***************************** The code below is a minimum example of defining a :py:mod:`~pomdp_py.framework.basics.PolicyModel` that supports a rollout policy based on preference-based action prior :cite:`silver2010monte`. The action prior is specified through the :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior` object, which returns a set of preferred actions given a state (and/or history). .. code-block:: python import random from pomdp_py import RolloutPolicy, ActionPrior class PolicyModel(RolloutPolicy): def __init__(self, action_prior=None): """ action_prior is an object of type ActionPrior that implements that get_preferred_actions function. """ self.action_prior = action_prior def sample(self, state): return random.sample( self.get_all_actions(state=state), 1)[0] def get_all_actions(self, state, history=None): raise NotImplementedError def rollout(self, state, history=None): if self.action_prior is not None: preferences =\ self.action_prior\ .get_preferred_actions(state, history) if len(preferences) > 0: return random.sample(preferences, 1)[0][0] else: return self.sample(state) else: return self.sample(state) Note that the notion of "action prior" here is narrow; It follows the original POMCP paper :cite:`silver2010monte`. In general, you could express a prior over the action distribution explicitly through the :code:`sample` and :code:`rollout` function in :py:mod:`~pomdp_py.framework.basics.PolicyModel`. Refer to the `Tiger `_ tutorial for more details (the paragraph on PolicyModel). As described in :cite:`silver2010monte`, you could choose to set an initial visit count and initial value corresponding to a preferred action; To take this into account during POMDP planning using POUCT or POMCP, you need to supply the :py:mod:`~pomdp_py.algorithms.po_uct.ActionPrior` object when you initialize the :py:mod:`~pomdp_py.algorithms.po_uct.POUCT` or :py:mod:`~pomdp_py.algorithms.pomcp.POMCP` objects through the :code:`action_prior` argument.