Preference-based Action Prior

The code below is a minimal example of defining a PolicyModel that supports a rollout policy based on a preference-based action prior [2]. The action prior is specified through an ActionPrior object, which returns a set of preferred actions given a state (and/or history).

import random
from pomdp_py import RolloutPolicy, ActionPrior

class PolicyModel(RolloutPolicy):
    def __init__(self, action_prior=None):
        """
        action_prior is an object of type ActionPrior
        that implements the get_preferred_actions function.
        """
        self.action_prior = action_prior

    def sample(self, state):
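        # Uniformly sample one action from the full action set.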
        return random.sample(
            self.get_all_actions(state=state), 1)[0]

    def get_all_actions(self, state, history=None):
        raise NotImplementedError

    def rollout(self, state, history=None):
        if self.action_prior is not None:
            # get_preferred_actions returns a set of
            # (action, num_visits_init, value_init) tuples;
            # pick one uniformly and return its action.
            preferences = \
                self.action_prior\
                    .get_preferred_actions(state, history)
            if len(preferences) > 0:
                # list() because random.sample requires a sequence
                return random.sample(list(preferences), 1)[0][0]
            else:
                return self.sample(state)
        else:
            return self.sample(state)

Note that the notion of “action prior” here is narrow; it follows the original POMCP paper [2]. In general, you could express a prior over the action distribution explicitly through the sample and rollout functions in PolicyModel, as sketched below. Refer to the Tiger tutorial for more details (the paragraph on PolicyModel).
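For example, here is a minimal sketch of a PolicyModel whose sample and rollout methods encode a non-uniform action distribution directly; the WeightedPolicyModel class and its weights are illustrative assumptions, not part of pomdp_py.

import random
from pomdp_py import RolloutPolicy

class WeightedPolicyModel(RolloutPolicy):
    def __init__(self, action_weights):
        """
        action_weights is a dict mapping each action to a
        relative weight; larger weights mean the action is
        chosen more often during sampling and rollouts.
        """
        self.action_weights = action_weights

    def sample(self, state):
        actions = list(self.action_weights.keys())
        weights = list(self.action_weights.values())
        # Draw one action proportionally to its weight.
        return random.choices(actions, weights=weights, k=1)[0]

    def get_all_actions(self, state=None, history=None):
        return list(self.action_weights.keys())

    def rollout(self, state, history=None):
        # Use the same weighted distribution during rollouts.
        return self.sample(state)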

As described in [2], you could choose to set an initial visit count and initial value for each preferred action; to take this into account during POMDP planning using POUCT or POMCP, you need to supply the ActionPrior object when you initialize the POUCT or POMCP object, through the action_prior argument.
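Below is a rough sketch of how this could look. The MyActionPrior class, the Listen action, and the specific planner parameters are illustrative assumptions; the action_prior argument and the (action, num_visits_init, value_init) tuple format are as described above.

from pomdp_py import Action, ActionPrior, POUCT

class Listen(Action):
    """Illustrative placeholder; any hashable Action works."""
    def __eq__(self, other):
        return isinstance(other, Listen)
    def __hash__(self):
        return hash("listen")

class MyActionPrior(ActionPrior):
    def get_preferred_actions(self, state, history):
        # Each entry is (action, num_visits_init, value_init);
        # the planner uses the initial counts and values when
        # it creates the corresponding nodes in the search tree.
        return {(Listen(), 10, 0.0)}  # illustrative numbers

action_prior = MyActionPrior()
policy_model = PolicyModel(action_prior=action_prior)
planner = POUCT(max_depth=20, discount_factor=0.95,
                num_sims=1000, exploration_const=100,
                rollout_policy=policy_model,
                action_prior=action_prior)

The planner can then be used as usual, e.g. action = planner.plan(agent).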