Preference-based Action Prior
The code below is a minimal example of defining a PolicyModel that supports a rollout policy based on a preference-based action prior [2]. The action prior is specified through the ActionPrior object, which returns a set of preferred actions given a state (and/or history).
import random
from pomdp_py import RolloutPolicy, ActionPrior

class PolicyModel(RolloutPolicy):

    def __init__(self, action_prior=None):
        """
        action_prior is an object of type ActionPrior
        that implements the get_preferred_actions function.
        """
        self.action_prior = action_prior

    def sample(self, state):
        return random.sample(
            self.get_all_actions(state=state), 1)[0]

    def get_all_actions(self, state, history=None):
        raise NotImplementedError

    def rollout(self, state, history=None):
        if self.action_prior is not None:
            preferences =\
                self.action_prior\
                    .get_preferred_actions(state, history)
            if len(preferences) > 0:
                # Each preference is a tuple whose first element is the action
                return random.sample(preferences, 1)[0][0]
            else:
                return self.sample(state)
        else:
            return self.sample(state)
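
For concreteness, here is a minimal sketch of what an ActionPrior subclass could look like, assuming the convention that get_preferred_actions returns a set of (action, num_visits_init, value_init) tuples (which is why the rollout above takes [0][0] to extract the action). The MoveForward action and the robot_facing_target state attribute are hypothetical, used purely for illustration.

from pomdp_py import Action, ActionPrior

class MoveForward(Action):
    """Hypothetical domain action used only for illustration."""
    def __hash__(self):
        return hash("move-forward")
    def __eq__(self, other):
        return isinstance(other, MoveForward)

class MyActionPrior(ActionPrior):
    """Prefers MoveForward whenever the (made-up) state attribute
    robot_facing_target is True."""
    def __init__(self, num_visits_init=10, value_init=100):
        # Initial visit count and value assigned to preferred actions,
        # in the spirit of [2]; the numbers here are arbitrary.
        self.num_visits_init = num_visits_init
        self.value_init = value_init

    def get_preferred_actions(self, state, history):
        # Return a set of (action, num_visits_init, value_init) tuples;
        # an empty set means "no preference" for this state/history.
        if getattr(state, "robot_facing_target", False):
            return {(MoveForward(), self.num_visits_init, self.value_init)}
        return set()

Returning the initialization values alongside each preferred action lets the planner seed the corresponding search-tree nodes, as discussed further below.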
Note that the notion of “action prior” here is narrow; it follows the original POMCP paper [2]. In general, you could express a prior over the action distribution explicitly through the sample and rollout functions in PolicyModel. Refer to the Tiger tutorial for more details (the paragraph on PolicyModel).
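
As a rough sketch of that alternative (not part of the tutorial's code), the PolicyModel below encodes the prior directly as sampling weights inside sample and rollout; the action_weights dictionary and its values are made up for illustration.

import random
from pomdp_py import RolloutPolicy

class WeightedPolicyModel(RolloutPolicy):
    """Sketch: express the action prior as an explicit distribution
    by weighting actions inside sample()/rollout()."""

    def __init__(self, action_weights):
        # action_weights: hypothetical dict mapping each action to a
        # nonnegative weight proportional to its prior probability.
        self.action_weights = action_weights

    def get_all_actions(self, state=None, history=None):
        return list(self.action_weights.keys())

    def sample(self, state):
        actions = self.get_all_actions(state=state)
        weights = [self.action_weights[a] for a in actions]
        # Draw an action in proportion to its prior weight.
        return random.choices(actions, weights=weights, k=1)[0]

    def rollout(self, state, history=None):
        # The rollout policy simply reuses the weighted sampling.
        return self.sample(state)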
As described in [2], you could choose to set an initial visit count and initial value corresponding to a preferred action; to take this into account during POMDP planning using POUCT or POMCP, you need to supply the ActionPrior object through the action_prior argument when you initialize the POUCT or POMCP object.
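
As a rough illustration, and reusing the hypothetical MyActionPrior sketched earlier, the planner construction could look like the following; the parameter values are arbitrary, and the exact POUCT constructor signature may vary across pomdp_py versions. (In a real problem, PolicyModel.get_all_actions would also need to be implemented rather than raising NotImplementedError.)

import pomdp_py

my_prior = MyActionPrior()                        # hypothetical prior from above
policy_model = PolicyModel(action_prior=my_prior)

planner = pomdp_py.POUCT(
    max_depth=20,                 # arbitrary example values
    discount_factor=0.95,
    num_sims=1000,
    exploration_const=50,
    rollout_policy=policy_model,  # rollout policy defined above
    action_prior=my_prior)        # preferred actions receive initial
                                  # visit counts and values during search

# Planning then proceeds as usual on a pomdp_py Agent, e.g.:
# action = planner.plan(agent)

pomdp_py.POMCP accepts the same action_prior argument.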