The development of a successful RL policy rests on the ability to derive informative states from observations and to explore alternative strategies from time to time. In many real-world settings, such as healthcare, these observations are noisy and irregular, and may not convey all the salient information needed to form a decision. Additionally, current state-of-the-art RL algorithms, when faced with partial information and the inability to proactively experiment or explore within their environment, fail to reliably learn optimal policies. With limited data in such settings, determining an optimal policy is intractable. However, recorded negative outcomes can still be used to identify behaviors that should be avoided. In this talk, I will highlight specific modeling decisions that can be made to develop actionable insights from sequentially observed healthcare data and to facilitate the avoidance of suboptimal decisions in patient care. These modeling choices honor the underlying data-generating process as well as the processes by which clinical experts formulate their own decisions.