The Markov assumption is pervasive in reinforcement learning. By modeling problems as Markov decision processes, agents act as though they can always observe the complete state of the world. While this assumption is sometimes a useful fiction, agents in general decision processes must find ways to cope with only partial information. Classical techniques for handling partial observability typically require access to unobservable or hard-to-acquire information (such as the complete set of possible world states, or knowledge of mutually exclusive potential futures). Meanwhile, modern recurrent neural networks, which rely only on observables and simple forms of memory, have proven remarkably effective in practice, but they come with no principled theoretical framework for understanding when and what agents should remember. And yet, despite its flaws, the Markov assumption may offer a path towards precisely this type of understanding. We show that estimating the value of the agent's policy both with and without the Markov assumption leads, in non-Markov environments, to a value discrepancy that appears to reliably indicate when memory is useful. We present initial progress towards a theory of such value discrepancies, and we sketch an algorithm for automatically learning memory functions by uncovering and subsequently minimizing those discrepancies. Our approach suggests that agents can make effective decisions in general decision processes as long as they remember whatever information is necessary for them to trust their value function estimates.
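As a rough illustration of the kind of quantity we have in mind (a sketch for exposition only; the particular estimators, notation, and norm here are assumptions rather than the definitions developed later), one can compare, for the same policy $\pi$, a value estimate that bootstraps as if observations were Markov with one that does not:
\[
\Lambda^{\pi}(\omega) \;\equiv\; \left| V^{\pi}_{\mathrm{TD}}(\omega) - V^{\pi}_{\mathrm{MC}}(\omega) \right|,
\]
where $\omega$ is an observation, $V^{\pi}_{\mathrm{TD}}$ is the fixed point of one-step temporal-difference learning over observations (which implicitly treats them as Markov states), and $V^{\pi}_{\mathrm{MC}}$ is the expected Monte Carlo return conditioned on $\omega$. If observations really are Markov, both estimators converge to the same value function and $\Lambda^{\pi} = 0$; a nonzero $\Lambda^{\pi}$ indicates that observations alone are an insufficient state representation, suggesting that a memory function could be learned by driving the discrepancy toward zero.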