The Value Function Polytope in Reinforcement Learning