All of a sudden, Sergey Levine's paper Control as Inference makes a lot of sense to me. Reinforcement learning, in its essence, is estimation. The difference is that the true value spans across time, so the inference has to be done over time.
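
To make the "estimation over time" reading concrete for myself, here is a minimal sketch. It is not taken from the paper. The three-state chain, the reward means, the discount factor and all function names are my own assumptions for illustration. It estimates the same quantity in two ways: as a plain running mean of whole sampled returns, and as TD(0), where each state's estimate is a running mean of "reward now plus discounted estimate of what comes next".

```python
import random

GAMMA = 0.9                      # assumed discount factor
MEAN_REWARD = [1.0, 2.0, 3.0]    # hypothetical chain: state 0 -> 1 -> 2 -> end


def sample_episode(rng):
    """Draw one noisy reward for each state of the chain, in order."""
    return [rng.gauss(mu, 1.0) for mu in MEAN_REWARD]


def true_values():
    """Exact expected discounted return from each state (computed backwards)."""
    v, values = 0.0, []
    for mu in reversed(MEAN_REWARD):
        v = mu + GAMMA * v
        values.append(v)
    return values[::-1]


def monte_carlo_v0(episodes, seed=0):
    """Plain estimation: running mean of the full return observed from state 0."""
    rng = random.Random(seed)
    estimate = 0.0
    for n in range(1, episodes + 1):
        rewards = sample_episode(rng)
        ret = sum(GAMMA ** t * r for t, r in enumerate(rewards))
        estimate += (ret - estimate) / n
    return estimate


def td0_values(episodes, seed=0):
    """Estimation spread over time: each state averages a one-step target."""
    rng = random.Random(seed)
    v = [0.0, 0.0, 0.0, 0.0]     # last entry is the terminal state, fixed at 0
    for n in range(1, episodes + 1):
        rewards = sample_episode(rng)
        for s, r in enumerate(rewards):
            target = r + GAMMA * v[s + 1]   # bootstrap from the next estimate
            v[s] += (target - v[s]) / n     # 1/n step size = running mean
    return v[:3]


if __name__ == "__main__":
    print("true values:      ", true_values())
    print("Monte Carlo V(0): ", monte_carlo_v0(5000))
    print("TD(0) values:     ", td0_values(5000))
```

Both estimators should land near the same true value for state 0. The only difference is that TD(0) never waits for the full return: it passes information backwards one time step at a time, which is how I read "the inference is done over time".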