One of the main challenges in reinforcement learning is solving tasks with
sparse reward. We show that the difficulty of discovering a distant rewarding
state in an MDP is bounded by the expected cover time of a random walk over the
graph induced by the MDP’s transition dynamics. We therefore propose to
accelerate exploration by constructing options that minimize cover time. The
proposed algorithm finds an option that provably reduces the expected number of
steps a uniform random walk needs to visit every state in the state space. We
show empirically that the proposed algorithm reduces learning time in several
sparse-reward domains.
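The abstract does not spell out how such an option is constructed. One common way to tighten a cover-time bound is to add a shortcut that maximizes the graph's algebraic connectivity, i.e., to connect the two states at the extremes of the Fiedler vector (the eigenvector of the second-smallest eigenvalue) of the state-transition graph's Laplacian. The sketch below illustrates that idea under this assumption; the function name `fiedler_option` and the path-graph example are illustrative, not taken from the paper.

```python
import numpy as np

def fiedler_option(adjacency: np.ndarray):
    """Sketch: pick the pair of states at the extremes of the Fiedler vector
    of the graph Laplacian, a pair whose connection tends to increase
    algebraic connectivity the most (illustrative, not the paper's code)."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency                # unnormalized graph Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]                       # eigenvector of the 2nd-smallest eigenvalue
    init_state = int(np.argmax(fiedler))          # candidate option initiation state
    term_state = int(np.argmin(fiedler))          # candidate option termination (subgoal) state
    return init_state, term_state

# Example: a 6-state path graph 0-1-2-3-4-5. The suggested option connects the
# two endpoints, the shortcut that most reduces a random walk's cover time.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
print(fiedler_option(A))  # -> (0, 5) or (5, 0), depending on eigenvector sign
```

On a path graph the Fiedler vector varies monotonically from one end to the other, so the selected states are the two endpoints, which matches the intuition that the most useful option bridges the most poorly connected parts of the state space.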
