Multiagent reinforcement learning (MARL) is commonly considered to suffer
from non-stationary environments and an exponentially growing joint policy
space. These challenges become even harder when rewards are sparse and
delayed over long trajectories. In this paper, we study hierarchical deep
MARL in cooperative multiagent problems with sparse and delayed rewards.
Using temporal abstraction, we decompose the problem into a hierarchy of
different time scales and investigate how agents can learn high-level
coordination based on the independent skills learned at the low level. We
propose three hierarchical deep MARL architectures that learn hierarchical
policies under different MARL paradigms. In addition, we propose a new
experience replay mechanism that alleviates both the sparsity of transitions
at the high level of abstraction and the non-stationarity of multiagent
learning. We empirically demonstrate the effectiveness of our approaches in
two domains with extremely sparse feedback: (1) a variety of Multiagent
Trash Collection tasks, and (2) a challenging online mobile game, Fever
Basketball Defense.
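The abstract does not spell out the three architectures, but the underlying idea of temporal abstraction can be illustrated with a minimal sketch: each agent owns a high-level policy that selects among low-level skills, and the chosen skill runs for several primitive steps before the next high-level decision is taken. Everything below (the class names, the tabular toy Q-values standing in for deep networks, the env.step interface, and the horizon k) is a hypothetical stand-in for illustration, not the paper's implementation.

    # Illustrative sketch (not the paper's exact architectures): a two-level
    # hierarchy in which a high-level policy picks a low-level skill that
    # then executes for several primitive steps. All names are hypothetical.
    import numpy as np

    class TwoLevelAgent:
        def __init__(self, n_skills, n_actions, n_obs, eps=0.1):
            self.n_skills, self.n_actions, self.eps = n_skills, n_actions, eps
            # Toy tabular Q-values stand in for the deep networks a real
            # hierarchical deep MARL agent would use.
            self.q_high = np.zeros((n_obs, n_skills))            # obs -> skill
            self.q_low = np.zeros((n_skills, n_obs, n_actions))  # per-skill

        def select_skill(self, obs_idx):
            # High-level decision, taken only at option boundaries
            # (temporal abstraction).
            if np.random.rand() < self.eps:
                return np.random.randint(self.n_skills)
            return int(np.argmax(self.q_high[obs_idx]))

        def select_action(self, skill, obs_idx):
            # Low-level (primitive) decision, taken every environment step.
            if np.random.rand() < self.eps:
                return np.random.randint(self.n_actions)
            return int(np.argmax(self.q_low[skill, obs_idx]))

    def run_option(env, agent, obs_idx, skill, k=8):
        """Execute the chosen skill for up to k primitive steps and return
        the SMDP-style high-level outcome (next obs, summed reward, done).
        The env.step(action) -> (obs_idx, reward, done) interface is a
        hypothetical toy, not an API from the paper."""
        total_r, done = 0.0, False
        for _ in range(k):
            action = agent.select_action(skill, obs_idx)
            obs_idx, r, done = env.step(action)
            total_r += r
            if done:
                break
        return obs_idx, total_r, done

One appeal of this decomposition in sparse-reward settings is that the high-level policy sees a reward summed over the whole option, so its effective horizon is shorter than the raw trajectory length.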
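The replay mechanism is likewise not detailed in the abstract. Below is a minimal sketch of one plausible design, under two assumptions stated here rather than taken from the paper: (a) high-level transitions carrying nonzero reward are rare enough to be worth keeping in a separate pool and oversampling, and (b) storing all agents' concurrent transitions together keeps replayed samples mutually consistent across agents, which can ease non-stationarity. The paper's actual mechanism may differ.

    # Illustrative sketch of a high-level replay buffer along the lines the
    # abstract hints at; the paper's actual mechanism may differ. Nonzero-
    # reward transitions are pooled separately and mixed into each batch.
    import random
    from collections import deque

    class HighLevelReplay:
        def __init__(self, capacity=10000, reward_fraction=0.5):
            self.regular = deque(maxlen=capacity)
            self.rewarded = deque(maxlen=capacity)  # nonzero-reward pool
            self.reward_fraction = reward_fraction

        def push(self, joint_transition):
            # joint_transition: one (obs, skill, reward, next_obs, done)
            # tuple per agent, recorded at the same high-level decision
            # point. Assumes the cooperative reward is shared, so the first
            # agent's reward is representative.
            _, _, reward, _, _ = joint_transition[0]
            pool = self.rewarded if reward != 0 else self.regular
            pool.append(joint_transition)

        def sample(self, batch_size):
            # Fill a fixed fraction of the batch from the rewarded pool so
            # that rare, informative high-level transitions are replayed
            # more often than uniform sampling would allow.
            n_rew = min(int(batch_size * self.reward_fraction),
                        len(self.rewarded))
            batch = random.sample(list(self.rewarded), n_rew)
            n_reg = min(batch_size - n_rew, len(self.regular))
            batch += random.sample(list(self.regular), n_reg)
            return batch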
