Integration of Imitation Learning using GAIL and Reinforcement Learning using Task-achievement Rewards via Probabilistic Generative Model. (arXiv:1907.02140v1 [cs.LG])

Integration of reinforcement learning and imitation learning is an important
problem that has been studied for a long time in the field of intelligent
robotics. Reinforcement learning optimizes policies to maximize the cumulative
reward, whereas imitation learning attempts to extract general knowledge about
the trajectories demonstrated by experts, i.e., demonstrators. Because each
approach has its own drawbacks, methods that combine them to offset each
other's weaknesses have been explored. However, many of these methods are
heuristic and do not have a solid theoretical basis. In this paper, we present
a new theory for integrating reinforcement and imitation learning by extending
the probabilistic generative model framework for reinforcement learning, "plan
by inference". We develop a new probabilistic graphical model for
reinforcement learning with multiple types of rewards and a probabilistic
graphical model for Markov decision processes with multiple optimality
emissions (pMDP-MO). Furthermore, we demonstrate that the integrated learning
method of reinforcement learning and imitation learning can be formulated as a
probabilistic inference of policies on pMDP-MO by considering the output of the
discriminator in generative adversarial imitation learning as an additional
optimal emission observation. We adapt generative adversarial imitation
learning and a task-achievement reward to our proposed framework, achieving
significantly better performance than agents trained with reinforcement
learning or imitation learning alone. Experiments demonstrate that our
framework successfully integrates imitation and reinforcement learning even
when only a few demonstrators are available.
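
The idea of treating the discriminator output as a second optimality emission can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the common control-as-inference convention that the task optimality likelihood is exp(r) with a non-positive reward r, and that the imitation optimality likelihood is the discriminator probability D(s, a). Under those assumptions the two emissions multiply, so their log-likelihoods add. The function names and the `eps` clipping constant are hypothetical.

```python
import math


def combined_log_optimality(task_reward, disc_prob, eps=1e-8):
    """Log-likelihood that both optimality emissions fire for one step.

    Assumption (not from the source): p(task optimal) = exp(task_reward)
    with task_reward <= 0, and p(imitation optimal) = disc_prob in (0, 1].
    """
    # Clip the discriminator output so that log() stays finite.
    d = min(max(disc_prob, eps), 1.0)
    # Independent emissions: log(exp(r) * d) = r + log(d).
    return task_reward + math.log(d)


def trajectory_log_optimality(steps):
    """Sum the per-step terms over (task_reward, disc_prob) pairs."""
    return sum(combined_log_optimality(r, d) for r, d in steps)


if __name__ == "__main__":
    # Hypothetical two-step trajectory of (task reward, discriminator output).
    print(trajectory_log_optimality([(-1.0, 0.9), (0.0, 0.5)]))
```

In this reading, a policy that maximizes the summed term is rewarded both for achieving the task and for producing state-action pairs that the discriminator judges to be expert-like.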
