Estimating Q(s,s') with Deep Deterministic Dynamics Gradients. (arXiv:2002.09505v1 [cs.LG])

In this paper, we introduce a novel form of value function, $Q(s, s')$, that
expresses the utility of transitioning from a state $s$ to a neighboring state
$s'$ and then acting optimally thereafter. In order to derive an optimal
policy, we develop a forward dynamics model that learns to make next-state
predictions that maximize this value. This formulation decouples actions from
values while still learning off-policy. We highlight the benefits of this
approach in terms of value function transfer, learning within redundant action
spaces, and learning off-policy from state observations generated by
sub-optimal or completely random policies. Code and videos are available at

Source link
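To make the $Q(s, s')$ formulation concrete, here is a minimal tabular sketch on a hypothetical deterministic chain MDP (states 0..4, actions $\pm 1$, reward 1 for reaching the last state). It is an illustration of the transition-value idea only, not the paper's deep, continuous-control implementation: values are assigned to state transitions rather than state-action pairs, and the action is recovered afterwards by (here, trivial) inverse dynamics, the role the paper's learned forward model plays.

```python
import numpy as np

# Toy deterministic chain MDP -- a hypothetical example, not from the paper.
n_states, gamma = 5, 0.9

def clip(s):
    return min(max(s, 0), n_states - 1)

# Reachable next states for each state s (the "neighboring" states s').
neighbors = {s: {clip(s + a) for a in (-1, 1)} for s in range(n_states)}

def reward(s2):
    # Reward 1 for arriving at the goal state, else 0.
    return 1.0 if s2 == n_states - 1 else 0.0

# Q[s, s'] values the transition s -> s' plus acting optimally thereafter:
#   Q(s, s') = r(s, s') + gamma * max_{s''} Q(s', s'')
Q = np.zeros((n_states, n_states))
for _ in range(200):  # Bellman backups over transitions, not actions
    for s in range(n_states):
        for s2 in neighbors[s]:
            Q[s, s2] = reward(s2) + gamma * max(Q[s2, s3] for s3 in neighbors[s2])

def policy(s):
    """Pick the best reachable next state, then recover the action that
    reaches it -- inverse dynamics is trivial in this tabular setting."""
    best = max(neighbors[s], key=lambda s2: Q[s, s2])
    for a in (-1, 1):
        if clip(s + a) == best:
            return a
```

With the chain's only reward at state 4, the learned transition values strictly increase toward the goal, so `policy` moves right from every state; the action itself never enters the value function, which is the decoupling the abstract describes.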
