Natural actor–critic algorithms S Bhatnagar, RS Sutton, M Ghavamzadeh, M Lee Automatica 45 (11), 2471-2482, 2009 | 585 | 2009 |
Bayesian reinforcement learning: A survey M Ghavamzadeh, S Mannor, J Pineau, A Tamar arXiv preprint arXiv:1609.04436, 2016 | 221 | 2016 |
Best arm identification: A unified approach to fixed budget and fixed confidence V Gabillon, M Ghavamzadeh, A Lazaric Advances in Neural Information Processing Systems 25, 3212-3220, 2012 | 197 | 2012 |
High-confidence off-policy evaluation P Thomas, G Theocharous, M Ghavamzadeh Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015 | 158 | 2015 |
Regularized policy iteration A Farahmand, M Ghavamzadeh, S Mannor, C Szepesvári Advances in Neural Information Processing Systems 21, 441-448, 2008 | 154 | 2008 |
Hierarchical multi-agent reinforcement learning R Makar, S Mahadevan, M Ghavamzadeh Proceedings of the fifth international conference on Autonomous agents, 246-253, 2001 | 149 | 2001 |
Hierarchical multi-agent reinforcement learning M Ghavamzadeh, S Mahadevan, R Makar Autonomous Agents and Multi-Agent Systems 13 (2), 197-229, 2006 | 138 | 2006 |
Supervised actor-critic reinforcement learning MT Rosenstein, AG Barto, J Si, A Barto, W Powell Learning and Approximate Dynamic Programming: Scaling Up to the Real World …, 2004 | 135 | 2004 |
A Lyapunov-based approach to safe reinforcement learning Y Chow, O Nachum, E Duenez-Guzman, M Ghavamzadeh Advances in neural information processing systems, 8092-8101, 2018 | 125 | 2018 |
High confidence policy improvement P Thomas, G Theocharous, M Ghavamzadeh International Conference on Machine Learning, 2380-2388, 2015 | 117 | 2015 |
Finite-Sample Analysis of Proximal Gradient TD Algorithms B Liu, J Liu, M Ghavamzadeh, S Mahadevan, M Petrik UAI, 504-513, 2015 | 110 | 2015 |
Risk-constrained reinforcement learning with percentile risk criteria Y Chow, M Ghavamzadeh, L Janson, M Pavone The Journal of Machine Learning Research 18 (1), 6070-6120, 2017 | 105 | 2017 |
Speedy Q-learning MG Azar, R Munos, M Ghavamzadeh, HJ Kappen Spain, Granada: NIPS, 2011 | 95 | 2011 |
More robust doubly robust off-policy evaluation M Farajtabar, Y Chow, M Ghavamzadeh arXiv preprint arXiv:1802.03493, 2018 | 94 | 2018 |
Bayesian multi-task reinforcement learning A Lazaric, M Ghavamzadeh | 92 | 2010 |
Multi-bandit best arm identification V Gabillon, M Ghavamzadeh, A Lazaric, S Bubeck Advances in Neural Information Processing Systems, 2222-2230, 2011 | 91 | 2011 |
Finite-sample analysis of least-squares policy iteration A Lazaric, M Ghavamzadeh, R Munos The Journal of Machine Learning Research 13 (1), 3041-3074, 2012 | 87 | 2012 |
Bayesian policy gradient algorithms M Ghavamzadeh, Y Engel Advances in neural information processing systems, 457-464, 2007 | 87 | 2007 |
Personalized ad recommendation systems for life-time value optimization with guarantees G Theocharous, PS Thomas, M Ghavamzadeh Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015 | 84 | 2015 |
Analysis of a classification-based policy iteration algorithm A Lazaric, M Ghavamzadeh, R Munos | 84 | 2010 |