Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction RS Sutton, J Modayil, M Delp, T Degris, PM Pilarski, A White, D Precup The 10th International Conference on Autonomous Agents and Multiagent …, 2011 | 520 | 2011 |
RL-Glue: Language-independent software for reinforcement-learning experiments B Tanner, A White The Journal of Machine Learning Research 10, 2133-2136, 2009 | 170 | 2009 |
Multi-timescale nexting in a reinforcement learning robot J Modayil, A White, RS Sutton Adaptive Behavior 22 (2), 146-160, 2014 | 128 | 2014 |
Feature construction for reinforcement learning in hearts NR Sturtevant, AM White Computers and Games: 5th International Conference, CG 2006, Turin, Italy …, 2007 | 75 | 2007 |
Developing a predictive approach to knowledge A White University of Alberta, 2015 | 71 | 2015 |
Report on the 2008 reinforcement learning competition S Whiteson, B Tanner, A White AI Magazine 31 (2), 81-81, 2010 | 58 | 2010 |
Organizing experience: a deeper look at replay mechanisms for sample-based planning in continuous state domains Y Pan, M Zaheer, A White, A Patterson, M White arXiv preprint arXiv:1806.04624, 2018 | 52 | 2018 |
A greedy approach to adapting the trace parameter for temporal difference learning M White, A White arXiv preprint arXiv:1607.00446, 2016 | 41 | 2016 |
Investigating practical linear temporal difference learning A White, M White arXiv preprint arXiv:1602.08771, 2016 | 41 | 2016 |
Gradient temporal-difference learning with regularized corrections S Ghiassian, A Patterson, S Garg, D Gupta, A White, M White International Conference on Machine Learning, 3524-3534, 2020 | 35 | 2020 |
Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains M White, A White Advances in Neural Information Processing Systems, 2010 | 35 | 2010 |
Surprise and curiosity for big data robotics A White, J Modayil, RS Sutton Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014 | 33 | 2014 |
General value function networks M Schlegel, A Jacobsen, Z Abbas, A Patterson, A White, M White Journal of Artificial Intelligence Research 70, 497-543, 2021 | 29 | 2021 |
Scaling life-long off-policy learning A White, J Modayil, RS Sutton 2012 IEEE International Conference on Development and Learning and …, 2013 | 29* | 2013 |
Reinforcement learning benchmarks and bake-offs II A Dutech, T Edmunds, J Kok, M Lagoudakis, M Littman, M Riedmiller, ... Advances in Neural Information Processing Systems (NIPS) 17, 6, 2005 | 28 | 2005 |
Online off-policy prediction S Ghiassian, A Patterson, M White, RS Sutton, A White arXiv preprint arXiv:1811.02597, 2018 | 27 | 2018 |
Adapting behavior via intrinsic reward: A survey and empirical study C Linke, NM Ady, M White, T Degris, A White Journal of Artificial Intelligence Research 69, 1287-1332, 2020 | 26 | 2020 |
Accelerated gradient temporal difference learning Y Pan, A White, M White Proceedings of the AAAI Conference on Artificial Intelligence 31 (1), 2017 | 26 | 2017 |
Improving performance in reinforcement learning by breaking generalization in neural networks S Ghiassian, B Rafiee, YL Lo, A White arXiv preprint arXiv:2003.07417, 2020 | 24 | 2020 |
Planning with expectation models Y Wan, Z Abbas, A White, M White, RS Sutton arXiv preprint arXiv:1904.01191, 2019 | 21 | 2019 |