Publication | Cited by | Year |
Learn what not to learn: Action elimination with deep reinforcement learning. T Zahavy, M Haroush, N Merlis, DJ Mankowitz, S Mannor. arXiv preprint arXiv:1809.02121, 2018 | 234 | 2018 |
Tight regret bounds for model-based reinforcement learning with greedy policies. Y Efroni, N Merlis, M Ghavamzadeh, S Mannor. Advances in Neural Information Processing Systems 32, 2019 | 76 | 2019 |
Ensemble bootstrapping for Q-learning. O Peer, C Tessler, N Merlis, R Meir. International Conference on Machine Learning, 8454-8463, 2021 | 34 | 2021 |
Reinforcement learning with trajectory feedback. Y Efroni, N Merlis, S Mannor. Proceedings of the AAAI Conference on Artificial Intelligence 35 (8), 7288-7295, 2021 | 33 | 2021 |
Batch-size independent regret bounds for the combinatorial multi-armed bandit problem. N Merlis, S Mannor. Conference on Learning Theory, 2465-2489, 2019 | 23 | 2019 |
Tight lower bounds for combinatorial multi-armed bandits. N Merlis, S Mannor. Conference on Learning Theory, 2830-2857, 2020 | 16 | 2020 |
Confidence-budget matching for sequential budgeted learning. Y Efroni, N Merlis, A Saha, S Mannor. International Conference on Machine Learning, 2937-2947, 2021 | 8 | 2021 |
Lenient regret for multi-armed bandits. N Merlis, S Mannor. Proceedings of the AAAI Conference on Artificial Intelligence 35 (10), 8950-8957, 2021 | 6 | 2021 |
Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning. P Khanna, G Tennenholtz, N Merlis, S Mannor, C Tessler. arXiv preprint arXiv:1910.01062, 2019 | 4* | 2019 |
Reinforcement Learning with History Dependent Dynamic Contexts. G Tennenholtz, N Merlis, L Shani, M Mladenov, C Boutilier. International Conference on Machine Learning, 34011-34053, 2023 | 3 | 2023 |
Reinforcement learning with a terminator. G Tennenholtz, N Merlis, L Shani, S Mannor, U Shalit, G Chechik, ... Advances in Neural Information Processing Systems 35, 35696-35709, 2022 | 2 | 2022 |
On preemption and learning in stochastic scheduling. N Merlis, H Richard, F Sentenac, C Odic, M Molina, V Perchet. International Conference on Machine Learning, 24478-24516, 2023 | 1 | 2023 |
Multi-armed bandits with guaranteed revenue per arm. D Baudry, N Merlis, MB Molina, H Richard, V Perchet. International Conference on Artificial Intelligence and Statistics, 379-387, 2024 | | 2024 |
The Value of Reward Lookahead in Reinforcement Learning. N Merlis, D Baudry, V Perchet. arXiv preprint arXiv:2403.11637, 2024 | | 2024 |
Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics. G Tennenholtz, M Mladenov, N Merlis, C Boutilier. arXiv preprint arXiv:2305.18333, 2023 | | 2023 |
Query-Reward Tradeoffs in Multi-Armed Bandits. N Merlis, Y Efroni, S Mannor. arXiv preprint arXiv:2110.05724, 2021 | | 2021 |