Deep learning-based channel estimation algorithm over time selective fading channels Q Bai, J Wang, Y Zhang, J Song IEEE Transactions on Cognitive Communications and Networking 6 (1), 125-134, 2019 | 149 | 2019 |
Achieving zero constraint violation for constrained reinforcement learning via primal-dual approach Q Bai, AS Bedi, M Agarwal, A Koppel, V Aggarwal Proceedings of the AAAI Conference on Artificial Intelligence 36 (4), 3682-3689, 2022 | 71 | 2022 |
Reinforcement learning for constrained Markov decision processes A Gattami, Q Bai, V Aggarwal International Conference on Artificial Intelligence and Statistics, 2656-2664, 2021 | 31 | 2021 |
Achieving zero constraint violation for constrained reinforcement learning via conservative natural policy gradient primal-dual algorithm Q Bai, AS Bedi, V Aggarwal Proceedings of the AAAI Conference on Artificial Intelligence 37 (6), 6737-6744, 2023 | 20 | 2023 |
Provably efficient model-free algorithm for MDPs with peak constraints Q Bai, V Aggarwal, A Gattami arXiv preprint arXiv:2003.05555, 2020 | 18 | 2020 |
Regret guarantees for model-based reinforcement learning with long-term average constraints M Agarwal, Q Bai, V Aggarwal Uncertainty in Artificial Intelligence, 22-31, 2022 | 17 | 2022 |
A reinforcement learning framework for vehicular network routing under peak and average constraints N Geng, Q Bai, C Liu, T Lan, V Aggarwal, Y Yang, M Xu IEEE Transactions on Vehicular Technology 72 (5), 6753-6764, 2023 | 14 | 2023 |
Concave utility reinforcement learning with zero-constraint violations M Agarwal, Q Bai, V Aggarwal arXiv preprint arXiv:2109.05439, 2021 | 14 | 2021 |
Reinforcement learning for multi-objective and constrained Markov decision processes A Gattami, Q Bai, V Aggarwal arXiv preprint arXiv:1901.08978, 2019 | 14 | 2019 |
Regret analysis of policy gradient algorithm for infinite horizon average reward Markov decision processes Q Bai, WU Mondal, V Aggarwal Proceedings of the AAAI Conference on Artificial Intelligence 38 (10), 10980 …, 2024 | 12 | 2024 |
Joint optimization of multi-objective reinforcement learning with policy gradient based algorithm Q Bai, M Agarwal, V Aggarwal arXiv preprint arXiv:2105.14125, 2021 | 10 | 2021 |
Escaping saddle points for zeroth-order non-convex optimization using estimated gradient descent Q Bai, M Agarwal, V Aggarwal 2020 54th Annual Conference on Information Sciences and Systems (CISS), 1-6, 2020 | 8 | 2020 |
Achieving zero constraint violation for concave utility constrained reinforcement learning via primal-dual approach Q Bai, AS Bedi, M Agarwal, A Koppel, V Aggarwal Journal of Artificial Intelligence Research 78, 975-1016, 2023 | 7 | 2023 |
Markov decision processes with long-term average constraints M Agarwal, Q Bai, V Aggarwal arXiv preprint arXiv:2106.06680, 2021 | 7 | 2021 |
Provably sample-efficient model-free algorithm for MDPs with peak constraints Q Bai, V Aggarwal, A Gattami Journal of Machine Learning Research 24 (60), 1-25, 2023 | 5 | 2023 |
Joint optimization of concave scalarized multi-objective reinforcement learning with policy gradient based algorithm Q Bai, M Agarwal, V Aggarwal Journal of Artificial Intelligence Research 74, 1565-1597, 2022 | 4 | 2022 |
Achieving zero constraint violation for constrained reinforcement learning via conservative natural policy gradient primal-dual algorithm Q Bai, AS Bedi, V Aggarwal arXiv preprint arXiv:2206.05850, 2022 | 3 | 2022 |
Model-free algorithm and regret analysis for MDPs with long-term constraints Q Bai, V Aggarwal, A Gattami arXiv preprint arXiv:2006.05961, 2020 | 1 | 2020 |
Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms V Aggarwal, WU Mondal, Q Bai arXiv preprint arXiv:2406.11481, 2024 | | 2024 |
Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm Q Bai, WU Mondal, V Aggarwal arXiv preprint arXiv:2402.02042, 2024 | | 2024 |