Спикер о вебинаре:
I will start with a general introduction to online learning and then present the Tsallis-INF algorithm, which achieves the optimal (within constants) pseudo-regret in both stochastic and adversarial multi-armed bandits without prior knowledge of the regime and time horizon. It also achieves the optimal regret guarantee in several intermediate regimes, including stochastically constrained adversarial bandits and stochastic bandits with adversarial corruptions. We provide empirical evaluation of the algorithm, demonstrating that it significantly outperforms UCB1 and EXP3 in stochastic environments. We also provide examples of adversarial environments, where UCB1 and Thompson Sampling exhibit almost linear regret, whereas Tsallis-INF suffers only logarithmic regret.
If time permits, I will also survey several follow-up works, including an optimal algorithm for adversarial bandits with arbitrary delays, and an algorithm for stochastic and adversarial bandits with switching costs.
Материалы:
The presentation is based on the following papers:
Julian Zimmert and Yevgeny Seldin. Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 2021.
Saeed Masoudian and Yevgeny Seldin. Improved analysis of robustness of the Tsallis-INF algorithm to adversarial corruptions in stochastic multiarmed bandits. COLT, 2021.
Julian Zimmert and Yevgeny Seldin. An optimal algorithm for adversarial bandits with arbitrary delays. AISTATS, 2020.
Chloé Rouyer, Yevgeny Seldin, and Nicolò Cesa-Bianchi. An algorithm for stochastic and adversarial bandits with switching costs. ICML, 2021.
Chloé Rouyer and Yevgeny Seldin. Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits. COLT, 2020.
Язык выступления: английский.
Запись: https://youtu.be/xReJk044U-U
Презентация: https://drive.google.com/file/d/1sDZ-EGBzk6Xuy1y2XZS8nZ6o8R-lujpf/view?usp=sharing