Reinforcement Learning and Games Lab • Institute of Information Science • Academia Sinica
This paper proposes regret-guided search control, extending AlphaZero with restarts guided by regret, which yields more efficient and robust learning in board games.
This paper analyzes how current state-of-the-art computer Go solvers behave when solving Life-and-Death (L&D) problems in the game of Go.
This paper aims to learn human-like RL policies by distilling demonstrations into macro actions and optimizing trajectories for both reward and human-likeness, achieving top human-likeness on the D4RL Adroit benchmark.
This paper proposes ResTNet, an AlphaZero backbone that interleaves residual and Transformer blocks to fuse local and global board knowledge, improving playing strength in Go and Hex and better recognizing long-sequence patterns.
This paper interprets MuZero's planning through its learned latent states, analyzing two board games, 9x9 Go and Gomoku, and three Atari games, Breakout, Ms. Pacman, and Pong.
This paper proposes a strength system that estimates playing strength from games and provides various strength levels while simultaneously offering human-like behavior in both Go and chess.
This paper presents OptionZero, a method that integrates options into the MuZero algorithm, autonomously discovering options through self-play games and utilizing them during planning.
This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero.
This paper proposes methods for game solving with online fine-tuning; namely, while solving, an online fine-tuning trainer simultaneously fine-tunes the heuristics to provide more accurate evaluations.