Reinforcement Learning and Games Lab • Institute of Information Science • Academia Sinica
This paper presents WallZero, an AlphaZero-based agent for WallGo, a board game popularized by the Netflix series The Devil's Plan. WallZero defeats professional Go players and is further used to assess game fairness and identify key strategies for mastering WallGo.
This paper proposes regret-guided search control, extending AlphaZero with regret-guided restarts that yield more efficient and robust learning in board games.
This paper analyzes how current state-of-the-art computer Go solvers behave when solving Life-and-Death (L&D) problems in the game of Go.
This paper trains human-like RL agents by distilling demonstrations into macro actions and optimizing trajectories for both reward and human-likeness, achieving the highest human-likeness on D4RL Adroit.
This paper proposes ResTNet, an AlphaZero backbone that interleaves residual and Transformer blocks to fuse local and global board knowledge, improving strength on Go/Hex and better recognizing long-sequence patterns.
This paper interprets MuZero's planning through its learned latent states, with analysis across two board games (9x9 Go and Gomoku) and three Atari games (Breakout, Ms. Pacman, and Pong).
This paper proposes a strength system that can estimate playing strength from games and provide various playing strengths while simultaneously offering human-like behavior in both Go and chess.
This paper presents OptionZero, a method that integrates options into the MuZero algorithm, autonomously discovering options through self-play games and utilizing them during planning.
This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero.
This paper proposes methods for game solving with online fine-tuning: while solving, an online fine-tuning trainer simultaneously fine-tunes the heuristics so that they provide more accurate evaluations.