Reinforcement Learning and Games Lab • Institute of Information Science • Academia Sinica
This paper aims to learn human-like reinforcement learning behavior by distilling human demonstrations into macro actions and optimizing trajectories for both reward and human-likeness, achieving the best human-likeness on the D4RL Adroit benchmark.
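To illustrate the trade-off described above, here is a minimal sketch, not the paper's actual objective: a trajectory is scored by both its environment return and its similarity to human demonstrations. The `similarity` function and the weight `lam` are hypothetical stand-ins.

```python
# Hypothetical sketch of a reward-plus-human-likeness trajectory objective.
def trajectory_score(rewards, trajectory, demos, similarity, lam=0.5):
    """Trade off task reward against human-likeness.

    rewards: per-step rewards collected along the trajectory.
    similarity(trajectory, demos): assumed to return a
        human-likeness score in [0, 1] against the demonstration set.
    lam: weight balancing the two terms (hypothetical value).
    """
    return sum(rewards) + lam * similarity(trajectory, demos)
```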
This paper proposes ResTNet, an AlphaZero backbone that interleaves residual and Transformer blocks to fuse local and global board knowledge, improving playing strength in Go and Hex and better recognizing long-sequence patterns.
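A minimal sketch of the interleaving idea, assuming PyTorch; the channel count, number of heads, and block count are hypothetical, not the paper's configuration. Residual blocks capture local shapes, while self-attention over all board points captures global, long-range patterns.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """AlphaZero-style residual block: two 3x3 convs plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)

class BoardTransformerBlock(nn.Module):
    """Self-attention over all board positions for global patterns."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per point
        seq = self.attn(seq)
        return seq.transpose(1, 2).view(b, c, h, w)

class ResTNetSketch(nn.Module):
    """Alternate local (residual) and global (Transformer) blocks."""
    def __init__(self, channels=64, pairs=3):
        super().__init__()
        blocks = []
        for _ in range(pairs):
            blocks += [ResBlock(channels), BoardTransformerBlock(channels)]
        self.body = nn.Sequential(*blocks)

    def forward(self, x):
        return self.body(x)
```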
This paper interprets MuZero's planning through its learned latent states, analyzing two board games, 9x9 Go and Gomoku, and three Atari games, Breakout, Ms. Pacman, and Pong.
This paper proposes a strength system that can estimate player strength from games and provide various playing strengths while simultaneously offering human-like behavior in both Go and chess.
This paper presents OptionZero, a method that integrates options into the MuZero algorithm, autonomously discovering options through self-play games and utilizing them during planning.
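As a minimal sketch, not OptionZero itself: an option can be treated as a sequence of primitive actions applied by composing MuZero-style dynamics steps in latent space, so the search expands one edge per option instead of per action. The `dynamics` function and the discount value are hypothetical stand-ins.

```python
def apply_option(dynamics, latent_state, option, discount=0.997):
    """Unroll a macro action (option) through a learned dynamics model.

    dynamics(latent_state, action): assumed to return
        (next_latent_state, reward), as in MuZero's dynamics network.
    option: a list of primitive actions, e.g. [a1, a2, a3].
    Returns the latent state after the option and the discounted reward sum.
    """
    total_reward, scale = 0.0, 1.0
    for action in option:
        latent_state, reward = dynamics(latent_state, action)
        total_reward += scale * reward
        scale *= discount
    return latent_state, total_reward
```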
This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero.
This paper proposes methods for game solving with online fine-tuning: while solving, an online fine-tuning trainer simultaneously fine-tunes the heuristics to provide more accurate evaluations.
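A minimal sketch of the interleaving, not the paper's system: a solving loop that periodically swaps in updated heuristic weights produced by a concurrent trainer. The `solver_step` function, the PyTorch-style `load_state_dict` call, and the queue protocol are all hypothetical.

```python
import queue

def solve_with_online_finetuning(root, solver_step, heuristic,
                                 trainer_updates, max_steps=1_000_000):
    """Expand the solving search while consuming fine-tuned heuristics.

    trainer_updates: a queue.Queue of updated heuristic parameters,
        assumed to be filled by a trainer fine-tuning on positions
        encountered during the search.
    """
    for _ in range(max_steps):
        try:
            # Swap in newer, more accurate evaluations as they arrive.
            heuristic.load_state_dict(trainer_updates.get_nowait())
        except queue.Empty:
            pass
        solved = solver_step(root, heuristic)  # one search expansion
        if solved:
            return True
    return False
```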