Research Publications • Prof. Ti-Rong Wu

Reinforcement Learning and Games Lab • Institute of Information Science • Academia Sinica

Human behavior distillation in MAQ
NeurIPS 2025

Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization

This paper learns human-like RL agents by distilling human demonstrations into macro actions and optimizing trajectories for both reward and human-likeness, achieving the best human-likeness on the D4RL Adroit benchmark.
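
A minimal sketch of the action-quantization idea: short segments of demonstrated actions are encoded and snapped to the nearest entry of a learned codebook, turning continuous control into a discrete set of macro actions. The class name, shapes, and the straight-through trick shown here are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    class ActionQuantizer(nn.Module):
        # Hypothetical VQ-style quantizer over short action segments.
        def __init__(self, seg_len=4, act_dim=24, num_codes=256, hidden=128):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(seg_len * act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden))
            self.codebook = nn.Embedding(num_codes, hidden)  # discrete macro actions
            self.decoder = nn.Sequential(
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, seg_len * act_dim))

        def forward(self, action_segment):                 # (B, seg_len * act_dim)
            z = self.encoder(action_segment)
            dist = torch.cdist(z, self.codebook.weight)    # distance to every code
            code = dist.argmin(dim=-1)                     # nearest macro-action code
            q = self.codebook(code)
            q = z + (q - z).detach()                       # straight-through gradient
            return self.decoder(q), code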

ResTNet Architecture
IJCAI 2025

Bridging Local and Global Knowledge via Transformer in Board Games

This paper proposes ResTNet, an AlphaZero backbone that interleaves residual and Transformer blocks to fuse local and global board knowledge, improving playing strength in Go and Hex and better recognizing long-sequence patterns.
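
The interleaving idea is easy to picture in code. The sketch below alternates convolutional residual blocks (local patterns) with Transformer blocks that treat each board intersection as a token (global patterns); channel counts, depths, and head counts are assumptions for illustration, not the published ResTNet configuration.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
            self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

        def forward(self, x):                  # local knowledge via 3x3 convolutions
            return torch.relu(x + self.conv2(torch.relu(self.conv1(x))))

    class BoardTransformerBlock(nn.Module):
        def __init__(self, ch, heads=4):
            super().__init__()
            self.attn = nn.TransformerEncoderLayer(d_model=ch, nhead=heads,
                                                   batch_first=True)

        def forward(self, x):                  # global knowledge via self-attention
            b, c, h, w = x.shape
            tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per point
            return self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)

    backbone = nn.Sequential(   # residual and Transformer blocks interleaved
        ResBlock(64), BoardTransformerBlock(64),
        ResBlock(64), BoardTransformerBlock(64))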

MCTS Tree for Gomoku
IEEE TAI 2025

Demystifying MuZero Planning: Interpreting the Learned Model

This paper interprets MuZero's planning through its learned latent states, analyzing two board games (9x9 Go and Gomoku) and three Atari games (Breakout, Ms. Pacman, and Pong).
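
One common way to make such an analysis concrete is to decode latent states back into observation space while unrolling the learned dynamics, then compare the decoded "imagined" observations with the real environment. The sketch below illustrates that procedure; the function arguments are placeholders for MuZero's networks plus an auxiliary decoder, not the paper's actual API.

    def unroll_and_decode(representation, dynamics, decoder, observation, actions):
        # Decode each latent state reached while unrolling an action sequence,
        # so we can inspect what the learned model represents at every step.
        latent = representation(observation)             # root latent state
        decoded = [decoder(latent)]
        for action in actions:
            latent, _reward = dynamics(latent, action)   # step the learned model
            decoded.append(decoder(latent))              # the model's "imagined" view
        return decoded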

Strength Estimation and Strength Adjustment
ICLR 2025

Strength Estimation and Human-Like Strength Adjustment in Games

This paper proposes a strength system that estimates player strength from game records and provides various playing strengths while simultaneously offering human-like behavior in both Go and chess.
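
As a rough illustration of strength estimation from game records, one can score every position-move pair with a network, aggregate the scores over a game, and train with a ranking objective so that games by stronger players score higher. The sketch below follows that Bradley-Terry-style recipe; it is an assumption for exposition, not the paper's exact formulation.

    import torch.nn.functional as F

    def game_strength(net, states, moves):
        # Average per-move strength score over one game record.
        return net(states, moves).mean()

    def ranking_loss(net, stronger_game, weaker_game):
        s_strong = game_strength(net, *stronger_game)
        s_weak = game_strength(net, *weaker_game)
        return F.softplus(s_weak - s_strong)   # push s_strong above s_weak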

MCTS in OptionZero
ICLR 2025

OptionZero: Planning with Learned Options

This paper presents OptionZero, a method that integrates options into the MuZero algorithm, autonomously discovering options through self-play games and utilizing them during planning.
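
Conceptually, an option is a sequence of primitive actions that planning can traverse in a single expansion by stepping the learned dynamics model through the whole sequence and accumulating discounted reward. The sketch below shows that mechanic; the function signature and discounting details are illustrative assumptions, not OptionZero's implementation.

    def apply_option(dynamics, latent, option_actions, gamma=0.997):
        # Roll the learned dynamics through an option; return the end latent
        # state, the accumulated discounted reward, and the leftover discount
        # to apply to the bootstrap value at the option's end state.
        total_reward, discount = 0.0, 1.0
        for action in option_actions:
            latent, reward = dynamics(latent, action)
            total_reward += discount * reward
            discount *= gamma
        return latent, total_reward, discount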

MiniZero Architecture
IEEE ToG 2025

MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games

This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero.
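
The four algorithms differ along two axes: whether planning uses a learned dynamics model (MuZero variants) or the real simulator (AlphaZero variants), and whether root action selection uses Gumbel-based sequential halving or standard PUCT. The sketch below records that factorization; it mirrors the idea only and is not MiniZero's actual configuration interface.

    # Hypothetical factorization of the four algorithms along two axes.
    ALGORITHMS = {
        "alphazero":        dict(learned_model=False, gumbel_root=False),
        "muzero":           dict(learned_model=True,  gumbel_root=False),
        "gumbel_alphazero": dict(learned_model=False, gumbel_root=True),
        "gumbel_muzero":    dict(learned_model=True,  gumbel_root=True),
    }

    def describe(name):
        cfg = ALGORITHMS[name]
        model = "learned dynamics model" if cfg["learned_model"] else "environment simulator"
        root = "Gumbel sequential halving" if cfg["gumbel_root"] else "PUCT selection"
        return f"{name}: {root} over the {model}"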

Online Fine-Tuning Solver Architecture
NeurIPS 2023

Game Solving with Online Fine-Tuning

This paper proposes methods for game solving with online fine-tuning: while solving, an online fine-tuning trainer simultaneously fine-tunes the heuristic to provide more accurate evaluations.
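
A minimal sketch of the concurrency involved: the solver works through positions with the latest heuristic weights, while a trainer fine-tunes the heuristic on positions the solver has just evaluated and publishes updated weights. The two loops would run concurrently; queue names and the solve_one, load_weights, and fine_tune_step callables are illustrative assumptions, not the paper's system.

    import queue

    samples = queue.Queue()   # solver -> trainer: freshly evaluated positions
    weights = queue.Queue()   # trainer -> solver: fine-tuned heuristic weights

    def solver_loop(solve_one, load_weights, positions):
        for pos in positions:
            while not weights.empty():
                load_weights(weights.get())    # adopt the fine-tuned heuristic
            samples.put((pos, solve_one(pos)))

    def trainer_loop(fine_tune_step):
        while True:
            weights.put(fine_tune_step(samples.get()))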