Abstract
Although AlphaZero has achieved superhuman performance in board games, recent studies reveal its limitations in handling scenarios requiring a comprehensive understanding of the entire board, such as recognizing long-sequence patterns in Go.
To address this challenge, we propose ResTNet, a network that interleaves residual and Transformer blocks to bridge local and global knowledge.
ResTNet improves playing strength across multiple board games, increasing win rate from 54.6% to 60.8% in 9x9 Go, 53.6% to 60.9% in 19x19 Go, and 50.4% to 58.0% in 19x19 Hex.
In addition, ResTNet effectively processes global information and tackles two long-sequence patterns in 19x19 Go: the circular pattern and the ladder pattern.
It reduces the mean square error for circular pattern recognition from 2.58 to 1.07 and lowers the attack probability against an adversary program from 70.44% to 23.91%.
ResTNet also improves ladder pattern recognition accuracy from 59.15% to 80.01%.
By visualizing attention maps, we demonstrate that ResTNet captures critical game concepts in both Go and Hex, offering insights into AlphaZero's decision-making process.
Overall, ResTNet offers a promising approach to integrating local and global knowledge, paving the way for more effective AlphaZero-based algorithms in board games.
Our code is available at https://rlg.iis.sinica.edu.tw/papers/restnet.
Challenging Long-range Patterns in Board Games
AlphaZero remains vulnerable in scenarios that demand a comprehensive understanding of the entire board.
For example, in 19x19 Go, AlphaZero's architecture cannot handle two challenging long-range patterns effectively.
Circular patterns:
A long cyclic sequence of stones (marked by black stones) that most AlphaZero programs misjudge, which leads to their stones being captured.
Ladder patterns:
A long zig-zag sequence (marked by white stones) across the entire board that a player needs to simulate to determine whether a group of stones can be captured or can escape.
ResTNet
We propose a novel architecture specifically designed for AlphaZero algorithms to effectively bridge both local and global information.
ResTNet consists of a sequence of blocks, each of which is either a residual block (R) or a Transformer block (T).
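The block notation used on this page can be made concrete with a small helper. The sketch below is illustrative only and is not taken from the released code: it assumes that a leading number repeats the block or parenthesized group that follows it, so that `R3(RRT)` reads as one residual block followed by three repetitions of `RRT`, and `10R` reads as ten residual blocks. Under that assumption both specs expand to 10 blocks, which matches the 10-block networks described below. The function name `expand_blocks` is hypothetical.

```python
import re


def expand_blocks(spec: str) -> str:
    """Expand a compact spec such as 'R3(RRT)' or '10R' into a flat string of 'R'/'T'.

    Assumed notation (not confirmed by the source): an optional integer
    repeats the single block or parenthesized group that follows it.
    """
    out = []
    i = 0
    while i < len(spec):
        count = 1
        m = re.match(r"\d+", spec[i:])
        if m:
            count = int(m.group())
            i += m.end()
            if i >= len(spec):
                raise ValueError("repeat count is not followed by a block")
        ch = spec[i]
        if ch == "(":
            close = spec.index(")", i)  # raises ValueError if unbalanced
            group = spec[i + 1:close]
            if not group or set(group) - {"R", "T"}:
                raise ValueError(f"invalid group: {group!r}")
            out.append(group * count)
            i = close + 1
        elif ch in "RT":
            out.append(ch * count)
            i += 1
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return "".join(out)


if __name__ == "__main__":
    for spec in ("RRTRRT", "R3(RRT)", "10R"):
        blocks = expand_blocks(spec)
        print(spec, "->", blocks, f"({len(blocks)} blocks)")
```

A flat string like this could then drive network construction, choosing a residual block for each `R` and a Transformer block for each `T`.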
Experiment Results
Playing Performance of ResTNet
- 9x9 Go (bottom-left table):
We train and evaluate different configurations of 6-block networks, including the following:
- Conv only: AlphaZero architecture with all residual blocks
- ViT-like: vision Transformer with all Transformer blocks
- CoAtNet-like: residual blocks first, followed by Transformer blocks
- Proposed architecture: interleaved residual and Transformer blocks
The results show that RRTRRT achieves the best performance, with a win rate of 60.8% against KataGo.
- 19x19 Go and 19x19 Hex (bottom-right table):
We extend RRT to 10-block networks and compare R3(RRT) with 10R to evaluate playing performance.
The results show that R3(RRT) outperforms 10R in both games.
Global Information Ability of ResTNet
We evaluate ResTNet's ability to process global information by measuring its performance on two well-known long-sequence patterns in Go: the circular pattern and the ladder pattern.
For circular patterns, we investigate the ability of ResTNet to defend against the cyclic-adversary.
As the table below shows, R3(RRT) defends against the cyclic-adversary better than 10R.
For ladder patterns, we evaluate the performance of predicting whether the defender can escape from the ladder.
The table below shows R3(RRT) outperforms 10R in ladder pattern recognition.
We also examine whether the network can recognize the circular pattern through board evaluation.
The figure below shows an example of the circular pattern (black stones marked in red in Target positions), where the marked stones should be recognized as White's territory.
The results show that 10R fails to recognize the circular pattern, while R3(RRT) recognizes it correctly.
Visualization of ResTNet
We visualize ResTNet by constructing attention maps using the attention values from the Transformer block.
By analyzing these attention maps, we can explore patterns or strategies utilized by ResTNet, offering insights into its behavior and decision-making process.
The figures below show the attention maps for the position marked in green, with redder colors indicating higher levels of relative importance.
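As a rough illustration of how such a map could be built for one query position, the sketch below takes that position's attention weights over all board positions, one row per attention head, and turns them into a board-shaped grid of relative importance. The head averaging and the min-max rescaling are assumptions made for this example; the page does not state how the attention values are aggregated or colored. The function name `attention_map` is hypothetical.

```python
def attention_map(head_weights, board_size):
    """Build a board_size x board_size grid of relative importance in [0, 1].

    head_weights: one list per attention head, each holding the attention
    weights from a single query position to all board_size**2 positions.
    Averaging over heads and min-max scaling are illustrative assumptions.
    """
    n = board_size * board_size
    if not head_weights or any(len(h) != n for h in head_weights):
        raise ValueError("each head must provide board_size**2 weights")

    # Average the attention weights across heads, position by position.
    num_heads = len(head_weights)
    mean = [sum(h[p] for h in head_weights) / num_heads for p in range(n)]

    # Rescale so the least-attended position maps to 0 and the most-attended to 1.
    lo, hi = min(mean), max(mean)
    span = hi - lo
    rel = [(w - lo) / span if span > 0 else 0.0 for w in mean]

    # Reshape the flat list into rows of the board.
    return [rel[r * board_size:(r + 1) * board_size] for r in range(board_size)]


if __name__ == "__main__":
    # Toy 2x2 "board" with two heads.
    heads = [
        [0.0, 0.2, 0.3, 0.5],
        [0.1, 0.1, 0.3, 0.5],
    ]
    for row in attention_map(heads, 2):
        print(["%.2f" % v for v in row])
```

Values near 1 would correspond to the redder cells in the figures, under the rescaling assumed here.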
Attention Maps in 19x19 Go
In 19x19 Go games, the attention maps demonstrate that the model captures key Go concepts, including life-and-death situations (Alive stones), areas of territorial uncertainty (Uncertain territory), and critical positions in the ladder pattern (Critical positions).
Attention Maps in 19x19 Hex
Similar to 19x19 Go, the attention maps below show essential game concepts in 19x19 Hex, particularly the fundamental strategy of connection.
Model Download