Bridging Local and Global Knowledge via Transformer in Board Games

Yan-Ru Ju1, Tai-Lin Wu1, Chung-Chin Shih1, Ti-Rong Wu1
1 Academia Sinica

Abstract

Although AlphaZero has achieved superhuman performance in board games, recent studies reveal its limitations in handling scenarios requiring a comprehensive understanding of the entire board, such as recognizing long-sequence patterns in Go. To address this challenge, we propose ResTNet, a network that interleaves residual and Transformer blocks to bridge local and global knowledge. ResTNet improves playing strength across multiple board games, increasing the win rate from 54.6% to 60.8% in 9x9 Go, from 53.6% to 60.9% in 19x19 Go, and from 50.4% to 58.0% in 19x19 Hex. In addition, ResTNet effectively processes global information and tackles two long-sequence patterns in 19x19 Go: the circular pattern and the ladder pattern. It reduces the mean squared error for circular pattern recognition from 2.58 to 1.07 and lowers the attack probability against an adversary program from 70.44% to 23.91%. ResTNet also improves ladder pattern recognition accuracy from 59.15% to 80.01%. By visualizing attention maps, we demonstrate that ResTNet captures critical game concepts in both Go and Hex, offering insights into AlphaZero's decision-making process. Overall, ResTNet offers a promising approach to integrating local and global knowledge, paving the way for more effective AlphaZero-based algorithms in board games. Our code is available at https://rlg.iis.sinica.edu.tw/papers/restnet.

Challenging Long-range Patterns in Board Games

AlphaZero remains vulnerable in scenarios that demand a comprehensive understanding of the entire board. For example, in 19x19 Go, AlphaZero's architecture cannot handle two challenging long-range patterns effectively.
  • Circular patterns: A long sequence of stones forming a cycle (marked by black stones) that most AlphaZero programs misjudge, which leads to the stones being captured.
  • Ladder patterns: A long zig-zag sequence (marked by white stones) across the entire board that a player needs to simulate to capture, or escape with, a group of stones.
    [Figures: Circular pattern; Ladder pattern]

    ResTNet

    assets/restnet_framework.png

    We propose a novel architecture specifically designed for AlphaZero algorithms to effectively bridge both local and global information. ResTNet consists of a sequence of blocks, each of which is either a residual block (R) or a Transformer block (T).
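The model names used on this page, such as 10R and R3(RRT), read as compact descriptions of the block sequence. The sketch below expands such a name into a flat string of R and T blocks. The grammar is an assumption inferred from those names (a leading integer repeats the block or parenthesized group that follows), and `expand_blocks` is a hypothetical helper, not part of the released code.

```python
import re

def expand_blocks(spec: str) -> str:
    """Expand a compact block spec into a flat string of 'R'/'T' characters.

    Assumed grammar (inferred from names like '10R' and 'R3(RRT)'):
    an optional integer repeats the single block or parenthesized
    group that immediately follows it.
    """
    out = []
    i = 0
    while i < len(spec):
        # Optional repeat count, defaulting to 1.
        count = 1
        m = re.match(r"\d+", spec[i:])
        if m:
            count = int(m.group())
            i += m.end()
        if i >= len(spec):
            raise ValueError("repeat count with nothing to repeat")
        ch = spec[i]
        if ch == "(":
            j = spec.index(")", i)  # raises ValueError if unclosed
            group = spec[i + 1:j]
            if not group or set(group) - {"R", "T"}:
                raise ValueError(f"invalid group: {group!r}")
            out.append(group * count)
            i = j + 1
        elif ch in "RT":
            out.append(ch * count)
            i += 1
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return "".join(out)

print(expand_blocks("R3(RRT)"))  # one R followed by three RRT groups
print(expand_blocks("10R"))      # ten residual blocks
```

Under this reading, both names describe 10-block networks, which is consistent with the 10-block comparison in the experiments.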

    Experiment Results

    Playing Performance of ResTNet

    Win rate of various 6-block ResTNet models against KataGo.
    assets/6b_battle_table.svg
    Win rate of various 10-block ResTNet models in 19x19 Go and 19x19 Hex.
    assets/10b_compare_table.svg

    Global Information Ability of ResTNet

    We assess ResTNet's ability to use global information by measuring its performance on two well-known long-sequence patterns in Go: the circular pattern and the ladder pattern.
  • For circular patterns, we investigate the ability of ResTNet to defend against the cyclic-adversary. As the table below shows, R3(RRT) defends against the cyclic-adversary better than 10R.
  • For ladder patterns, we evaluate how well the network predicts whether the defender can escape from the ladder. The table below shows that R3(RRT) outperforms 10R in ladder pattern recognition.
    ResTNet's global information ability on long-sequence patterns.
    assets/circular_patterns_ladder_patterns_performance.svg
    We also examine whether the network can recognize the circular pattern through board evaluation. The figure below shows an example of the circular pattern (black stones marked in red in Target positions), where the marked stones should be recognized as White's territory. The results show that 10R fails to recognize the circular pattern, while R3(RRT) recognizes it correctly.
    Target positions
    Ground truth
    10R
    R3(RRT)

    Visualization of ResTNet

    We visualize ResTNet by constructing attention maps using the attention values from the Transformer block. By analyzing these attention maps, we can explore patterns or strategies utilized by ResTNet, offering insights into its behavior and decision-making process. The figures below show the attention maps for the position marked in green, with redder colors indicating higher levels of relative importance.
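As a rough illustration of how such a map could be built, the sketch below takes the attention weights of one Transformer block, selects the row for a query position, averages it over heads, and rescales it so the strongest position has value 1. Several things here are assumptions not stated on this page: that each board point is one token, that heads are averaged, and that "relative importance" means division by the maximum. `attention_map` is a hypothetical helper.

```python
def attention_map(attn, query, board_size=19):
    """Build a board-shaped attention map for one query position.

    attn:  list of heads; each head is an N x N matrix (list of lists)
           with N = board_size ** 2, where attn[h][q][k] is the weight
           that query token q places on key token k.
    query: (row, col) of the position being inspected.
    Returns a board_size x board_size grid scaled to [0, 1].
    """
    n = board_size * board_size
    q = query[0] * board_size + query[1]  # row-major token index (assumed)
    num_heads = len(attn)
    # Average the query's attention row over all heads (assumption).
    avg = [sum(head[q][k] for head in attn) / num_heads for k in range(n)]
    # Rescale so the most-attended position has value 1 (assumption).
    peak = max(avg)
    scaled = [a / peak if peak > 0 else 0.0 for a in avg]
    # Fold the flat token axis back into board rows.
    return [scaled[r * board_size:(r + 1) * board_size]
            for r in range(board_size)]

# Toy usage on a 2x2 board with two heads.
head_a = [[1.0, 0.0, 0.0, 0.0]] * 4
head_b = [[0.0, 1.0, 0.0, 0.0]] * 4
print(attention_map([head_a, head_b], query=(0, 0), board_size=2))
```

The resulting grid can then be passed to any heatmap plotting routine, with larger values drawn in redder colors.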

    Attention Maps in 19x19 Go

    In 19x19 Go games, the attention maps demonstrate that the model captures key Go concepts, including life-and-death situations (Alive stones), areas of territorial uncertainty (Uncertain territory), and critical positions in the ladder pattern (Critical positions).
    Alive stones
    Uncertain territory
    Critical positions

    Attention Maps in 19x19 Hex

    Similar to 19x19 Go, the attention maps below show essential game concepts in 19x19 Hex, particularly the fundamental strategy of connection.
    Potential paths
    Uncertain territory

    Model Download