WallZero: Mastering the Game of WallGo with Strategic Analysis

Abstract

WallGo is a recently introduced strategic board game popularized by the 2025 Netflix series The Devil's Plan. Although played on a small 7 × 7 board, its combination of stone movement and wall placement yields high game-tree complexity and intricate strategic interactions. Despite its growing popularity, WallGo remains underexplored. This paper presents WallZero, an AlphaZero-based agent for the two-player WallGo setting. We introduce tailored action and feature designs that significantly improve playing performance. In the evaluation, WallZero defeats two professional Go players who participated in this study, securing on average 1.98× as much territory as its opponents per game. Beyond its playing strength, we use WallZero to assess game fairness and identify key strategies for mastering WallGo. Interestingly, our results show that the opening used in the Netflix series yields a more balanced game. Our code is available at rlg.iis.sinica.edu.tw/papers/wallzero.

The Rules of WallGo

WallGo is played on a 7 × 7 board. In the two-player setting considered in this work, Red and Blue each control four stones and seek to enclose more territory than the opponent.

Setup phase: In empty mode, all eight stones are placed during setup. In 4-stone mode, four stones are pre-positioned and each player places their two remaining stones. In both modes, Red places one stone first; the players then alternate, each placing two consecutive stones, starting with Blue.
Play phase: On each turn, a player selects one stone, moves it by zero, one, or two orthogonal steps, and then places one wall adjacent to the stone's final position.
Scoring: A region is assigned to a player only if it contains that player's stones and no opponent stones. Enclosed regions that contain no stones are neutral and count for neither side.
Game end: The game ends when every stone is enclosed in a region containing stones of only one player. The player with the larger total territory wins. If the totals are equal, the player with the largest single region wins; if those are also equal, the game is a draw.
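The scoring and end-of-game rules amount to a flood fill over the board. The Python sketch below is a minimal illustration, not WallZero's implementation. It assumes cells are indexed (row, column) from 0 to 6, that a wall is stored as the pair of orthogonally adjacent cells it separates, and that stones are passed as a hypothetical `stones` dict mapping a cell to "Red" or "Blue"; all function names are illustrative.

```python
from collections import deque

SIZE = 7  # the board is 7 x 7
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def regions(walls):
    """Partition the board: adjacent cells connect unless a wall separates them."""
    seen, result = set(), []
    for start in ((r, c) for r in range(SIZE) for c in range(SIZE)):
        if start in seen:
            continue
        seen.add(start)
        region, queue = [], deque([start])
        while queue:
            cell = queue.popleft()
            region.append(cell)
            for dr, dc in STEPS:
                nb = (cell[0] + dr, cell[1] + dc)
                if (0 <= nb[0] < SIZE and 0 <= nb[1] < SIZE and nb not in seen
                        and frozenset((cell, nb)) not in walls):
                    seen.add(nb)
                    queue.append(nb)
        result.append(region)
    return result

def score(stones, walls):
    """Return (game_over, totals, largest) for a position.

    stones: hypothetical dict {cell: "Red" | "Blue"}
    walls:  set of frozenset({cell_a, cell_b}) edges between adjacent cells
    """
    totals = {"Red": 0, "Blue": 0}
    largest = {"Red": 0, "Blue": 0}
    game_over = True
    for region in regions(walls):
        owners = {stones[cell] for cell in region if cell in stones}
        if len(owners) == 2:
            game_over = False          # both players still share this region
        elif len(owners) == 1:
            owner = owners.pop()       # territory of the only player present
            totals[owner] += len(region)
            largest[owner] = max(largest[owner], len(region))
        # regions without stones are neutral and count for neither side
    return game_over, totals, largest

def winner(totals, largest):
    """Larger total wins; a tie goes to the larger single region; else a draw."""
    for table in (totals, largest):
        if table["Red"] != table["Blue"]:
            return "Red" if table["Red"] > table["Blue"] else "Blue"
    return "Draw"
```

For example, a wall running the full height of the board between the third and fourth columns, with all Red stones on the left and all Blue stones on the right, gives 21 cells to Red and 28 to Blue, so Blue wins.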

Although the board is small, each turn combines stone movement and wall construction, resulting in an estimated game-tree complexity of about \(10^{87}\).
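To illustrate why a single turn branches widely, the sketch below enumerates one player's actions as (stone, destination, wall) triples. It is a hedged illustration under stated assumptions: a stone may change direction between its two steps, it cannot cross a wall or enter an occupied cell, and a wall may go on any empty interior edge of the destination cell. The helper names are hypothetical.

```python
SIZE = 7
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def on_board(cell):
    return 0 <= cell[0] < SIZE and 0 <= cell[1] < SIZE

def destinations(stone, walls, occupied):
    """Cells a stone can end on with 0, 1 or 2 orthogonal steps.

    Assumption: the stone may turn between steps; walls and other stones block it.
    """
    others = occupied - {stone}
    frontier, reached = {stone}, {stone}
    for _ in range(2):
        nxt = set()
        for a in frontier:
            for dr, dc in STEPS:
                b = (a[0] + dr, a[1] + dc)
                if on_board(b) and b not in others and frozenset((a, b)) not in walls:
                    nxt.add(b)
        reached |= nxt
        frontier = nxt
    return reached

def turn_actions(my_stones, all_stones, walls):
    """Enumerate (stone, destination, wall_edge) triples for one turn."""
    actions = []
    for stone in my_stones:
        for dest in destinations(stone, walls, set(all_stones)):
            for dr, dc in STEPS:
                nb = (dest[0] + dr, dest[1] + dc)
                edge = frozenset((dest, nb))
                # Assumption: walls go on empty interior edges only.
                if on_board(nb) and edge not in walls:
                    actions.append((stone, dest, edge))
    return actions
```

Under these assumptions an unobstructed stone has 13 destinations with 4 wall sides each, so four unobstructed stones give at most 4 × 13 × 4 = 208 actions per turn (144 if stones could only move in a straight line, with 9 destinations each). Board edges, walls, and other stones prune this further.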

WallZero

WallZero follows the AlphaZero training pipeline in the MiniZero framework, where Monte Carlo Tree Search is guided by a policy-value network. The agent adds two WallGo-specific designs: a unified action space that covers both setup and play phases, and a feature representation tailored to stones, walls, territory, reachability, and recent history.
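The section does not specify how the unified action space is encoded. One hypothetical way to cover both phases with a single flat index, as an AlphaZero-style policy head needs, is to reserve 49 indices for setup placements and encode each play-phase action as (source cell, displacement, wall side). The sketch below assumes 13 displacements within two orthogonal steps (including staying put) and four wall sides, and derives whose turn it is during setup from the placement order in the rules. All names and the layout are illustrative, not the paper's design.

```python
SIZE = 7
CELLS = SIZE * SIZE  # 49 setup (placement) actions, one per cell
# Displacements within two orthogonal steps, including (0, 0) for staying put;
# assumes a stone may turn between its two steps (13 offsets).
OFFSETS = [(dr, dc) for dr in range(-2, 3) for dc in range(-2, 3)
           if abs(dr) + abs(dc) <= 2]
WALL_SIDES = ("up", "down", "left", "right")
NUM_ACTIONS = CELLS + CELLS * len(OFFSETS) * len(WALL_SIDES)  # 49 + 2548 = 2597

def encode_place(cell):
    """Setup phase: place a stone on cell = (row, col)."""
    row, col = cell
    return row * SIZE + col

def encode_play(src, offset, side):
    """Play phase: move the stone on src by offset, then build a wall on side."""
    row, col = src
    idx = (row * SIZE + col) * len(OFFSETS) + OFFSETS.index(offset)
    return CELLS + idx * len(WALL_SIDES) + WALL_SIDES.index(side)

def decode(action):
    """Invert encode_place / encode_play."""
    if action < CELLS:
        return ("place", divmod(action, SIZE))
    idx, side = divmod(action - CELLS, len(WALL_SIDES))
    cell, k = divmod(idx, len(OFFSETS))
    return ("play", divmod(cell, SIZE), OFFSETS[k], WALL_SIDES[side])

def setup_player(i):
    """Owner of the i-th setup placement (0-based): Red once, then pairs from Blue."""
    if i == 0:
        return "Red"
    return "Blue" if ((i - 1) // 2) % 2 == 0 else "Red"
```

This layout gives 49 + 49 × 13 × 4 = 2,597 indices, many of them illegal in any given position; AlphaZero-style agents typically mask illegal actions before normalizing the policy. The placement order it produces is Red, Blue, Blue, Red for 4-stone mode and Red, Blue, Blue, Red, Red, Blue, Blue, Red for empty mode.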

Feature design in WallZero.

Feature	# of planes	Description
Stone	2	Red / Blue stone
Horizontal Wall	2	Red / Blue horizontal wall
Vertical Wall	2	Red / Blue vertical wall
Player Turn	2	Indicates the player to move
Territory (`T`)	3	Red / Blue / Neutral territory
Reachability (`R`)	2	Positions reachable within one turn for each player
History (`H`)	36	Four-step history of stones, walls, and territory (9 per step)
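The plane counts in the table add up to 49 input channels, and the 9 history planes per step match the stone (2), horizontal wall (2), vertical wall (2), and territory (3) planes. The sketch below assigns each feature group a contiguous channel range; the ordering and the names are assumptions, since the table gives only counts.

```python
# Plane counts come from the feature table; the channel order is an assumption.
FEATURES = [
    ("stone", 2),
    ("horizontal_wall", 2),
    ("vertical_wall", 2),
    ("player_turn", 2),
    ("territory", 3),
    ("reachability", 2),
    ("history", 36),
]
HISTORY_STEPS = 4
# Planes stored for each history step: stones, both wall orientations, territory.
PER_STEP = {"stone": 2, "horizontal_wall": 2, "vertical_wall": 2, "territory": 3}

def plane_layout(features):
    """Assign each feature group a contiguous [start, end) channel range."""
    layout, start = {}, 0
    for name, count in features:
        layout[name] = (start, start + count)
        start += count
    return layout, start

layout, total_planes = plane_layout(FEATURES)
# Consistency check: 4 steps x 9 planes per step = 36 history planes.
assert HISTORY_STEPS * sum(PER_STEP.values()) == dict(FEATURES)["history"]
```

If every plane is a 7 × 7 grid, the network input would be a 49 × 7 × 7 tensor; how wall edges are mapped onto 7 × 7 planes is not specified here.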

Experimental Results

Human-AI Evaluation

To evaluate practical playing strength, WallZero uses a 10-block residual network and is tested against two Taiwanese professional Go players, Wei Huang (3-dan) and Chun-Hsun Chou (9-dan), across both modes and both colors. Before the formal matches, both players played practice games to familiarize themselves with WallGo. The formal matches used a 90-second time limit per move and 2,000 MCTS simulations per move for WallZero.

Game results between professional Go players and WallZero. Scores are reported as Human-WallZero territory counts. Values in parentheses denote the ratio of WallZero's territory to the human's.

Opponent	Empty Mode (Red)	Empty Mode (Blue)	4-Stone Mode (Red)	4-Stone Mode (Blue)
3-dan (Wei Huang)	16-33 (2.06×)	20-29 (1.45×)	14-32 (2.29×)	19-30 (1.58×)
9-dan (Chun-Hsun Chou)	12-37 (3.08×)	17-30 (1.76×)	20-29 (1.45×)	12-34 (2.83×)

WallZero won all eight formal games. Across the matches, it secured on average 1.98× as much territory as the human players, and its estimated win rate exceeded 90% before move 20 in every evaluated game.
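The section does not state how the 1.98× average is computed, and different aggregates of the same table give different figures. The sketch below recomputes three of them from the per-game scores: the arithmetic mean of the printed per-game ratios (about 2.06), their geometric mean (about 1.98), and the pooled ratio of total territory, 254 / 130 (about 1.95). That the geometric mean matches the reported figure is an observation, not a confirmed description of the paper's method.

```python
import math

# (human, WallZero) territory for each formal game, read row by row from the
# results table: 3-dan then 9-dan; empty Red, empty Blue, 4-stone Red, 4-stone Blue.
GAMES = [(16, 33), (20, 29), (14, 32), (19, 30),
         (12, 37), (17, 30), (20, 29), (12, 34)]

ratios = [round(wz / human, 2) for human, wz in GAMES]  # per-game ratios as printed
arithmetic_mean = sum(ratios) / len(ratios)
geometric_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
pooled_ratio = sum(wz for _, wz in GAMES) / sum(human for human, _ in GAMES)

print(f"arithmetic {arithmetic_mean:.2f}, geometric {geometric_mean:.2f}, "
      f"pooled {pooled_ratio:.2f}")
```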

Opening Analysis

The analysis examines whether the 4-stone opening used in The Devil's Plan aligns with the opening preferences of the empty-mode model, and compares the opening distributions learned in empty and 4-stone modes.

When the 4-stone opening is progressively introduced from empty mode, the empty-mode policy does not converge toward the predefined opening. Instead, it concentrates probability mass near the center and assigns near-zero probability to the fixed 4-stone locations. Nevertheless, the symmetric empty mode exhibits a stronger first-player advantage (Red win rate 55.95%) than 4-stone mode (51.35%), so the predefined opening yields the more balanced game.

Opening analysis, steps 0 and 1, evaluated by the empty-mode model (step 0: Red win rate 55.95%).

Strategic Analysis

A central contribution of this work is the use of WallZero not only as a strong playing agent, but also as an analysis tool for understanding WallGo strategy. The paper analyzes WallZero self-play games in 4-stone mode together with feedback from professional Go players, identifying two core strategies: reachability control and passing strategy.

Reachability refers to the set of positions a player can access given the current board. Because both stones and walls affect reachability, preserving future access is essential in WallGo. The following cases illustrate how reachability guides movement and wall construction.

After Red moves from E5 to E3, different wall placements lead to distinct future reachability. Placing the wall to the right or below loses two reachable positions, while placing it to the left or above loses one. However, the left placement leaves D3 exclusively reachable by Blue, so WallZero prefers placing the wall above.

Reachability control case I: initial position (a).
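The reasoning in case I, which compares wall placements by how many reachable positions they cost and whether they leave positions reachable only by the opponent, can be sketched as a simple ranking. This is an illustrative heuristic, not how WallZero chooses moves (it relies on MCTS guided by its policy-value network). It assumes (row, column) cells, walls stored as pairs of adjacent cells, and stones that may turn between their two steps; the function names are hypothetical. The `reachable` helper computes the kind of set the reachability feature planes describe: positions reachable within one turn for a player.

```python
SIZE = 7
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def on_board(cell):
    return 0 <= cell[0] < SIZE and 0 <= cell[1] < SIZE

def reach_one_stone(stone, walls, occupied):
    """Cells one stone can end on this turn (0-2 steps, blocked by walls/stones)."""
    frontier, reached = {stone}, {stone}
    for _ in range(2):
        nxt = set()
        for a in frontier:
            for dr, dc in STEPS:
                b = (a[0] + dr, a[1] + dc)
                if on_board(b) and b not in occupied and frozenset((a, b)) not in walls:
                    nxt.add(b)
        reached |= nxt
        frontier = nxt
    return reached

def reachable(my_stones, all_stones, walls):
    """Union of one-turn reachable cells over all of a player's stones."""
    cells = set()
    for s in my_stones:
        cells |= reach_one_stone(s, walls, set(all_stones) - {s})
    return cells

def rank_walls(stone, my_stones, opp_stones, walls):
    """Rank walls next to `stone` by (own cells lost, cells reachable only by opponent)."""
    all_stones = set(my_stones) | set(opp_stones)
    before = reachable(my_stones, all_stones, walls)
    options = []
    for dr, dc in STEPS:
        nb = (stone[0] + dr, stone[1] + dc)
        edge = frozenset((stone, nb))
        if not on_board(nb) or edge in walls:
            continue
        after = walls | {edge}
        mine = reachable(my_stones, all_stones, after)
        theirs = reachable(opp_stones, all_stones, after)
        options.append(((len(before - mine), len(theirs - mine)), (dr, dc)))
    return sorted(options)
```

In the test position, a lone Red stone at the center with a Blue stone directly to its right, the wall between the two stones costs Red nothing and ranks first, the wall on the left costs two positions, and walls above or below cost three each.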

Case III shows that sacrificing immediate territory can be correct when it preserves access to a more important future contest. Moving to E7 allows Red to gain one additional point, but keeps Blue closer to the central battle around B4: reaching C5 takes three moves from F7 but only two from E7. This changes the eventual outcome by one point.

Reachability control case III: initial position (a).
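The distances in case III count turns rather than single steps: F7 and C5 are five orthogonal steps apart and E7 and C5 are four, which take three and two turns at up to two steps per turn. The sketch below computes this with a breadth-first search, assuming letters A to G index columns and digits 1 to 7 index rows, that a stone may turn between its two steps, and that no new walls or stone moves intervene along the way. `parse` and `turns_to_reach` are hypothetical helpers.

```python
from collections import deque

SIZE = 7

def parse(label):
    """'C5' -> (column, row), assuming letters A-G are columns and digits 1-7 rows."""
    return ord(label[0]) - ord("A"), int(label[1:]) - 1

def turns_to_reach(start, target, walls=frozenset(), blocked=frozenset()):
    """Minimum number of turns (0-2 orthogonal steps each) from start to target.

    BFS gives the shortest path in single steps; with turning allowed, a stone
    can follow that path two steps per turn, so turns = ceil(steps / 2).
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dx, cell[1] + dy)
            if (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE and nxt not in dist
                    and nxt not in blocked and frozenset((cell, nxt)) not in walls):
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    if target not in dist:
        return None  # target is walled off from start
    return -(-dist[target] // 2)  # ceil(steps / 2)
```

Passing the current walls and other stones as `walls` and `blocked` extends the same count to obstructed boards.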