WallZero: Mastering the Game of WallGo with Strategic Analysis

Hsing-Yu Chen¹,², Jérôme Arjonilla², I-Chen Wu¹, Ti-Rong Wu²
¹ National Yang Ming Chiao Tung University, ² Academia Sinica

Abstract

WallGo is a recently introduced strategic board game popularized by the 2025 Netflix series The Devil's Plan. Although played on a small 7 × 7 board, its combination of stone movement and wall placement yields high game-tree complexity and intricate strategic interactions. Despite its growing popularity, WallGo remains underexplored. This paper presents WallZero, an AlphaZero-based agent for the two-player WallGo setting. We introduce tailored action and feature designs to improve playing performance significantly. In the evaluation, WallZero defeats two professional Go players who participated in this study, securing on average 1.98× as much territory per game. Beyond its strength, we use WallZero to assess game fairness and identify key strategies for mastering WallGo. Interestingly, our results show that the opening used in the Netflix series yields a more balanced game. Our code is available at rlg.iis.sinica.edu.tw/papers/wallzero.

The Rules of WallGo

WallGo is played on a 7 × 7 board. In the two-player setting considered in this work, Red and Blue each control four stones and seek to enclose more territory than the opponent.

Overview of WallGo rules. (a) Empty mode initial board. (b) 4-stone mode initial board. (c) Reachable area of Red stone labeled 4 (light green). (d) Two-step upward move with wall placement. (e) Endgame: Blue wins by 1 point (19–18).

Although the board size is small, each turn combines stone movement and wall construction, resulting in an estimated game-tree complexity of approximately \(10^{87}\).

WallZero

WallZero follows the AlphaZero training pipeline in the MiniZero framework, where Monte Carlo Tree Search is guided by a policy-value network. The agent adds two WallGo-specific designs: a unified action space that covers both setup and play phases, and a feature representation tailored to stones, walls, territory, reachability, and recent history.
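As a rough illustration of how a policy-value network guides Monte Carlo Tree Search, the sketch below implements the PUCT selection rule from AlphaZero. It is not taken from WallZero or MiniZero: the `Node` class, the `select_child` function, and the constant `c_puct = 1.5` are hypothetical, and child values are assumed to be stored from the perspective of the player choosing the move.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search node; `prior` is the policy probability of the move leading here."""
    prior: float
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node

    def mean_value(self) -> float:
        # Q(s, a): average backed-up value; 0 for an unvisited child.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_child(node: Node, c_puct: float = 1.5):
    """Return the (action, child) pair that maximizes Q + U."""
    sqrt_parent = math.sqrt(max(1, node.visit_count))

    def puct(item):
        _, child = item
        # U grows with the prior and shrinks as the child is visited more often.
        u = c_puct * child.prior * sqrt_parent / (1 + child.visit_count)
        return child.mean_value() + u

    return max(node.children.items(), key=puct)


# Usage: a rarely visited child with a high mean value wins the selection.
root = Node(prior=1.0, visit_count=10)
root.children = {
    "a": Node(prior=0.6, visit_count=8, value_sum=2.0),
    "b": Node(prior=0.4, visit_count=2, value_sum=1.5),
}
action, _ = select_child(root)
```

With unvisited children the rule reduces to picking the move with the highest prior, which is how the policy head steers the earliest simulations.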

Feature design in WallZero.

| Feature | # of planes | Description |
| --- | --- | --- |
| Stone | 2 | Red / Blue stone |
| Horizontal Wall | 2 | Red / Blue horizontal wall |
| Vertical Wall | 2 | Red / Blue vertical wall |
| Player Turn | 2 | Indicates the player to move |
| Territory (T) | 3 | Red / Blue / Neutral territory |
| Reachability (R) | 2 | Positions reachable within one turn for each player |
| History (H) | 36 | Four-step history of stones, walls, and territory (9 per step) |
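The plane counts in the table can be tallied with a short sketch. The dictionary layout and the `input_shape` helper are hypothetical. Two assumptions go beyond the table: the 9 history planes per step are read as 2 stone + 2 horizontal wall + 2 vertical wall + 3 territory planes, and all feature groups are taken to be stacked into one 7 × 7 input.

```python
BOARD_SIZE = 7
HISTORY_STEPS = 4

# Assumed breakdown of the "9 per step" history planes (not stated explicitly in the table).
PER_STEP_HISTORY = {"stone": 2, "horizontal_wall": 2, "vertical_wall": 2, "territory": 3}

# Plane counts copied from the feature table.
FEATURE_PLANES = {
    "stone": 2,
    "horizontal_wall": 2,
    "vertical_wall": 2,
    "player_turn": 2,
    "territory": 3,
    "reachability": 2,
    "history": HISTORY_STEPS * sum(PER_STEP_HISTORY.values()),
}


def input_shape():
    """Shape (planes, rows, cols) if every feature group is stacked together."""
    return (sum(FEATURE_PLANES.values()), BOARD_SIZE, BOARD_SIZE)


print(input_shape())
```

Under these assumptions the full feature set amounts to 49 planes over the 7 × 7 board.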

Experimental Results

Human-AI Evaluation

To evaluate practical playing strength, we tested WallZero, using a 10-block residual network, against two Taiwanese professional Go players, Wei Huang (3-dan) and Chun-Hsun Chou (9-dan), across both modes and both colors. Before the formal matches, both players played practice games to familiarize themselves with WallGo. The formal matches used a 90-second time limit per move, and WallZero ran 2,000 MCTS simulations per move.

Game results between professional Go players and WallZero. Scores are reported as Human-WallZero territory counts. Values in parentheses denote the ratio of WallZero's territory to the human's.

| Player | Empty Mode, Red | Empty Mode, Blue | 4-Stone Mode, Red | 4-Stone Mode, Blue |
| --- | --- | --- | --- | --- |
| 3-dan | 16-33 (2.06×) | 20-29 (1.45×) | 14-32 (2.29×) | 19-30 (1.58×) |
| 9-dan | 12-37 (3.08×) | 17-30 (1.76×) | 20-29 (1.45×) | 12-34 (2.83×) |

WallZero won all eight formal games. Across the matches, it secured on average 1.98× as much territory as the human players, and its estimated win rate exceeded 90% before move 20 in every evaluated game.
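The parenthesized per-game ratios in the results table follow directly from the scores. The sketch below recomputes them as WallZero territory divided by human territory. The `RESULTS` dictionary and the `territory_ratio` helper are hypothetical names, and the scores are copied from the table.

```python
# (player, mode, color column) -> (human territory, WallZero territory)
RESULTS = {
    ("3-dan", "empty", "Red"): (16, 33),
    ("3-dan", "empty", "Blue"): (20, 29),
    ("3-dan", "4-stone", "Red"): (14, 32),
    ("3-dan", "4-stone", "Blue"): (19, 30),
    ("9-dan", "empty", "Red"): (12, 37),
    ("9-dan", "empty", "Blue"): (17, 30),
    ("9-dan", "4-stone", "Red"): (20, 29),
    ("9-dan", "4-stone", "Blue"): (12, 34),
}


def territory_ratio(human: int, wallzero: int) -> float:
    """Ratio of WallZero's territory to the human's for a single game."""
    return wallzero / human


# Two-decimal strings, matching the precision used in the table.
ratios = {game: f"{territory_ratio(*score):.2f}" for game, score in RESULTS.items()}

for game, ratio in ratios.items():
    print(game, ratio)
```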

Opening Analysis

The analysis examines whether the 4-stone opening used in The Devil's Plan is aligned with the opening preference of the empty-mode model, and compares the opening distributions learned in empty and 4-stone modes.

When the 4-stone opening is progressively introduced from empty mode, the empty-mode policy does not converge toward the predefined opening: it concentrates probability mass near the center and assigns near-zero probability to the fixed 4-stone locations. Even so, the 4-stone opening yields the more balanced game: the symmetric empty mode exhibits a stronger first-player advantage, with a Red win rate of 55.95% compared with 51.35% in 4-stone mode.

(a) Step 0: 55.95%
(b) Step 1: 49.6%
(c) Step 2: 54.7%
(d) Step 3: 52.95%
(e) Step 4: 53.4%
(f) Step 0 (4-stone mode model): 51.35%
Win rate (from Red's perspective) and policy probabilities predicted by the policy and value networks. For the policy, only probabilities larger than 10% are displayed. (a)-(e) use the empty mode model with the 4-stone opening; (f) uses the 4-stone mode model.

Strategic Analysis

A central contribution of this work is the use of WallZero not only as a strong playing agent, but also as an analysis tool for understanding WallGo strategy. The paper analyzes WallZero self-play games in 4-stone mode together with feedback from professional Go players, identifying two core strategies: reachability control and passing strategy.

Reachability refers to the set of positions a player can access under the current board. Since both stones and walls affect reachability, maintaining future access is essential in WallGo. The following cases illustrate how reachability guides movement and wall construction.
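Reachability of this kind can be computed with a breadth-first search over the board. The sketch below is not WallZero's implementation. It assumes that a stone moves in orthogonal steps, cannot cross a wall or enter an occupied cell, and may stay in place. Walls are represented, hypothetically, as a set of blocked edges between adjacent cells, and `max_steps=2` is an assumed per-turn move limit.

```python
from collections import deque

BOARD_SIZE = 7
DIRECTIONS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def reachable(start, walls, occupied, max_steps=None):
    """Cells reachable from `start` by orthogonal steps without crossing a wall
    or entering an occupied cell. `max_steps=None` means no step limit."""
    steps_to = {start: 0}  # the starting cell counts as reachable
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if max_steps is not None and steps_to[cell] == max_steps:
            continue  # step budget exhausted on this path
        row, col = cell
        for d_row, d_col in DIRECTIONS:
            nxt = (row + d_row, col + d_col)
            if not (0 <= nxt[0] < BOARD_SIZE and 0 <= nxt[1] < BOARD_SIZE):
                continue  # off the board
            if nxt in steps_to or nxt in occupied:
                continue  # already visited, or blocked by a stone
            if frozenset((cell, nxt)) in walls:
                continue  # a wall separates the two cells
            steps_to[nxt] = steps_to[cell] + 1
            queue.append(nxt)
    return set(steps_to)


# Usage: one wall above the center stone removes two cells from its two-step reach.
center = (3, 3)
open_reach = reachable(center, walls=set(), occupied=set(), max_steps=2)
wall_above = {frozenset(((3, 3), (2, 3)))}
walled_reach = reachable(center, walls=wall_above, occupied=set(), max_steps=2)
print(len(open_reach), len(walled_reach))
```

Comparing the reachable sets before and after a candidate wall gives the number of positions that wall would cost a player, which is the kind of count used in the cases that follow.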

After Red moves from E5 to E3, different wall placements lead to distinct future reachability. Placing the wall to the right or below loses two reachable positions, while placing it to the left or above loses one. However, the left placement leaves D3 exclusively reachable by Blue, so WallZero prefers placing the wall above.

Reachability Control - Case I. (a) Initial position (Red moves E5 to E3). (b) Wall placed to the right: 46%. (c) Wall placed below: 47%. (d) Wall placed to the left: 49.5%. (e) Wall placed above: 52.5%. Percentages denote the win rate from Red's perspective.