WallGo is a recently introduced strategic board game popularized by the 2025 Netflix series The Devil's Plan. Although played on a small 7 × 7 board, its combination of stone movement and wall placement yields high game-tree complexity and intricate strategic interactions. Despite its growing popularity, WallGo remains underexplored. This paper presents WallZero, an AlphaZero-based agent for the two-player WallGo setting. We introduce tailored action and feature designs that significantly improve playing performance. In the evaluation, WallZero defeats two professional Go players who participated in this study, securing on average 1.98× as much territory per game. Beyond playing strength, we use WallZero to assess game fairness and identify key strategies for mastering WallGo. Interestingly, our results show that the opening used in the Netflix series yields a more balanced game than starting from an empty board. Our code is available at rlg.iis.sinica.edu.tw/papers/wallzero.
WallGo is played on a 7 × 7 board. In the two-player setting considered in this work, Red and Blue each control four stones and seek to enclose more territory than the opponent.
Although the board size is small, each turn combines stone movement and wall construction, resulting in an estimated game-tree complexity of approximately \(10^{87}\).
WallZero follows the AlphaZero training pipeline in the MiniZero framework, where Monte Carlo Tree Search is guided by a policy-value network. The agent adds two WallGo-specific designs: a unified action space that covers both setup and play phases, and a feature representation tailored to stones, walls, territory, reachability, and recent history.
| Feature | # of planes | Description |
|---|---|---|
| Stone | 2 | Red / Blue stone |
| Horizontal Wall | 2 | Red / Blue horizontal wall |
| Vertical Wall | 2 | Red / Blue vertical wall |
| Player Turn | 2 | Indicates the player to move |
| Territory (T) | 3 | Red / Blue / Neutral territory |
| Reachability (R) | 2 | Positions reachable within one turn for each player |
| History (H) | 36 | Four-step history of stones, walls, and territory (9 per step) |
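The plane counts in the feature table can be tallied into a channel layout for the network input. The following is a minimal sketch, assuming that all features are stacked together in table order and that every plane is a 7 × 7 grid; the names `FEATURE_PLANES`, `plane_offsets`, and `empty_input` are hypothetical and not taken from the WallZero code.

```python
BOARD_SIZE = 7  # WallGo is played on a 7 x 7 board

# (feature name, number of planes), copied from the feature table.
# ASSUMPTION: planes are stacked in this order; the real layout may differ.
FEATURE_PLANES = [
    ("stone", 2),
    ("horizontal_wall", 2),
    ("vertical_wall", 2),
    ("player_turn", 2),
    ("territory", 3),
    ("reachability", 2),
    ("history", 36),  # 4 steps x 9 planes (2 stone + 4 wall + 3 territory)
]


def plane_offsets(features):
    """Map each feature to its channel range and return the total plane count."""
    offsets, start = {}, 0
    for name, count in features:
        offsets[name] = range(start, start + count)
        start += count
    return offsets, start


def empty_input(features=FEATURE_PLANES, size=BOARD_SIZE):
    """Build an all-zero input as nested lists shaped [planes][rows][cols]."""
    _, total = plane_offsets(features)
    return [[[0.0] * size for _ in range(size)] for _ in range(total)]


offsets, total_planes = plane_offsets(FEATURE_PLANES)
print(total_planes, list(offsets["history"])[:2])
```

Under these assumptions the full input has 49 planes. The (T), (R), and (H) labels suggest that these three groups may be optional additions, in which case a given model variant would use a subset of this layout.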
To evaluate practical playing strength, WallZero uses a 10-block residual network and is tested against two Taiwanese professional Go players, Wei Huang (3-dan) and Chun-Hsun Chou (9-dan), across both opening modes (empty and 4-stone) and both colors. Before the formal matches, both players played practice games to familiarize themselves with WallGo. The formal matches used a 90-second time limit per move, and WallZero ran 2,000 MCTS simulations per move.
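In AlphaZero-style search, each simulation descends the tree by repeatedly choosing the child that maximizes a value estimate plus a prior-weighted exploration bonus (the PUCT rule). The sketch below illustrates that selection step only, assuming WallZero inherits the standard form from its AlphaZero pipeline; the function name, the dict-based statistics, and the value of `c_puct` are illustrative assumptions, not details from the paper.

```python
import math


def puct_select(priors, visit_counts, value_sums, c_puct=1.5):
    """Return the action maximizing Q + U under the PUCT rule.

    priors:       {action: policy-network probability}
    visit_counts: {action: simulations that passed through the action}
    value_sums:   {action: sum of backed-up values for the action}
    c_puct:       exploration constant (1.5 is an arbitrary illustrative value)
    """
    total_visits = sum(visit_counts.get(a, 0) for a in priors)
    # max(1, ...) keeps the prior term non-zero at a node with no visits yet.
    sqrt_total = math.sqrt(max(1, total_visits))

    def score(action):
        n = visit_counts.get(action, 0)
        q = value_sums.get(action, 0.0) / n if n else 0.0      # mean value
        u = c_puct * priors[action] * sqrt_total / (1 + n)      # exploration bonus
        return q + u

    return max(priors, key=score)


# Unvisited node: the higher prior wins.
print(puct_select({"a": 0.7, "b": 0.3}, {}, {}))
# After ten visits to "a" with modest value, the bonus shifts search toward "b".
print(puct_select({"a": 0.7, "b": 0.3}, {"a": 10}, {"a": 2.0}))
```

With a fixed budget of 2,000 simulations, this rule determines how the budget is split between moves the policy network already favors and less-visited alternatives.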
| Human player | Empty Mode (Red) | Empty Mode (Blue) | 4-Stone Mode (Red) | 4-Stone Mode (Blue) |
|---|---|---|---|---|
| 3-dan | 16-33 (2.06×) | 20-29 (1.45×) | 14-32 (2.29×) | 19-30 (1.58×) |
| 9-dan | 12-37 (3.08×) | 17-30 (1.76×) | 20-29 (1.45×) | 12-34 (2.83×) |
WallZero won all eight formal games. Across the matches, it secured on average 1.98× as much territory as the human players, and its estimated win rate exceeded 90% before move 20 in every evaluated game.
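The per-game ratios in the results table can be re-derived from the scores. The sketch below assumes each entry is written as human score, then WallZero score (inferred from WallZero winning every game), and that the ratio is WallZero's territory divided by the human's; the variable and function names are hypothetical.

```python
# (player, mode, color) -> (human score, WallZero score, ratio as printed in the table)
# ASSUMPTION: the first number in each table entry is the human player's score.
RESULTS = {
    ("3-dan", "Empty", "Red"): (16, 33, "2.06"),
    ("3-dan", "Empty", "Blue"): (20, 29, "1.45"),
    ("3-dan", "4-Stone", "Red"): (14, 32, "2.29"),
    ("3-dan", "4-Stone", "Blue"): (19, 30, "1.58"),
    ("9-dan", "Empty", "Red"): (12, 37, "3.08"),
    ("9-dan", "Empty", "Blue"): (17, 30, "1.76"),
    ("9-dan", "4-Stone", "Red"): (20, 29, "1.45"),
    ("9-dan", "4-Stone", "Blue"): (12, 34, "2.83"),
}


def territory_ratio(human_score, agent_score):
    """Ratio of the agent's territory to the human's, formatted to two decimals."""
    return f"{agent_score / human_score:.2f}"


for key, (human, agent, printed) in RESULTS.items():
    derived = territory_ratio(human, agent)
    unclaimed = 49 - (human + agent)  # cells on the 7 x 7 board claimed by neither side
    print(key, derived, "matches" if derived == printed else "differs", unclaimed)
```

Every derived ratio matches the printed one under this reading, and the two scores never sum to more than the 49 cells of the board.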
The analysis examines whether the 4-stone opening used in The Devil's Plan is aligned with the opening preference of the empty-mode model, and compares the opening distributions learned in empty and 4-stone modes.
The most frequent openings show different behavior across the two modes. Empty-mode openings are diverse but consistently center-oriented, with both players clustering around the opponent's stones to restrict reachability. In 4-stone mode, openings converge to a smaller set of patterns: Red first consolidates regional influence, Blue counters with two consecutive placements, and Red's final placement maintains containment over Blue's stones.
When the 4-stone opening is progressively introduced from empty mode, the empty-mode policy does not converge toward the predefined opening. Instead, it concentrates probability mass near the center and assigns near-zero probability to the fixed 4-stone locations. Even so, the symmetric empty mode exhibits a stronger first-player advantage, with a Red win rate of 55.95%, compared with 51.35% in 4-stone mode.
A central contribution of this work is the use of WallZero not only as a strong playing agent, but also as an analysis tool for understanding WallGo strategy. The paper analyzes WallZero self-play games in 4-stone mode together with feedback from professional Go players, identifying two core strategies: reachability control and passing strategy.
Reachability refers to the set of positions a player can access under the current board. Since both stones and walls affect reachability, maintaining future access is essential in WallGo. The following cases illustrate how reachability guides movement and wall construction.
In Case I, after Red moves from E5 to E3, different wall placements lead to distinct future reachability. Placing the wall to the right or below loses two reachable positions, while placing it to the left or above loses one. However, the left placement leaves D3 exclusively reachable by Blue, so WallZero prefers placing the wall above.
In Case II, a midgame position, Red at E2 and Blue at E1 compete for the lower region. Placing the wall above preserves Red's reachability while limiting Blue's expansion. The Red stone at D1 itself acts as a temporary barrier, showing that stones can serve as implicit walls while preserving flexibility.
Case III shows that sacrificing immediate territory can be correct when it preserves access to a more important future contest. Moving to E7 rather than F7 allows Red to gain one additional point, but keeps Blue closer to the central battle around B4: reaching C5 takes three steps from F7 but only two from E7. This changes the eventual outcome by one point.
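The reachability notion used in these cases can be sketched as a breadth-first search over the board. This is a minimal illustration under assumptions not stated in the text: a stone moves up to two orthogonal cells per turn, cannot cross walls or occupied cells, and cell labels are column letter plus row number. The names `parse` and `turns_to_reach` are hypothetical.

```python
from collections import deque

SIZE = 7  # 7 x 7 board


def parse(cell):
    """Convert a label such as 'E7' to zero-based (column, row)."""
    return ord(cell[0]) - ord("A"), int(cell[1:]) - 1


def turns_to_reach(start, walls=frozenset(), blocked=frozenset(), steps_per_turn=2):
    """Return {cell: turns needed to reach it} on a static board.

    walls:   set of frozenset({a, b}) for adjacent cells separated by a wall
    blocked: cells occupied by other stones
    ASSUMPTION: up to `steps_per_turn` orthogonal cells may be moved per turn.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dc, cur[1] + dr)
            if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE):
                continue  # off the board
            if nxt in dist or nxt in blocked:
                continue  # already visited, or occupied by a stone
            if frozenset((cur, nxt)) in walls:
                continue  # a wall separates the two cells
            dist[nxt] = dist[cur] + 1
            queue.append(nxt)
    # Ceiling division converts single-cell steps into whole turns.
    return {cell: -(-d // steps_per_turn) for cell, d in dist.items()}


target = parse("C5")
print(turns_to_reach(parse("F7"))[target], turns_to_reach(parse("E7"))[target])
```

On an empty board this gives three turns from F7 and two from E7, consistent with the counts quoted in Case III, although the actual position contains stones and walls that are not reproduced here. The cells with a value of at most 1 correspond to the one-turn reachability that the Reachability feature planes describe.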
Because each turn requires both movement and wall placement, entering a valuable region first can be disadvantageous. In the analyzed late-game position, Blue leads by one point and the lower-right region has 10 unclaimed points. If Blue plays there first, it can obtain at most 3 points and loses; if Red plays there first, both players secure 5 points and Blue wins. WallZero reveals an implicit passing technique in which players sacrifice small amounts of territory to manipulate move order in the endgame.
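The arithmetic behind this late-game example can be checked directly. The sketch below assumes that when Blue enters first and takes at most 3 points, Red takes the remaining 7 of the 10 unclaimed points; the text states only that Blue loses in that line. The function name is hypothetical.

```python
def blue_final_margin(blue_lead, blue_gain, red_gain):
    """Blue's final margin over Red; a positive value means Blue wins."""
    return blue_lead + blue_gain - red_gain


REGION_POINTS = 10  # unclaimed points in the lower-right region
BLUE_LEAD = 1       # Blue's lead before the region is contested

# Blue enters first: at most 3 points for Blue.
# ASSUMPTION: Red then claims the other 7 points.
blue_first = blue_final_margin(BLUE_LEAD, 3, REGION_POINTS - 3)

# Red enters first: the region splits 5-5, so Blue keeps its one-point lead.
red_first = blue_final_margin(BLUE_LEAD, 5, 5)

print(blue_first, red_first)  # -3 1
```

The sign flip between the two lines is what makes it worthwhile to give up a small amount of territory elsewhere in order to force the opponent to enter the region first.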