Strength Estimation and Human-Like Strength Adjustment in Games

Chun-Jung Chen1,2, Chung-Chin Shih1, Ti-Rong Wu1
1 Academia Sinica, 2 National Taiwan University

Abstract

Strength estimation and adjustment are crucial in designing human-AI interactions, particularly in games where AI surpasses human players. This paper introduces a novel strength system, including a strength estimator (SE) and an SE-based Monte Carlo tree search, denoted as SE-MCTS, which predicts strengths from games and offers different playing strengths with human styles. The strength estimator calculates strength scores and predicts ranks from games without direct human interaction. SE-MCTS utilizes the strength scores in a Monte Carlo tree search to adjust playing strength and style. We first conduct experiments in Go, a challenging board game with a wide range of ranks. Our strength estimator achieves over 80% accuracy in predicting ranks after observing only 15 games, whereas the previous method reached 49% accuracy even with 100 games. For strength adjustment, SE-MCTS successfully adjusts to designated ranks while achieving 51.33% accuracy in aligning with human actions, outperforming the previous state-of-the-art, which achieves only 42.56% accuracy. To demonstrate the generality of our strength system, we further apply SE and SE-MCTS to chess and obtain consistent results. These results show a promising approach to strength estimation and adjustment, enhancing human-AI interactions in games. Our code is available at https://rlg.iis.sinica.edu.tw/papers/strength-estimator.

Strength Estimation and Strength Adjustment

Strength Estimator Architecture

We propose a comprehensive strength system that consists of two main components: a strength estimator and a human-like strength adjustment system.

  • Strength estimator: We use a neural network to estimate a strength score (β) of an action at a given state from human game records, with higher scores indicating stronger actions. For example, in Go, dan denotes an advanced amateur level, ranging from 1 dan to 9 dan, with higher numbers indicating stronger players; the strength scores are therefore expected to satisfy β9dan > β8dan > ... > β1dan. To measure overall capability, we define a composite strength by aggregating all individual strengths. Then, based on the Bradley-Terry model, we sequentially minimize the loss to ensure that the strength scores are strictly in the desired order. Furthermore, we introduce an additional rank, r∞, defined as the weakest of all ranks, to handle unseen state-action pairs.
  • Human-like strength adjustment: To adjust to a specific rank, we first obtain the strength score evaluated by the strength estimator for the target rank. We then modify the PUCT formula in MCTS based on this target strength score so that the search aligns more closely with the desired rank's strength.
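The strength-biased selection step above can be sketched as follows. This is a minimal illustration, not the paper's exact formula: the reweighting factor exp(-|β - β_target| / τ), the temperature `tau`, and the constant `c_puct` are all illustrative assumptions layered on top of the standard PUCT rule.

```python
import math

def se_puct(q, prior, n_parent, n_child, beta, beta_target,
            c_puct=1.5, tau=1.0):
    """PUCT-style selection score biased toward a target strength.

    The policy prior is reweighted by exp(-|beta - beta_target| / tau),
    so moves whose strength score is close to the target rank's score
    receive more exploration. The reweighting form here is a sketch;
    SE-MCTS's actual modification may differ.
    """
    strength_weight = math.exp(-abs(beta - beta_target) / tau)
    exploration = c_puct * strength_weight * prior * math.sqrt(n_parent) / (1 + n_child)
    return q + exploration
```

With equal value estimates and visit counts, a move whose strength score matches the target rank outscores one far from it, steering the search toward the desired playing strength.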
Experiment Results

    Predicting ranks from games

    [Figure: Rank prediction accuracy in Go (left) and chess (right)]

    We compare the strength estimator (SE) with two traditional classification methods (SLsum and SLvote) on predicting player ranks in Go (left figure) and chess (right figure). SE achieves over 80% accuracy within just 15 games in Go and 26 games in chess. In contrast, the previous methods require 100 games to reach only 49% accuracy in Go and 32% in chess.
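The rank-prediction idea can be sketched as follows: accumulate per-move strength scores across a player's games and pick the rank whose reference score is nearest. This is a nearest-neighbour toy, not the paper's learned estimator; the reference scores in `RANK_BETAS` (one per dan rank) are hypothetical placeholders.

```python
# Hypothetical reference strength scores, one per amateur dan rank.
# A trained strength estimator would learn these from human games.
RANK_BETAS = {f"{d} dan": float(d) for d in range(1, 10)}

def predict_rank(move_scores):
    """Average strength scores over all observed moves, then return the
    rank whose reference score is closest to that average."""
    avg = sum(move_scores) / len(move_scores)
    return min(RANK_BETAS, key=lambda rank: abs(RANK_BETAS[rank] - avg))
```

Aggregating over more games tightens the average around the player's true strength, which is consistent with accuracy improving as more games are observed.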

    Adjusting Strength with Strength Estimator

    We evaluate several MCTS approaches on both strength adjustment and alignment with human players' behavior.
  • MCTS achieves high accuracy in matching human players' moves, but it cannot adjust its strength.
  • SA-MCTS can adjust its strength, but it achieves the lowest accuracy in matching human players' behavior among all programs.
  • SE-MCTS can adjust its strength while providing playing styles closely aligned with those of human players at specific ranks.
    [Figure: Win rate in Go — technique comparison heatmap]
    [Figure: Win rate in chess — technique comparison heatmap]

    Model Download