Chess fortresses, a causal test for state of the art Symbolic[Neuro] architectures
Supplementary Material

Hedinn Steingrimsson
Electrical and Computer Engineering, Rice University
Houston, TX 77005
hedinn.steingrimsson@rice.edu
1. Potential Weaknesses of Training Leela Chess Zero/AlphaZero
Benchmarks are of central importance to machine learning. Particularly for architectures that are less well understood from a theoretical point of view, raw performance on datasets is the major driver for new developments and the major feedback about the state of the field.
In chess, the strategy of a specific opponent, such as an open source engine, can be viewed as equivalent to a dataset. Any chess position can be fed into the engine, and it will output a preferred move or a ranked set of moves from its policy. By feeding the engine chess positions, it is possible to create a dataset as large as the number of states in the game.
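As an illustration, the following minimal sketch (our own, not part of the original experimental pipeline) uses the python-chess library and any locally installed UCI engine to label arbitrary FEN positions with the engine's preferred move; the engine path and search depth are assumptions.

import chess
import chess.engine

def label_positions(fens, engine_path="stockfish", depth=20):
    """Map each FEN to the engine's preferred move (in UCI notation)."""
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    labels = {}
    try:
        for fen in fens:
            board = chess.Board(fen)
            # Query the engine for its preferred move in this position.
            result = engine.play(board, chess.engine.Limit(depth=depth))
            labels[fen] = result.move.uci()
    finally:
        engine.quit()
    return labels

# Example: label one of the fortress positions listed in Section 4.
print(label_positions(["4K3/4Bp1N/2k3p1/5PP1/8/7p/b7/8 w - - 0 5"]))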
Google DeepMind’s AlphaZero (AZ) [1], [2] learned from self-play [3] and competed against a leading mini-max engine of that time, Stockfish 7 and later Stockfish 8 [4].
The self-play approach means that there is no limit on how much data can be generated. While the machine learning community has traditionally relied on dataset size to minimize confounds in the form of artificial correlations or shortcut features, to cover unusual phenomena, and to encourage generalization, recent publications emphasize the importance of carefully creating data that is intentionally focused on certain phenomena [5]. The motivation of that approach is to encourage generalization and reduce biases. Another motivation is that, because of rare events and biases, scale is not enough: [6] showed that only logarithmic performance increases are to be expected as a function of dataset size alone. Exploring hard classes has proven to be a fruitful avenue for fundamental architectural improvements in the past, as shown by Russakovsky et al.’s influential paper on ImageNet’s hard classes [7].
In the case of AZ, where the dataset is generated with self-play, further challenges arise due to the homogeneous type of agents that create the dataset. This challenge is described in a recent DeepMind paper [8], which emphasizes the importance of exploring the agent state space and having diverse agents with a heterogeneous skill set, rather than focusing on finding the best performing agent in a head-to-head comparison as is done in AZ.
In order to explore and master the whole state space of chess, including all the chess position classes, it would be beneficial to have agents with different characteristics. If none of the relatively homogeneous agents in the pool of self-play agents masters a chess position class, then this position class will be a blind spot.
AZ is meant to be a general purpose architecture applicable to a wide range of tasks, including mapping out the protein structures associated with the COVID-19 virus with AlphaFold [9]. The AZ architecture has had wide ranging impact. The success of AlphaZero’s predecessor AlphaGo [10] is, according to Henry Kautz, the serving Division Director for Information & Intelligent Systems at NSF, one of the most important milestones of modern AI [11], leading to China’s “Sputnik moment”, which set the goal of China becoming a world leader in AI by 2030 [12]. Demis Hassabis, the CEO of DeepMind, wrote that AlphaZero-like learning systems would contribute to finding “new breakthroughs in critical areas of science and medicine” [13]. David Silver, the main author of AlphaZero, spoke in a similar way recently [14], emphasizing the impact of AZ beyond chess and games. Leela Chess Zero [15] is built on the same principles as AlphaZero and is to a high degree similar both in terms of architecture and algorithms.
2. Testing Hard Classes
Until recently, most work on AI architectures focused on models that were tested with benchmarks. It was believed that access to more data was the key to success, see Yann LeCun in “Does AI Need More Innate Machinery?” [16]. Currently there is a growing consensus that more data is not enough and that the focus should shift towards fundamental improvements, see Turing Award winner Yoshua Bengio’s NeurIPS 2019 talk “From System 1 Deep Learning to System 2 Deep Learning” [17] and his recent debate with Gary Marcus at Montreal.AI [18]. See also Henry Kautz’s recent Robert S. Engelmore Memorial lecture [11], where he argues that fundamental architectural improvements are needed. Kautz considers the start of what he calls the ongoing Third AI Summer to be the year 2012, when the deep learning network AlexNet [19] won the ImageNet benchmark competition. Two years later, the main creators of ImageNet, Olga Russakovsky and Fei-Fei Li, wrote a very influential paper which explored ImageNet’s hard classes [7]. Following this trend, Bernhard Schölkopf, together with Turing Award winner Yoshua Bengio, recently wrote a paper [20] arguing for more focus on causal inference related approaches and tasks. Another Turing Award winner, Judea Pearl, has also recently followed a similar line of reasoning [21].
Our aim with the work presented here is to follow in their footsteps, carrying out extensive analysis of the performance of Leela Chess Zero, an open source chess engine built on the same principles as AZ and reverse engineered from it, focusing on the hard classes of its architecture networks three years after they entered the scene.
We considered the concept of making progress an interesting starting point for hard classes. AlexNet’s [19] convincing victory in the ImageNet 2012 benchmark competition marks the starting point of the current success of deep convolutional neural networks. ImageNet has static tasks related to classifying images. In chess, the affordance of a position, i.e. how it can be developed by taking the next step, plays no less of a role than its static attractiveness. How would a method that has its roots in a static domain fare in chess position classes where the concept of progress plays a key role? In the experiments presented here, we focused on chess. Chess has been called the “drosophila of AI” [22]. Like drosophila, it can be viewed as a relatively simple system that nevertheless can be used to explore larger, more complex phenomena. As Claude Shannon described, it is simple enough to enable mathematical formulation, and yet complicated enough to be theoretically interesting [23], [24]. It is the most widely studied domain in the history of artificial intelligence [1]. To limit the scope of our experiments, we started with chess fortresses.
3. Fortresses
A. The human approach
Surprisingly, the fortress task is relatively easy for a human. We tested 14 positions from the fortress dataset on students of the Icelandic Chess School and on five chess Grandmasters at Webster University and Saint Louis University in the US. The Grandmasters managed to solve all the tested fortress positions, and the children at the Icelandic Chess School solved all the positions, at times after being given a hint.
The Grandmasters were asked to explain their thinking process when solving the fortress tasks. Analysis revealed that a human approaches fortresses in a logical manner, quickly zooming in on what in chess literature are called “critical squares”, which are possible entry points into the fortress. An optimal defensive formation for each of the critical squares is quickly established, as is a list of optimal attacking formations. Only after constraining the general task in this manner does a human start the search process, focusing among other things on whether there is a way for the attacking side to create a Zugzwang [25], [26] situation, in which the defending side has to move (in chess it is not possible to pass), breaking the optimal defensive formation while the attacker keeps the optimal attacking formation.

For simple fortress tasks, such as a completely blocked position with pawns on one color and the attacking side only having bishops of the same color as its pawns (the position is a draw even with the worst possible defensive play), a human novice can quickly understand the fortress concept and correctly evaluate the position, while the most sophisticated AI, including the AZ architecture methods and Stockfish, fails to evaluate the position correctly. In some cases it is possible to add extra material to the attacking side of a fortress position, material which does not change the nature of the fortress, such as extra rooks in a position which is totally blocked by a pawn chain. Another example is that an arbitrary number of white-squared black bishops could be added to the position in Figure 7 without changing the nature of the position, which is a fortress. A human, including school children, will quickly grasp that these extra pieces do not make a difference. For state of the art chess engines, which use probabilistic reasoning over observed data distributions, extra pieces are inevitably interpreted as greater superiority and higher chances of winning. Fortresses thus are an example of a chess position type where even school children may outperform current state of the art AI architectures.
A human displays outstanding few-shot learning, out-of-distribution, and compositional generalization performance on the fortress task, all aspects that the AZ architecture’s convolutional neural networks struggle with [4].
B. Prior partial progress attempts
In chess, two different attempts have been made at making partial progress on fortresses. In line with the human approach of reducing the search space, Eiko Bleicher developed the Freezer program [27], [28]. A key element is restricting the squares which the defensive pieces can occupy and move between. The impact of different piece and pawn exchanges also needs to be defined. Freezer works for chess positions that have few pieces on the board; typically the position is just one step, e.g. one piece, beyond the tablebases [29]. The idea is that the restrictions, together with the proximity to the tablebases, limit the search space, making a task that would otherwise be intractable, due to the exponential nature of brute force search in chess, tractable. Freezer can run on a normal PC. The weakness of the Freezer approach is that all the restrictions need to be provided by a human operator. The hard task for the AI, carrying out the logical processing that a human finds simple, is thus offloaded to a human.
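The restriction idea can be illustrated with a toy sketch; this conveys only the flavor of the approach, not Freezer's actual algorithm, and the whitelist of squares below is a hypothetical input of the kind a human operator would supply.

import chess

def restricted_moves(board, allowed_squares, defender=chess.WHITE):
    """Yield legal moves, keeping defender moves only if they go to whitelisted squares."""
    for move in board.legal_moves:
        if board.turn == defender and move.to_square not in allowed_squares:
            continue
        yield move

# Hypothetical whitelist for the defending pieces in one of the dataset positions.
board = chess.Board("4K3/4Bp1N/2k2Pp1/6P1/8/8/b6p/8 w - - 0 6")
allowed = {chess.parse_square(s) for s in ("e8", "f8", "g7", "h6", "h7", "h8")}
print([board.san(m) for m in restricted_moves(board, allowed)])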
Fortresses were also a challenge for AZ’s opponent of choice, the mini-max engine Stockfish. A mini-max engine focuses primarily on search; there is a minimal position evaluation component in the form of relatively simple formulas which give points based on which material is on the board (pawn, bishop etc.), as well as the location of the pieces, their mobility and so on [30]. Fortresses turned out to be too complicated a phenomenon to capture with a simple arithmetic formula.

An attempt to let a mini-max [31] engine such as Stockfish determine whether a chess position is a fortress is described in [32]. Their non-generative approach is expensive, both computationally and in terms of storage space. AZ applies MCTS, which proceeds depth first, while a mini-max search is fundamentally breadth first. In order to reduce the search space, given the high average branching factor of 35-38, various heuristic methods are applied, including alpha-beta pruning [33] and null move heuristics [34]. The goal of the heuristics is to select the key moves that really matter, and not spend much time on searching alternatives, especially those close to the root node. The Guid and Bratko approach disables this: all legal moves in the root node are searched to maximum depth. The evaluation of all the moves is saved for each ply [35], or search depth. They track whether some move, which was initially evaluated as inferior at a low search depth, jumps up the list of best moves due to receiving a more positive evaluation once the search moves to higher plies. The idea is that such a move might transform the position in some way; e.g. if the position is close to being a fortress, but not quite, it might be the move that breaks the fortress. It is not unusual that breaking the fortress implies sacrificing some material, e.g. a pawn, to open up entry lines for the pieces. Such moves get an unfavorable evaluation when searched at a low ply, but deeper searches can detect the potential opened up by increased piece activity, and yet deeper searches discover that the sacrifices open up entry points for the attacking pieces, ultimately leading to the sacrificed material being regained with interest. Thus, special attention should be paid to moves that gain in evaluation with increasingly deep search. Their idea is not very practical, since it primarily works for the type of fortress where some sacrifice, which at low depth seems infeasible, is necessary; the root position in their experiments is the position where the sacrifice is possible, and the number of plies to search is left as an adjustable parameter. There is no way to know when the method should be applied, and applying it constantly would be prohibitively computationally expensive.

Recently, there have been attempts at implementing Stockfish variants with fortress handling, most notably Crystal [36] and BlackDiamond [37], see [38]. Crystal works in such a way that, for the side that has an advantage, it penalizes moves that lead to cyclic variations, i.e. the same chess position occurring repeatedly, since such moves do not lead to progress being made.
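A rough sketch of the root-move tracking idea described above follows (our reading of the Guid and Bratko procedure, not their implementation; python-chess and a UCI engine binary are assumptions):

import chess
import chess.engine

def depth_profiles(fen, engine_path="stockfish", max_depth=16):
    """For every root move, record its evaluation at each search depth 1..max_depth."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    profiles = {move: [] for move in board.legal_moves}
    try:
        for depth in range(1, max_depth + 1):
            for move in profiles:
                info = engine.analyse(board, chess.engine.Limit(depth=depth),
                                      root_moves=[move])
                # Score from the side to move's point of view, in centipawns.
                profiles[move].append(
                    info["score"].pov(board.turn).score(mate_score=100000))
    finally:
        engine.quit()
    return profiles

def rising_moves(profiles, gain=100):
    """Moves whose evaluation improves by at least `gain` centipawns as depth grows."""
    return [m for m, scores in profiles.items() if scores[-1] - min(scores) >= gain]

Moves flagged by rising_moves are the candidates that, in the Guid and Bratko sense, may transform a near-fortress, e.g. a sacrifice that only deep search vindicates.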
C. Neural networks
We are not aware of prior attempts to test neural networks on chess fortresses. A somewhat related, although much simpler, task is a game such as Montezuma’s Revenge, where an agent needs to perform a complicated compositional action such as finding a key and turning it in a keyhole, which opens up a new pathway. The key, the keyhole and the new path are typically at different geometrical locations. This task is very challenging for a pure deep reinforcement learning approach, which would need to explore the environment for a prohibitively long time before accidentally stumbling upon the right solution. Hoang M. Le et al. showed in [39] that a hierarchical deep reinforcement learning approach is suitable for this task. The hierarchical approach can have similar challenges as the Freezer program described above, since the higher level abstract actions that lead to the ultimate goal, such as “pick up the key” and “open the door with the key”, need to be defined a priori; the method does not come up with these by itself. Fortresses are more multifaceted, fluid and variable than a very concrete, isolated task such as finding a key and opening a specific door, and they require taking the whole context into consideration: the concept of a critical square on a chess board can take many forms.
4. Experimental Setup
The following list shows all configuration parameters of Lc0 that were used for the experiments described in the paper:
" test_config": {
"enginePath": "./Leela/lc0.exe",
"weights_path": "./run_networks/",
"Threads": 2,
"BackendOptions": "gpu=0", "minibatchsize": 256,
"NNCacheSize": 200000,
"MaxPrefetch": 32,
"LogitQ": false,
"CPuct": 2.15,
"CPuctAtRoot": 2.15,
"CPuctBase": 18368.0,
"CPuctBaseAtRoot": 18368.0,
"CPuctFactor": 2.82,
"CPuctFactorAtRoot": 2.82,
"Temperature": 0.0,
"TempDecayMoves": 0,
"TempCutoffMove": 0,
"TempEndgame": 0,
"TempValueCutoff": 100.0,
"TempVisitOffset": 0.0,
"DirichletNoise": false,
"VerboseMoveStats": true,
"FpuStrategy": "reduction",
"FpuValue": 0.44,
"FpuStrategyAtRoot": "same",
"FpuValueAtRoot": 1.0,
"CacheHistoryLength": 0,
"PolicyTemperature": 1.61,
"MaxCollisionEvents": 32,
"MaxCollisionVisits": 9999,
"OutOfOrderEval": true,
"MaxOutOfOrderEvalsFactor": 1.0,
"StickyEndgames": true,
"SyzygyFastPlay": true,
"PerPVCounters": false,
"ScoreType": "Q""HistoryFill": "always",
"ShortSightedness": 0.0,
"MaxConcurrentSearchers": 1,
"DrawScoreSideToMove": 0,
"DrawScoreOpponent": 0,
"DrawScoreWhite": 0,
"DrawScoreBlack": 0,
"UCI_ShowWDL": true,
"KLDGainAverageInterval": 100,
"MinimumKLDGainPerNode": 0.0,
"SmartPruningFactor": 1.33,
"SmartPruningMinimumBatches": 0,
"RamLimitMb": 0,
"MoveOverheadMs": 200,
"Slowmover": 1.0,
"ImmediateTimeUse": 1.0,
"nodes": 65536,
"LogFile": "lc0_log.txt"
}
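For reference, a minimal sketch of how such a configuration could be applied from the python-chess side, assuming the listed parameters are exposed as UCI options by the lc0 binary (the option names, the assembled weights file path, and the use of python-chess are assumptions; the experiments themselves used a dedicated test harness):

import chess
import chess.engine

LC0_PATH = "./Leela/lc0.exe"  # enginePath from the configuration above

engine = chess.engine.SimpleEngine.popen_uci(LC0_PATH)
engine.configure({  # sends "setoption name ... value ..." for each entry
    "WeightsFile": "./run_networks/128x10-ccrl-moves_left-scalar-plies-huber-10000",
    "Threads": 2,
    "MinibatchSize": 256,
    "CPuct": 2.15,
    "Temperature": 0.0,
    "VerboseMoveStats": True,
})

board = chess.Board("4K3/4Bp1N/2k3p1/5PP1/8/7p/b7/8 w - - 0 5")
info = engine.analyse(board, chess.engine.Limit(nodes=65536))  # node budget from the config
print(board.san(info["pv"][0]), info["score"])
engine.quit()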
This is a list of all the tested networks:
128x10-ccrl-moves_left-scalar-plies-huber-10000
128x10-t59-moves_left-scalar-plies-huber10-10000
MLH_Tinker_128x10-T60-MovesLeft-PolicyConv-ValueWDL-Steps800K-TF2-swa-800000
MLH_Chad_128x10-c21-1088000
62535_moves_left
128x10-moves_left-t40-0.1-200000
LS_14_20x256SE-jj-9-53420000
LS14.1-20x256SE-jj-9-59420000
700401
62550
62678
256x20-t40-1541
384x30-t40-2036
384x30-t60-3010
384x30-t60-3070
11258-16x2-se-4
11258-112x9-se
42872
591226
badgyal-7_128x10
badgyal-8_128x10
DarkQueen_V2.3
ender128-90l
evilgyal-6_48x5
FatFritz
goodgyal-5_48x5
goodgyal-7_192x16
J13B_2_136
J13B_2_220
J13B_3_200
J13B_4_150
J20-460
J48-160
J64-180
LD2
LS_13_1_20x256SE-jj-9-44500000
T10-11262-20x256
T20-22201-20x256
T30-33005-20x256
T35-36091-10x128
T35-37080-10x128
T40-42847-20x256
T40B_1-106
T40B_2-106
T40B_4-260
T50-50782-10x128
T51-51458-10x128
T52-52376-10x128
T53-53316-10x128
T54-54255-10x128
T55-55109-10x128
T56-56296-10x128
T57-57402-10x128
T58-58608_10x128
T59_591226-10x128
weights_598
weights_run1_62816_320x24_active_ELO_2934
weights_run1_62828_320x24_active_ELO_2924
weights_run1_62833
weights_run1_62874
weights_run2_700772
15x192-jj-1-520000
15x192-jj-1-swa-400000
20x256SE-jj-9-swa-41500000
weights_run1_63098
weights_run2_701318
The following list shows all 18 fortress positions in the dataset in .pgn format. Each position is presented as a position in a chess game, with the ground-truth entering-a-fortress move given as the continuation. White is the defending side and also has the move in all the games except Ree vs. Hort, where Black defends and is to move.
[Event "Turkmenskaya Iskra"]
[Site "?"] [Date "1926.??.??"]
[Round "?"]
[White "Froim Markovich Simkhovich"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "4K3/4Bp1N/2k3p1/5PP1/8/7p/b7/8 w - - 0 5"]
[PlyCount "1"]
[EventDate "1926.??.??"]

5. f6 *

[Event "Turkmenskaya Iskra"]
[Site "?"]
[Date "1926.??.??"]
[Round "?"]
[White "Froim Markovich Simkhovich"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "4K3/4Bp1N/2k2Pp1/6P1/8/8/b6p/8 w - - 0 6"]
[PlyCount "1"]
[EventDate "1926.??.??"]

6. Bf8 *

[Event "Turkmenskaya Iskra"]
[Site "?"]
[Date "1926.??.??"]
[Round "?"]
[White "Froim Markovich Simkhovich"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "4KB2/5p1N/2k2Pp1/6P1/8/8/b7/7q w - - 0 7"]
[PlyCount "1"]
[EventDate "1926.??.??"]

7. Bh6 *

[Event "Turkmenskaya Iskra"]
[Site "?"]
[Date "1926.??.??"]
[Round "?"]
[White "Froim Markovich Simkhovich"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "4K3/2k2p1N/5PpB/6P1/8/8/b7/7q w - - 0 8"]
[PlyCount "1"]
[EventDate "1926.??.??"]

8. Kf8 *

[Event "Turkmenskaya Iskra"]
[Site "?"]
[Date "1926.??.??"]
[Round "?"]
[White "Froim Markovich Simkhovich"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "q4K2/2k2p1N/5PpB/6P1/8/8/b7/8 w - - 0 9"]
[PlyCount "1"]
[EventDate "1926.??.??"]

9. Kg7 *

[Event "The Chess Amateur"]
[Site "?"]
[Date "1912.??.??"]
[Round "?"]
[White "W. Rudolph"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "3B4/1r2p3/r2p1p2/bkp1P1p1/1p1P1PPp/p1P1K2P/PPB5/8 w - - 0 1"]
[PlyCount "1"]
[EventDate "1912.??.??"]

1. Ba4+ *

[Event "The Chess Amateur"]
[Site "?"]
[Date "1912.??.??"]
[Round "?"]
[White "W. Rudolph"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "3B4/1r2p3/r2p1p2/b1p1P1p1/kp1P1PPp/p1P1K2P/PP6/8 w - - 0 2"]
[PlyCount "1"]
[EventDate "1912.??.??"]

2. b3+ *

[Event "The Chess Amateur"]
[Site "?"]
[Date "1912.??.??"]
[Round "?"]
[White "W. Rudolph"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "3B4/1r2p3/r2p1p2/bkp1P1p1/1p1P1PPp/pPP1K2P/P7/8 w - - 0 3"]
[PlyCount "1"]
[EventDate "1912.??.??"]

3. c4+ *

[Event "The Chess Amateur"]
[Site "?"]
[Date "1912.??.??"]
[Round "?"]
[White "W. Rudolph"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "3B4/1r2p3/r1kp1p2/b1p1P1p1/1pPP1PPp/pP2K2P/P7/8 w - - 0 4"]
[PlyCount "1"]
[EventDate "1912.??.??"]

4. d5+ *

[Event "The Chess Amateur"]
[Site "?"]
[Date "1912.??.??"]
[Round "?"]
[White "W. Rudolph"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "3B4/1r1kp3/r2p1p2/b1pPP1p1/1pP2PPp/pP2K2P/P7/8 w - - 0 5"]
[PlyCount "1"]
[EventDate "1912.??.??"]

5. e6+ *

[Event "The Chess Amateur"]
[Site "?"]
[Date "1912.??.??"]
[Round "?"]
[White "W. Rudolph"]
[Black "White to Play and draw"]
[Result "*"]
[SetUp "1"]
[FEN "3k4/1r2p3/r2pPp2/b1pP2p1/1pP2PPp/pP2K2P/P7/8 w - - 0 6"]
[PlyCount "1"]
[EventDate "1912.??.??"]

6. f5 *

[Event "Deutschland ch, Gladenbach"]
[Site "?"]
[Date "1997.??.??"]
[Round "?"]
[White "Maiwald"]
[Black "Bischoff"]
[Result "*"]
[SetUp "1"]
[FEN "8/6B1/5N2/3p1k2/1bb3p1/6Pp/5K1P/8 w - - 0 1"]
[PlyCount "1"]
[EventDate "1997.??.??"]

1. Nxd5 *

[Event "Deutschland ch, Gladenbach"]
[Site "?"]
[Date "1997.??.??"]
[Round "?"]
[White "Maiwald"]
[Black "Bischoff"]
[Result "*"]
[SetUp "1"]
[FEN "8/6B1/8/3b1k2/1b4p1/6Pp/5K1P/8 w - - 0 2"]
[PlyCount "1"]
[EventDate "1997.??.??"]

2. Bd4 *

[Event "?"]
[Site "?"]
[Date "1947.??.??"]
[Round "?"]
[White "Chekhover"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "7r/p3k3/2p5/1pPp4/3P4/PP4P1/3P1PB1/2K5 w - - 0 1"]
[PlyCount "1"]
[EventDate "1947.??.??"]

1. Kd1 *

[Event "Hoogovens"]
[Site "?"]
[Date "1986.01.31"]
[Round "?"]
[White "Hans Ree"]
[Black "Vlastimil Hort"]
[Result "*"]
[SetUp "1"]
[FEN "4knQ1/7r/3p2p1/2bP1pP1/5P1N/6K1/8/8 b - - 0 59"]
[PlyCount "1"]
[EventDate "1986.??.??"]

59... Rxh4 *

[Event "Hoogovens"]
[Site "?"]
[Date "1986.01.31"]
[Round "?"]
[White "Hans Ree"]
[Black "Vlastimil Hort"]
[Result "*"]
[SetUp "1"]
[FEN "4knQ1/8/3p2p1/2bP1pP1/5P1K/8/8/8 b - - 0 60"]
[PlyCount "1"]
[EventDate "1986.??.??"]

60... Bd4 *

[Event "Schweizarische Schachzeitung"]
[Site "?"]
[Date "1923.??.??"]
[Round "?"]
[White "Froim Simkhovich"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "7k/4p3/5p2/1p1p1p2/1PpP1PpB/1pP3P1/7P/6NK w - - 0 1"]
[PlyCount "1"]
[EventDate "1923.??.??"]

1. Bxf6+ *

[Event "Schweizarische Schachzeitung"]
[Site "?"]
[Date "1923.??.??"]
[Round "?"]
[White "Froim Simkhovich"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "7k/8/5p2/1p1p1p2/1PpP1Pp1/1pP3P1/7P/6NK w - - 0 2"]
[PlyCount "1"]
[EventDate "1923.??.??"]

2. h4 *
5. Additional Experimental Results
The graphs in Figures 1, 2, 3, and 4 show a detailed analysis of the five best networks, selected based on the average agreement between their most preferred move choice and the ground truth move choice. The networks are listed in ranking order, with the best performing network, according to the data displayed in the figure, at the top of the list.
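A minimal sketch of how such a top-k agreement figure could be computed from the .pgn dataset in Section 4 (python-chess, a configured lc0 binary, and the file name fortresses.pgn are assumptions; this is not the harness used for the reported numbers):

import chess
import chess.engine
import chess.pgn

def topk_agreement(pgn_path, engine, k=2, nodes=65536):
    """Fraction of dataset positions whose ground-truth move is among the engine's k best."""
    hits, total = 0, 0
    with open(pgn_path) as f:
        game = chess.pgn.read_game(f)
        while game is not None:
            board = game.board()                       # position from the FEN header
            truth = next(iter(game.mainline_moves()))  # the single ground-truth move
            infos = engine.analyse(board, chess.engine.Limit(nodes=nodes), multipv=k)
            if truth in [info["pv"][0] for info in infos]:
                hits += 1
            total += 1
            game = chess.pgn.read_game(f)
    return hits / total

engine = chess.engine.SimpleEngine.popen_uci("./Leela/lc0.exe")
print(topk_agreement("fortresses.pgn", engine, k=2))
engine.quit()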
Fig. 1: Entering a fortress. The graph shows the percentage agreement between the accumulated two best moves of the networks and the ground truth entering-a-fortress move. Note that for each entering-a-fortress position there is only one correct move; thus, if the most preferred move was correct for a specific position, the second most preferred move cannot also be correct. We observe that the moves left network 128x10-ccrl-moves-le performs best. It thus not only has the best performance when considering its most preferred move (see figures 2 and 3 in the main paper) but also shows high quality move choices for its second best move.
For the fortress scenario, the correct centipawn evaluation is zero: the game will result in a draw given optimal play by both sides. We see that there is a substantial deviation from this correct value for these networks, which were the top five in terms of guessing the correct move with their most preferred move choice (Figure 3) and second most preferred move choice (Figure 4). A value of -1 means a deficit of one pawn or its equivalent. It is interesting that with almost no search, some of the networks, especially when guessing a wrong move, gave an evaluation close to zero. This may be an artifact, since the move choice was incorrect, but it needs further exploration. It is also interesting that for the correct moves, the best fit line of the most preferred moves has a negative slope in all cases except for the best network, the moves left one, indicating that for the vanilla AZ architecture networks, searching deeper seems to bring the evaluation further away from the correct one.
Fig. 2: Entering a fortress. The graph shows the percentage agreement between the accumulated four best moves of the networks and the ground truth entering-a-fortress move. We observe that the moves left network 128x10-ccrl-moves-le performs best. It thus not only has the best performance when considering its most preferred move (see figures 2 and 3 in the main paper) and when considering the accumulated two best moves (Figure 1 above), but also outperforms the other networks when considering the accumulated performance of its four most preferred moves. Similar results were observed for the five most preferred moves. The 128x10-ccrl-moves-le network thus shows stable, superior performance in finding the correct move for entering a fortress as the defending side.
Figure 5 shows the same results as in Figure 4, presented as a winning percentage. The correct winning percentage for the fortress scenario, assuming best play by both sides, is 50%, i.e. a draw. We see that all of the networks that were, on average, the top five at guessing the enter-a-fortress moves severely lack understanding of the fortress concept, in the sense that they are very pessimistic about their chances in the resulting position. We observe that more search depth in general leads to a less correct evaluation (negative slope of the best fit line), with the exception of the best network, the moves left network 128x10-ccrl-moves-le.
The moves left network, 128x10-ccrl-moves-le, outperforms the other architectures consistently, without exception, in every experiment. It has a better first move choice and also has, in general, better move choices, as seen by the fact that it maintains its lead when more than one move guess is allowed, i.e. for the accumulated percentage over its 2, 3, 4 and 5 most preferred move choices (the results for the 3rd and 5th are not shown).
Fig. 3: Centipawn value of agreed and disagreed positions for the networks' most preferred move. Red indicates that the move was the correct one, blue the value for an incorrect move guess. We observe that all the networks struggle to evaluate a fortress position correctly. A correct evaluation would be zero or close to zero (due to MCTS, the side which has fewer possibilities for making a mistake will be favored slightly, although optimal play by both sides results in a draw, i.e. a zero centipawn evaluation). The 128x10-ccrl-moves-le network is the only network with a positive slope of the best fit line for its correct move choices, indicating that it is the only network that benefits from more nodes searched.
6. A Qualitative Follow-Up Experiment - the Hard Classes for the Moves Left Head Network
The moves left head network, which turned out to outperform the others on fortresses, was developed with the goal of making lc0's endgame play more effective [40], finishing off won games instead of "trolling" [41], [42]. We included it in our list of networks both because we wanted to explore endgame performance (results not presented here) and because we had a bias towards including experimental architectural modifications. It should be noted that our test positions are not tablebase [29] endgames. Further qualitative tests of the 128x10-ccrl-moves-le network indicate that while learning to finish off superior games quickly, it also seems to have learned to prolong defeat once faced with what it perceives as an inferior position. If given a choice between moves that lead to a quick defeat and a move that will prolong the game, it will choose the latter.
Fig. 4: Centipawn value of agreed and disagreed positions for the networks' second most preferred move. Red indicates that the move was the correct one, blue the value for an incorrect move guess. We observe that the evaluation of correct move choices is closer to the correct evaluation than the evaluation of incorrect move choices.
Figure 6 is one of the positions in the entering-a-fortress dataset. It is from a study composed in the year 1926 by Froim Markovich Simkhovich [43]. It was selected because this position seems especially challenging for the moves left head network. The reason might be that white has a choice between constructing the fortress with Bh6 or playing 1.Kf8, leading to a position which takes time and precision for black to break down, with the key move involving a queen sacrifice. Here the correct move is 1.Bf8. After black promotes his pawn to a queen with h1=Q, white plays 2.Bh6. The next white move is going to be 3.Kf8, and after 4.Kg7 white has established an impregnable fortress. The moves left network wants to play 1.Kf8, which also makes sense, since after h1=Q white plays 2.Kg7. This version of the fortress is, however, not impregnable, because the key square, or weak point, of the position, the white g5 pawn, is insufficiently protected: it lacks the protection of a bishop on h6 and is thus only protected by the white knight on h7. A human can, with logical thinking, quickly establish that g5 is the critical square: it forms the base of white's pawn chain and is thus not protected by another white pawn, in contrast to the white f6 pawn. The white f6 pawn is already protected twice, by the white g5 pawn and by the white knight on h7, and does not need further protection. After 1.Kf8, leaving the white bishop on e7, black will eventually move his king next to the white g5 pawn, e.g. to ...Kf5, and then sacrifice his queen for the white g5 pawn and the white knight on h7 with ...Qxg5, reaching a won pawn-up endgame. It will take many moves, but black will eventually get there.

In order to establish that the version with Bh6 is a fortress, it is also necessary to verify that there is no Zugzwang [26] scenario. Two key instances need to be considered. One is with the black king on e.g. ...Kf5 and the black queen threatening to capture the white g5 pawn: here the white knight must remain on h7, while the white bishop on h6 protects the sensitive g5 pawn, and white can move the king. The other instance is with the black queen on the 8th rank, preventing the white king on g7 from moving: here white can move the knight back and forth between h7 and f8, since the black queen cannot threaten the white g5 pawn from this position, and thus the pawn is sufficiently protected by the white bishop on h6. Notice how the human approach restricts the search space, focusing on only a couple of key scenarios. This position was tested on human players (not part of the results presented here); although challenging, they managed to solve it.

The moves left network has a fondness for the move 1.Kf8 at the search depth explored. It takes many moves, and also the imagination to come up with the idea of giving up the queen for a pawn and a knight, to find the right solution. It seems hard for the moves left network to detect the difference between the not-a-fortress version with the white bishop on e7 and the fortress version with the white bishop on h6 protecting the key square g5. In both cases, white is seemingly in a position where no loss is on the horizon.
Fig. 5: Winning percentage of agreed and disagreed positions for the networks' second most preferred move. Red indicates that the move was the correct one, blue the value for an incorrect move guess. A correct winning percentage for a fortress is slightly lower than 50% (it is exactly 50% given optimal play by both sides). We observe that the evaluation for the correct moves is closer to the correct one than the evaluation for the incorrect moves.
In Figure 7, we see that the moves left head network is very pessimistic about white's defending chances, giving white only a 5.4% winning score. In its most preferred variation, it fails to construct the fortress, playing 1.Kf8 instead of the correct move 1.Bf8, with the idea of protecting the weak g5 pawn with Bh6 on the next move. It is interesting that in this main line, black later allows white to construct the fortress. After 1.Kf8 h1=Q 2.Kg7 Kd5 3.Kg8 Ke5 4.Bf8, it wants black to play 4...Kf5, which is a mistake since it allows white to construct the fortress with 5.Bh6. Instead, black should prevent the key maneuver Bf8-h6 by pinning the white bishop with 4...Qa8; white would then have to play 5.Kg7, leading to the position examined in Figure 8. Continuing with the analysis presented in Figure 7, after 4...Kf5 5.Bh6 the fortress has been constructed. White, however, does not maintain the fortress, since after 5...Qa1 6.Kh8 Bd5 7.Bf8 Qe5 it wants white to play the mistake 8.Kg7 instead of retreating the bishop to h6, protecting the key g5 pawn. After 8.Kg7 Qh2 9.Bc5, white has first failed to construct the fortress, black has then allowed white to construct it, and white has finally stepped out of the fortress construction voluntarily. Thus, the final position of this most preferred line is not a fortress. The correct move 1.Bf8 is its third most preferred choice, with a 2.9% winning score; here white enters the fortress and maintains it. The number of nodes searched is, however, much smaller than in the most preferred variation after 1.Kf8: the correct move 1.Bf8 has only 487k visits, compared to the 68M visits of 1.Kf8. This seems like a classical example of search drift, with 1.Kf8 gaining more initial attention and subsequently being searched much more deeply than the correct move 1.Bf8. The correct move is underestimated because PUCT gives it too little attention in terms of nodes searched.
Fig. 6: Qualitative test of the 128x10-ccrl-moves-le network – description of the human approach
In the network's main line, we enter the correct moves for black, not allowing white to subsequently construct the fortress, by pinning the white bishop with the move 4...Qa8, preventing the key Bf8-h6 maneuver. With the bishop pinned, white can only move the king back and forth, Kg8-g7-g8, allowing black to maneuver his king to the most favorable square, ...Kf5, leading to the position presented in Figure 8. Now the correct move for black is to attack the Achilles heel of the white position, the white pawn on g5, with 1...Qg2, with the intention of sacrificing the queen with 2...Qxg5 on the next move, giving it up for the white knight on h7 and the white pawn on g5. White cannot prevent this sacrifice: if 2.Kh6, then black can play 2...Qh3+ 3.Kg7 Qh5 and sacrifice the queen on g5 on the next move, independent of what white plays. We observe that, after looking at this position for about half an hour, the network does not prefer this key queen maneuver, whose intention is to sacrifice the queen. It evaluates the position as extremely favorable for black, with over 90% winning probability, almost no matter what black does.
We observe that after 15 minutes of deliberation, the queen sacrifice 1...Qxg5 is not among its 16 most preferred moves (Figure 9). The only other move which would prevent white from constructing a fortress by playing Bh6, 1...Qa8, is also ranked quite low, as move nr. 16.
Fig. 7: The 128x10-ccrl-moves-le network’s analysis of an instance of the position type that it struggles with after 5 hours and 20 minutes
Leela Chess Zero is a very active project. We also tested version 0.25.0 and were curious to see whether the improvements introduced there would provide a remedy for the challenges that the moves left network faces. We see that the challenges persist. We ran the new version overnight, for 9 hours, on the position in Figure 10, which is the key position where black needs to sacrifice the queen in order to break the fortress and win; otherwise white will play Bh6 on the next move, constructing an impregnable fortress. We observe that the correct move is considered by far the worst possible move that black can make in the position! We also observe that M, the predicted number of plies or half moves needed to finish the game, is much too optimistic in the lines where black does not sacrifice the queen, only around 27, meaning that the engine thinks that black is going to win in about 13 or 14 moves by both sides. The correct assessment is that black is not going to win at all if white plays Bh6 on the next move and then stays put, only moving the king. We see that for the correct move, M is about 69, or around 35 moves until the game is finished. Playing out the game to the finish after the queen sacrifice: 1.Kg7 Kh4 2.Bc5 g5 3.Bf2+ Kh5 4.Be1 g4 5.Bf2 Be6 6.Bg3 Kg5 7.Bc7 Kh4 8.Kh6 g3 9.Bb6 g2 10.Bf2+ Kh3 11.Bg1 Kg4 12.Bb6 Kf3 13.Kg5 Ke2 14.Kh4 Kf1 15.Kg3 g1=Q 16.Bxg1 Kxg1, and now we are at a tablebase position [29], where a lookup table tells us that with optimal play from both sides, black will checkmate in 21 moves. Thus it will take about 36 moves to finish the game, which means that M is just about right. Rather than M for the correct move being incorrectly evaluated, the problem is that M for the incorrect moves is way off. In contrast to what was intended, giving the moves left head more weight in the PUCT formula here leads to even worse performance, since M for the correct move is much higher than M for the incorrect moves, resulting in the correct move being explored even less.

We observe that the main problem remains the position evaluation, or the Q value, with the Q value after the queen sacrifice being only Q=0.289, while without the sacrifice it is over 0.9. We see that after the queen sacrifice, a lot of the simulated game scenarios end in a draw. In some sense, we are observing a vicious cycle: after the queen sacrifice it is necessary to play precisely in order to win the game, and it is possible to go astray, in which case the game ends in a draw. This means that in the simulations, the Q value, which averages the simulation results, will not be that high. That in turn means that the move will not be explored much, which means that precision will be lacking, closing the cycle. In the simulations, after the queen sacrifice, 705 scenarios, or around 70%, end in a draw; black wins the game in 292 simulations and even loses it in 3.
Fig. 8: The hard classes of the 128x10-ccrl-moves-le moves left network. The right moves for breaking down the apparent fortress have been given. Does it see the key maneuver that breaks down the apparent fortress with a queen sacrifice, 1...Qg2 followed by 2...Qxg5, giving up the black queen for the white knight and the white g5 pawn?
We observe that while all of the other moves get at least around 2M nodes searched, the correct move gets only 7.7k. We observed that running the simulation 7.5 hours longer (9 hours compared to 1.5 hours) resulted in only a small change in the winning percentage of the most favored moves: it went down from 95.8% after 1.5 hours of analysis, or 50M nodes searched, to 95.5% after 9 hours of analysis, or 114M nodes searched, the main reason being a slightly lower Q value due to it winning slightly fewer games in the simulations. This is, however, a very marginal decrease, which has a small effect on the nodes given to each of the possible moves.
Thus, the challenges presented here remain with this update of lc0 as well.
Fig. 9: The moves left head 128x10-ccrl-moves-le network’s hard classes - does it find the key move that breaks the fortress, sacrificing the queen with 1...Qxg5?
We again play the right move for it, sacrificing the queen, resulting in the position in Figure 11.
We observe that immediately after the queen sacrifice has been played, the network still evaluates the position favorably for black, giving black over 60% winning chances, but this is a significant drop from the over 90% it reported prior to the fortress-breaking queen sacrifice. We observe that the Q value for the most preferred move is now 0.277; prior to the queen sacrifice, in Figure 9, the Q value of the most preferred move was 0.882. The difference in Q values is 0.882 - 0.277 = 0.605.
Fig. 10: Same diagram as in (Figure 9), with the brand new 0.25.0 version of the lc0 MCTS engine which came out right before the deadline for this work. We observe that it considers the winning move to be the most inferior of all possible moves for black in the position
We observe that the U value difference between the most preferred (and most searched) move and the least preferred (and least searched) move is of a much smaller magnitude: in Figure 9, U=0.001 for the most preferred move and U=0.047 for the least preferred one. For Figure 11, where we have not yet searched deeply, the U value of the most preferred move is U=0.001 and that of the least preferred one U=0.247. Having in mind that PUCT selects nodes based on the maximum value of Q+U, we understand that the huge difference of 0.605 between the Q values of the position prior to the queen sacrifice and the one after it outweighs the exploration factor U, leading to the correct move, breaking the fortress with a queen sacrifice, not being explored sufficiently.
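For reference, the PUCT selection rule in its standard AlphaZero form is (lc0 uses a variant parameterized by the CPuct, CPuctBase and CPuctFactor values listed in Section 4, so the exact constants differ):

a^{*} = \arg\max_{a}\Bigl( Q(s,a) + c_{\mathrm{puct}}\, P(s,a)\, \frac{\sqrt{\sum_{b} N(s,b)}}{1 + N(s,a)} \Bigr)

where Q(s,a) is the running mean of the simulation outcomes for the child, P(s,a) is the policy prior, and N(s,a) is the child's visit count. The exploration term shrinks as visits accumulate, while the Q gap of roughly 0.6 between the lines with and without the queen sacrifice persists, so exploration cannot compensate, which is consistent with the visit counts reported above.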
After we have helped the network by entering the correct solution for it, sacrificing the queen, we see the moves left network's evaluation in black's favor steadily rise with more nodes searched.
Fig. 11: The 128x10-ccrl-moves-le network’s position evaluation after the key move sacrificing the queen to break the fortress has been played
In this position, white will eventually have to give up his bishop for the passed black g-pawn, which will otherwise queen, resulting in a winning game for black, since the black f7 pawn is going to be protected by the black bishop via ...Ba2. The black king will eventually win the white f6 pawn and the game. Black still has to be careful not to allow white to build a new fortress by letting white's king retreat. The moves left network's most preferred line is correct, leading to a win. We observe, however, that in the second most preferred variation, 1...Kf5 2.Bd6 g5 3.Bc7 g4, white can now build a fortress with 4.Kh6! (instead of 4.Bd6, which loses in the manner the network's line continues), with the idea of sacrificing the white f6 pawn: after 4...Kxf6 5.Kh5 Kf5 6.Kh4 and next Kg3 (also after 4...Ke4 5.Kh5 Kf3 6.Kh4, white is just in time to keep the black g-pawn at bay with his king), white keeps the bishop on the b8-f4 diagonal and his king on g3, and the black pawns cannot advance. It is a new fortress and a draw. Note that this position would not qualify for our original fortress dataset, since after the white f6 pawn has been captured there are 6 pieces on the board including the kings, which is a tablebase position that the networks might have been trained on. The most preferred, and also correct, move 1...Kh4 is natural for a human, since it shields the white king, preventing it from advancing down the board via Kg7-h6-h5-h4. If that route is not available, the white king would have to go Kg7-f8-e7-d6-e5-f4 or even Kg7-f8-e7-d6-c5-d4-e3, which are longer routes. A human player could solve this task rather easily in a logical manner by paying attention to this fact.
Fig. 12: The 128x10-ccrl-moves-le network’s position evaluation rises rapidly with more nodes searched
Playing a few more moves, we observe that the winning estimate of the moves left network increases steadily, here reaching 73.3% and growing. In its main line it finds the right solution, winning the white bishop for the black g-pawn. There is no doubt that it will manage to finish off the game from this position, winning the full point as black.
This paper presents a new, original dataset containing perhaps the most challenging chess task available: finding the unique best moves which are necessary for entering a fortress as the defending side. There is no margin for error. The task is of a logical nature: it is not untypical that adding more pieces of a certain type to the attacking side does not change the nature of the fortress, and school children can quickly grasp the key logical concepts, including that more pieces of a certain type will not change the effect of those concepts on the impregnability of the fortress. In contrast, state of the art AI architectures, which fundamentally work with probabilistic reasoning over observed data distributions, find this task challenging. We test a large number of experimental versions of Leela Chess Zero, a leading chess architecture which is built on the same principles as AlphaZero. To our surprise, a novel architecture with certain modifications to the convolutional neural network consistently outperforms the other candidates, despite being one of the smallest architectures tested, 1/27th of the size of the largest ones. Our results are a step towards better understanding causal reasoning, with a new benchmark dataset of a causal reasoning nature. We also explore whether it is possible to modify neural architectures so that they can better perform on a task of a causal nature.
Fig. 13: The confidence of the 128x10-ccrl-moves-le network that black is going to win increases steadily with more nodes searched
References
[1] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018. [Online]. Available: https://science.sciencemag.org/content/362/6419/1140
[2] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. P. Lillicrap, K. Simonyan, and D. Hassabis, “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” CoRR, vol. abs/1712.01815, 2017.
[3] G. Tesauro, TD-Gammon: A Self-Teaching Backgammon Program. Boston, MA: Springer US, 1995, pp. 267–285. [Online]. Available: https://doi.org/10.1007/978-1-4757-2379-3_11
[4] B. M. Lake, “Compositional generalization through meta sequence-to-sequence learning,” in NeurIPS, 2019.
[5] A. Barbu, D. Mayo, J. Alverio, W. Luo, C. Wang, D. Gutfreund, J. Tenenbaum, and B. Katz, “Objectnet: A large-scale bias-controlled dataset for pushing the limits of object recognition models,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, Eds. Curran Associates, Inc., 2019, pp. 9453–9463.
[6] H.-J. Chang, G.-Y. Fan, J.-C. Chen, C.-W. Hsueh, and T.-s. Hsu, “Validating and fine-tuning of game evaluation functions using endgame databases,” in Computer Games, T. Cazenave, M. H. Winands, and A. Saffidine, Eds. Cham: Springer International Publishing, 2018, pp. 137–150.
[7] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
[8] D. Balduzzi, M. Garnelo, Y. Bachrach, W. Czarnecki, J. Pérolat, M. Jaderberg, and T. Graepel, “Open-ended learning in symmetric zero-sum games,” in ICML, 2019.
[9] A. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. Jones, D. Silver, K. Kavukcuoglu, and D. Hassabis, “Improved protein structure prediction using potentials from deep learning,” Nature, vol. 577, pp. 1–5, 01 2020.
[10] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. P. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, 2016.
[11] H. Kautz, “The third ai summer, aaai 2020 robert s. engelmore memorial award lecture,” Youtube, 2020. [Online]. Available: https://www.youtube.com/watch?v= cQITY0SPiw
[12] G. Webster, R. Creemers, P. Triolo, and E. Kania, “Full translation: China’s ‘new generation artificial intelligence development plan’,” 2017. [Online]. Available: https://www.newamerica.org/cybersecurity-initiative/digichina/blog/full-translation-chinas-new-generation-artificial-intelligence-development-plan-2017/
[13] M. Sadler and N. Regan, Game Changer: AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI. New in Chess, 2019. [Online]. Available: https://www.newinchess.com/en US/game-changer
[14] L. Fridman, “David silver: Alphago, alphazero, and deep reinforcement learning — ai podcast #86 with lex fridman,” Youtube, apr 2020. [Online]. Available: https://www.youtube.com/watch?v=uPUEq8d73JI&t=2s
[15] L. C. Z. open source community, “Leela chess zero,” 2020, [Online; accessed 04-June-2020]. [Online]. Available: https://lczero.org/
[16] Y. LeCun and G. Marcus, “Debate: ‘does ai need more innate machinery?’,” Youtube, oct 2017. [Online]. Available: https://www.youtube.com/watch?v=vdWPQ6iAkT4&t=227s
[17] Y. Bengio, “From system 1 deep learning to system 2 deep learning,” dec 2019. [Online]. Available: https://slideslive.com/38922304
[18] Y. Bengio and G. Marcus, “Debate: The best way forward for ai,” MONTREAL.AI, dec 2019. [Online]. Available: https://montrealartificialintelligence.com/aidebate/
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, p. 84–90, May 2017. [Online]. Available: https://doi.org/10.1145/3065386
[20] B. Schölkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Bengio, “Towards causal representation learning,” 2021.
[21] J. Pearl, ““radical empiricism and machine learning research,” causal analysis in theory and practice (blog),” 2020.
[22] N. Ensmenger, “Is chess the drosophila of artificial intelligence? A social history of an algorithm,” Social Studies of Science, vol. 42, pp. 5–30, 02 2012.
[23] A. Newell, J. C. Shaw, and H. A. Simon, “Chess-playing programs and the problem of complexity,” IBM J. Res. Dev., vol. 2, pp. 320–335, 1958.
[24] C. E. Shannon, “A chess-playing machine,” Scientific American, vol. 182, pp. 48–51, 1950.
[25] Chessprogramming contributors, “Zugzwang — chessprogramming wiki,” 2020, [Online; accessed 4-June-2020]. [Online]. Available: https://www.chessprogramming.org/index.php?title=Zugzwang&oldid=14531
[26] G. Haworth, H. Van der Heijden, and E. Bleicher, “Zugzwangs in chess studies,” ICGA journal, vol. 34, pp. 82–88, 06 2011.
[27] E. Bleicher, “Building chess endgame databases for positions with many pieces using a-priori information,” 2004. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.217.3356
[28] ——, “Freezerchess,” 2020, [Online; accessed 04-June-2020]. [Online]. Available: http://www.freezerchess.com/
[29] Chessprogramming contributors, “Endgame tablebases — chessprogramming wiki,” 2020, [Online; accessed 4-June-2020]. [Online]. Available: https://www.chessprogramming.org/index.php?title=Endgame_Tablebases&oldid=18896
[30] ——, “Evaluation — chessprogramming wiki,” 2020, [Online; accessed 4-June-2020]. [Online]. Available: https://www.chessprogramming.org/Evaluation
[31] ——, “Minimax — chessprogramming wiki,” 2020, [Online; accessed 4-June-2020]. [Online]. Available: https://www.chessprogramming.org/index.php?title=Minimax&oldid=13281
[32] M. Guid and I. Bratko, “Detecting fortresses in chess,” Elektrotehniški vestnik (English Edition), vol. 79, pp. 35–40, 01 2012.
[33] Chessprogramming contributors, “Alpha-beta — chessprogramming wiki,” 2020, [Online; accessed 4-June-2020]. [Online]. Available: https://www.chessprogramming.org/index.php?title=Alpha-Beta&oldid=16953
[34] Wikipedia contributors, “Null-move heuristic — wikipedia,” 2020, [Online; accessed 4-June-2020]. [Online]. Available: https://en.wikipedia.org/wiki/Null-move_heuristic
[35] Chessprogramming contributors, “Ply — chessprogramming wiki,” 2020, [Online; accessed 4-June-2020]. [Online]. Available: https://www.chessprogramming.org/index.php?title=Ply&oldid=160
[36] J. Ellis, “Crystal chess engine,” 2020, [Online; accessed 4-June-2020]. [Online]. Available: https://github.com/jhellis3/Stockfish/tree/crystal
[37] M. Byrne, “Honey-xi-rl releases,” 2020. [Online]. Available: https://github.com/MichaelB7/Stockfish/releases/tag/Xi-r1
[38] various, “What is the strongest tactical engine with fortress detection?” 2020. [Online]. Available: http://www.talkchess.com/forum3/viewtopic.php?f=2&t=73786&hilit=Crystal
[39] H. M. Le, N. Jiang, A. Agarwal, M. Dudík, Y. Yue, and H. Daumé, “Hierarchical imitation and reinforcement learning,” in ICML, 2018.
[40] H. Forstén, “Moves left head pull request - github,” mar 2020. [Online]. Available: https://github.com/LeelaChessZero/lc0/pull/961
[41] Ipmanchess, “Lc0: why play for a mate if i can get a extra 50moves?!” 2020. [Online]. Available: https://github.com/LeelaChessZero/lc0/issues/688
[42] crem, “Tablebase support and leela weirdness in endgame,” oct 2018. [Online]. Available: https://lczero.org/blog/2018/08/tablebase-support-and-leela-weirdness/
[43] M. Garcia, “Selection of studies of about 85 endgames composed by Froim Markovich Simkhovich,” 2020. [Online]. Available: http://www.arves.org/arves/index.php/en/endgamestudies/studies-by-composer/560-simkovich-from-1896-1945