The computer does what you tell it to do
2024/08/05
When I was in school I heard some variant of “the computer does what you tell it to do” often. This was long before before large language models had broke into the news so for most people computers were still just in the realm of preprogrammed behaviour.
However, as a programmer, you quickly learn that even the most basic preprogrammed behaviour suffers the same issue. You program the computer to do something step by step, it does exactly that and that wasn’t quite what you wanted, or maybe it was what you wanted but you give the software to someone else and they think it is meant to be doing something slightly different. Typing “please show me why alignment problems in computer science are considered an existential risk” into a search engine will probably give you exactly what you asked for, but you would probably get better results by just typing in the keywords from that sentence. A user could be fooled into thinking the search term should be what they want to ask, not a filter to return content in the database that contain the same words as they typed in. This misunderstanding is easily corrected when trying to use very dumb search engines like Reddit’s.
Artificial Intelligence is considered by many to have enormous potential in the next few decades, however many experts think there could also be severe risks. Yet this blog post isn’t about sophisticated AI. The aforementioned 80000 hours webpage details this problem in lots of depth and I’d highly recommend reading it.
No, I was reminded that “the computer does what you tell it to do” because I was trying to work out why the extremely simple minmax algorithm I wrote for a game wasn’t working.
I started out with writing the well understood and well documented alpha beta pruning min max algorithm. The problem was simple enough, I wanted to program a bot to play Hnefatafl and had already implemented code that would return a list of every possible move the turn player in the board game could take. I simply needed to write a function that would take this game state and return the ‘best’ move that should be played.
By having it loop through and evaluate all possible moves rather than simply produce one I’d already eliminated a possible alignment issue. A human can attempt to make an illegal move in a board game they don’t fully understand, however in a computerised version of a board game the interface can prevent them from selecting an illegal move. Similarly, if a bot tries to choose a move it can’t make then it would be stuck. It would even be possible to make the AI graciously avoid infinite loops with the player if a board game reached a state where both the computer and human were stuck repeating the same cycle of moves - just detect this cycle and remove that offending move from the list that the AI will pick from.
Obviously, I thought to myself, the best move would be one that immediately wins the game. However it is not possible to win the game in just a few turns in a tafl game. As the side trying to free the king, you must reach a corner of the board, and both your own pieces and the opponent’s block your path. As the side trying to capture the king, you must surround it on all sides, which takes multiple turns even once it is exposed. Since the minmax algorithm even with alpha beta pruning could only look ahead by 3 turns before the exponential number of moves made the algorithm too slow to play against in real time I had to implement a way to evaluate game states where victory was not possible. I figured a good proxy for victory would be capturing the opponent’s pieces. A simple my pieces - opponent's pieces
heuristic tells the minmax algorithm that a 1 for 1
trade is worth as much as no trade at all, and a 2 for 1
trade or a 1 for 0
trade have the same value. These trades of pieces require far less lookahead to see. Even with a depth of only 3 turns, playing against the computer frequently had me losing pieces that I did not forsee the computer capturing.
However, while the minmax algorithm with these heuristics was very good at identifying ‘free’ captures of my pieces, it was strikingly bad at winning the game. It could leave open a corner of the board that would require only two turns for the king to reach and win, or I could pin the attacking pieces down to give my king an open path in such a way that it could have easily prevented if it had only looked ahead to two of my turns. What gives? It clearly liked capturing my pieces, why was it so short sighted?
After some time debugging, I realised this was the consequence of setting the look ahead to 3 turns. It evaluated its turn as the turn player, considered my best options (according to its own heuristic), then looked at what it could do after, and totally neglected to consider if that third turn ahead left either side about to win the game. Raising the lookahead made the algorithm play ‘properly’ as the king’s side, and so I tweaked the code to check for imminent victory or loss on the third turn to avoid having to raise the look ahead to 4.
However, after adding UI support to start the game with my playing the attacking side and it trying to free the king, it seemed very reluctant to actually move the king to a corner and win. I was intentionally trying to throw games and leave multiple corners wide open for it to move the king to, yet it would just not take the victory. It still liked capturing my pieces, even with a bigger reward available to it. What gives?
After more time debugging I realised I had made an error in the heuristic. The minmax algorithm did not value winning now, it only valued winning. As long as it had a means to win, if the other player couldn’t close that path to victory, it didn’t value taking that win on its immediate turn any more than on its next turn. By throwing a game intentionally, I had put it in such an overwhelming position of victory that it would just not bother actually taking the win. Of course eventually it would take the win, it did value winning, but Hnefatafl is an asymmetric game. For two humans, it may be played as a best of two, with a tie break based on who won their game in fewer turns. I had failed to include this in the heuristic.
Adding a penalty to the heuristic so that winning on turn 3 was less valuable than the current turn, the bot finally worked as I had expected it to, playing with very minimal foresight, and when pieces were not available to capture, just taking random moves rather than playing towards setting up a victory later, but competently enough that if I don’t pay attention it would beat me.
I had implemented the min max algorithm correctly all along, it simply valued what I told it to, and looked ahead as much as I told it to. I programmed it to play Hnefatafl, but it was not programmed to play the game like a human would.
I suspected when initially programming the min max algorithm that the heuristic to value capturing the opponent’s pieces may cause it to lose games it could have won. I still suspect that when I implement a neural net to pick the ‘best’ move that neural net will end up valuing sacrificial plays that lead to victory, potentially at the expense of the minmax algorithm valuing piece advantage. It’s perhaps typical of AI that the one thing that I thought might cause the minmax algorithm to not value playing Hnefatafl properly was the one part of its implementation that wasn’t causing issues yet.
The computer does exactly what you tell it to do, even when those instructions are themselves instructions on how it should work out what to do.