A cautionary tale of adopting machine learning algorithms, based on our earliest experiments back in 2005.
While articles on the AI of F.E.A.R. and Transformers have focussed on the de-construction of existing commercial titles, for this occasion we decided to look back on an early project of mine: discussing the successes and failures that came along the way. Given that we discuss how machine learning algorithms can be employed for Non-Player Characters (NPCs) as part of my game AI class, I felt it appropriate to use some of my own experiences to ensure that beginners – notably my own students – do not make the same mistakes.
Setting the Scene
It is the Autumn of 2004, George Bush has bested John Kerry in the US election, Greece have recently won the European Cup after narrowly defeating hosts Portugal, Ireland has just introduced a smoking ban and people are still reluctant to join this new social media site called Facebook. The focus of this study, i.e. me, was about to begin his final year of undergraduate study; still struggling to acclimatise to his recent return to Scotland having studied in America, young Tommy was a touch bewildered. You could also put it down to him being 20 years old! Fashion sense (or even a modicum of respect) had not yet graced his precious little mind; given he looked like a contender in an Axl Rose lookalike competition.
It is at that point he meets Dr John Levine, a senior lecturer at the University of Strathclyde. Dr Levine has an interesting project he has proposed as a dissertation. It revolved around learning controllers for a top-down shooting game.
The game, entitled EvoTanks, is a top-down shooter inspired by the Atari 2600 game Combat. The game focusses solely on 1-vs-1 combat, where each tank has unlimited shells and four shield points. (unlike the original where it was a one-hit kill). The first combatant that can deplete the health of their opponent wins. There is a timer that, once reduced to zero, will declare the match a draw.
The focus of the project was to learn an optimal behaviour for the player’s tank.
Sure, it was not the most exciting game I have ever been involved in the development of. However, what was interesting is that the focus of the project was not to write software that would enable each NPC to act independently, rather we were looking at how to learn an optimal behaviour for the player’s tank.
The Idea of a Tank with a Brain
The project focussed on creating a tank NPC that was driven by an Artificial Neural Network (ANN). An ANN is a non-linear data structure that, to some extent, mimics the function of the human brain. Each contains layers of small processing units, called neurons, which are connected to one another via synapses. Each neuron receives the weighted summation of all data that is passed into it via the synapses. Provided that this sum reaches some threshold level, the neuron will ‘fire’: passing a signal along the outgoing synapses to all neurons in the subsequent layer. Some layers are connected directly to input from sensors or other methods, some to output points that can then be used as an outcome of the networks processing. Other layers will be ‘hidden’, in the network: meaning they are part of the processing of information between input and output layers (Russell and Norvig, 2003).
The key design components of ANN is that we not only can customise the topology, but the synapses carry a weight. We can tailor the output of the network by either shifting the topology and/or changing the weight on the synapses. This makes even a simple network such as the one shown in this diagram, very customisable and adaptable for purposes of NPC control, as we will see shortly.
These design principles lead to two key benefits that an ANN can provide for any decision making process:
- We Can Apply ANN’s For Function Approximation: provided we have tailored the network topology and weights correctly, we can use it to calculate the desired output from a given input. This is useful in creating a general behaviour that knows the right decision to make at each frame of the game.
- They Are Computationally Inexpensive: ANNs cost very little CPU and memory resource to run.
Of course the key to success is mentioned above: we need to have tailored the network such that it can successfully model the input sensors and actuators for a tank NPC, as well as ensure that the weights and topology mean that the behaviour is intelligent.
Artificial Neural Networks cost very little CPU and memory resource to run.
In this project we set on a fixed topology. However, the weights could be customised so that, depending on the input received from sensors, we could provide output that would translate into behaviour for the tank. We started with a simple collection of inputs and outputs:
- The distance to the nearest enemy.
- The angle to the nearest enemy (in polar coordinates).
- The angle of the nearest enemy relative to the player.
- Movement: forward with positive output or backwards with negative output.
- Rotation: clockwise with positive output, counter-clockwise with negative output.
- Fire: threshold dictated whether the cannon should fire.
We figured with this set of sensors and actuators, we could create a character that could recognise nearby opponents, figure out the angle needed to turn to face them and then fire the cannon (Note: the cannon dependant upon the direction of the tank, like old-school World War I tanks). However, there are subtleties to this design that needed refined in order for it to work correctly, as we will see later.
We Have The Technology…
While we could build the brain as a neural network, we still needed to tailor the networks such that the NPC could actually do something intelligent. To do this, we built an evolutionary algorithm (EA): this would allow us to actually make something interesting that could satisfy the needs of the project.
An evolutionary algorithm, such as a standard genetic algorithm, uses darwinian principles to find good solutions to problems. While there are many sources better suited to covering this in detail, notably the likes of (Mitchell, 1998) which is a strong introductory text to the subject area, I will briefly summarise as follows:
- We create a population of candidate solutions. Each candidate is a potential solution to the problem we have in mind (being a useful NPC combatant).
- Each candidate is a representation of the neural network for the tank NPC.
- We assess each candidate in turn by testing them against some criteria in-game to determine their fitness.
- Having completed this assessment, we then select a subset of candidates from that round, influenced by their fitness.
- We then allow them to breed; utilising mechanics such as crossover and mutation to create new candidates (children of the parent species).
- This completes one generation of the process: we repeat for a number of specified generations.
To summarise, you breed small collections of solutions to the problem. In this instance, we are breeding little tank brains. It sounds rather ridiculous the first time you stumble across it, but in actual fact evolutionary algorithms are often used in a range of complex optimisation problems. There are a number of reasons for this:
- EA’s search for the ‘fitness landscape’ within the parameter-space: What does that mean? It means that based on the all potential values we can set for each solution, we are assessing them in context of how well they perform against the task at hand.
- While searching fitness landscapes is hard, EA’s can find good solutions that map onto it: This Darwinian approach to learning combined with use of operations such as crossover, mutation, elitism (retaining the best stuff we find), can help push us in the right direction.
- EA’s are great when we do not know the total number of states or there are simply too many to model succinctly: This is why an EA was chosen for EvoTanks. Even a small game like this has a large number of unique states that need to be quantified in some computationally tractable manner. Instead we can leave the algorithm to find solutions that work well in this space without worrying about how to model it.
- The emphasis is on the quality of the behaviour: When evolving solutions, we are using the fitness criteria to dictate what it does. Sure, it’s actually assessing how well the solution fits the intended goal, but that both explicitly and implicitly dictates the behaviours we create. Provided we designed the algorithm correctly, we should see some interesting (albeit probably not optimal) behaviours pretty quickly.
Evolving a Tank in 3 Easy Steps
So what is the recipe to evolve a tank NPC? Now, I can distill this down to a number of simple steps. However it took some time to truly appreciate how to make this work correctly:
Step 1: Write a Simple Genotype for the Solution
I mentioned earlier that to evolve we need a representation, often referred to as chromosomes. This representation (the genotype) maps to the tank controller (the phenotype), but provides a succinct description for the search as to what this tank looks like.
Some of you will note the continuous use of biology terminology here. The approach is trying to mimic natural evolution: chromosomes are the structure of proteins stored in cells and in the case of humans, DNA is structured within chromosomes. Given that DNA is what makes you, you (and me, me), it is the genotype of you. You are the phenotype, given you are the actual product of that genetic representation.
Since we are using neural networks and the topology and weights of synapses are customisable, we can use this information as the genotype. In the case of this project, we elected to only evolve the weights of the controller and not the topology (it was a dissertation after all, I was sort of rushing to get this to work). As such, the genotype was the complete set of weights for the neural network we were evolving, resulting in a chromosome of around 40-50 double-precision values.
Step 2: Find an Appropriate Fitness Function
To evaluate each candidate solution, we need to assess it against what we think is the desired behaviour. In this case we create an equation that said the closer you were to beating the enemy, the better the score. Now we tweaked this over time to be more useful for a learning algorithm, which I explain later.
Step 3: Create an Evaluation Framework
Each candidate would have the weights injected in the neural network, which would then be given control of the NPC. We then trained it against one of our pre-written NPCs for around 10 iterations:
- Hunter: Kamikaze tank, heads straight for the player and attacks.
- Sniper: Keeps at a distance and takes shots at the player.
In time we added to this roster, since we could not get any of the basic behaviours to work.
Well, I Thought That Would Do It
Of course it didn’t work out as easily as that. In fact, for the first two months of the project – while weaving between it and my in-class assignments – the only result we could get was a tank that drove around in circles. It wouldn’t even shoot the cannon!
This led to the title of the article: in a moment of desperation, I created two new NPCs:
- The Lazy Tank: A tank that does not move, but fires its cannon periodically.
- The Sitting Duck: As the name implies, does absolutely nothing.
Still even with these NPCs, the character couldn’t even face the enemy and take a shot at them. To say I was getting worried at this point was an understatement. One infamous experiment had the Sitting Duck fill so much of the environment that should the evolving tanks shoot even in the general direction of the Duck it would die instantly. Sadly, we couldn’t even get this to work.
Some Useful Design Decisions
John advised me to get in touch with a PhD student at the University of Edinburgh: Georgios Yannakakis (who is now an established researcher in the area of AI and games), for whom John was second supervisor. Georgios had a lot more experience in this area and I asked for some advice.
He made quite a few key observations that, only through further study during the dissertation and my subsequent year at Edinburgh as a Masters student, I truly appreciated.
- The Inputs Need to be Normalised: The data being sent through via sensors was more or less raw input. This isn’t ideal for a learning system, since the range of outputs will be expansive. Hence we constrain the distance values to between 0 (very far away) and 1 (he’s on top of you) or ±1 for angles to denote left or right. This also pre-processes the information to give us some implicit knowledge.
- The Weights of the Network Must Be Constrained: Strong positive or negative signals in a neural network can have a huge impact on the resulting behaviour. Hence each node in the neural network should be constrained to maximum and minimum value. We kept it at ±5 in this instance.
- We Must Ensure The Candidate Is Assessed Thoroughly: Given that there is an element of randomness and context to gameplay, it’s best to run tests 10, 20 even 50 times to ensure that the (average) fitness recorded is an accurate representation. After all, the evolving tank may get lucky and that can lead to a skewed result if we only assess a handful of times.
- The Fitness Function Should Promote Growth: Our original fitness function was based entirely on how quickly the bot could beat the enemy player. However it gave very little information as to why that behaviour attained that grade. In addition, losing behaviours were not really awarded for their performance (given they often timed out). Hence we changed the fitness functions to promote efficiency in victory and survival in the event of a loss or draw.
With this knowledge in mind, we tried again with the same experiments. We saw an immediate improvement in the behaviour of the NPCs, resulting in competent behaviours against a range of different enemy bots.
The Problem Of Convergence
One of the benefits of using an evolutionary algorithm is that it allows us to sit back and watch the learning algorithm solve the problem – provided we designed the learning algorithm correctly.
However, there is also the issue of what the learning algorithm is training against. If we are not careful, we can quite easily back ourselves into a corner: creating what appears to be an optimal solution, but only for the problem at hand. Earlier in the article I mentioned that we were training against a particular scripted NPC. While we were now capable of learning to solve that particular problem the issue now is that it would result in the solutions being trapped in local maxima.
If you consider the image above, the best evolved tanks were trapped in a local maxima, which is representative of the best possible tanks we could evolve against a particular scripted NPC. The problem is that if you were to put that evolved tank in a game against any other NPC – one that it was not trained against – it would perform sub-optimally. Why? Because the learning algorithm was tasked to learn for one problem, yet you are now asking the trained tank to solve a different one. Ideally, we would prefer that the evolved solution is a generalist which achieves close to the global maximum, such that it can fight against any NPC we create with a reasonable level of performance.
The Solution – Generalised Gameplay (and a Change of Scenery)
The latter part wasn’t really a requirement of the project. Instead, it was a change in personal circumstance, as I moved to Edinburgh (at John’s behest) to study a Masters degree in Artificial Intelligence at the School of Informatics: which is sort of renowned for this sort of thing. It was during that time much of this advice I impart now began to solidify in my mind. It was also where we continued this work to reach its’ somewhat natural conclusion as a Masters dissertation under the supervision of Dr. Gillian Hayes.
We wanted to veer away from the issues mentioned earlier, whereby the evolved agents were becoming trapped in local maxima. To do this, we decided to use a different approach to the learning process. Rather than have the learning algorithm focus on individual scripts, we make the evolving agents learn by playing against each other.
This approach, known as co-evolution, changes the evaluation model so that evolving solutions play against one another rather than scripted NPCs. So how does this actually result in any improvement? Well, as discussed at length in (Mitchell, 1998), this can create what is referred to in biology as an ‘evolutionary arms race ‘: since these NPCs are being trained against one another, the learning algorithm will find unique strategies in some candidates that can overwhelm their opponents, and spread that knowledge throughout the population. However, in time the population will learn defences that help them to overcome these tactics.
To put this into some sort of context people might understand: who remembers ‘Goldeneye 007′ on the Nintendo 64? It had a four-player split-screen multiplayer mode, which often left you huddled round a small CRT television trying to play the game together and obeying the rules of your social circle (e.g. karate chop only, no proximity mines, no screen watching, etc.).
This is a small arms-race dynamic, since the more you play with your friends the more you learn their strategies: how they navigate the map, what their weapons of preference are, where they like to ‘camp’. This then informs your own behaviour. However you will be lucky if these strategies hold for more than a couple of games as the four of you will continue to evolve your play-style as the number of games you play increases.
The graph above shows that this co-evolving approach worked rather well. As more games are played, the average performance against a collection of selected opponents from within the population is pretty strong. The following graph conducts a bit of a reality check, since we need to know whether this fitness above, which is relative to the population, is actually a reflection of strong performance in the ‘real’ world. Hence we periodically throughout the learning process would play against the more challenging scripted agents (including a ‘Turret’ bot that would continually fire from a fixed position). While there is a slight dip, it does show that these co-evolved players were good at playing against the Hunter, Sniper and Turret NPCs.
Is this work in any way remarkable? Not really. If anything it holds a tiny amount of sentimental value for me given it was the first research project I conducted. However, what it did do is provide experience in the pitfalls of designing even the simplest of evolutionary algorithms. It now serves as a cautionary tale for those who are starting out. Should any of my own students start working on their own evolved critters, I hope this serves them well.
For further information about the EvoTanks project and a link to the toolkit (with source) used to teach undergraduate students: EvoTanks Project Page.
Thompson, T., Levine, J. and Hayes, G., (2007) EvoTanks: Co-Evolutionary Development of Game-playing Agents Computational Intelligence and Games, 2007. CIG 2007. IEEE Symposium on, 328-333.