An overview of Finite State Machines and, more specifically, how they are used in the Batman: Arkham series.
Lunatics Taking Over the Asylum
Typically the AI and Games articles on this site are focussed on telling a particular story. This can range from discussing how certain AI techniques have been adopted in commercial games, to the challenges of experimenting with new techniques for challenging and diverse domains. However, this article started out a little differently. What follows was driven more by my wondering: “how exactly does this work?” And by ‘this’ I mean the AI implementation behind the Batman: Arkham series by Rocksteady Studios.
Through some investigation I came to something of an informed conclusion, which provides a nice means of highlighting how even simple, hand-crafted AI can lead to an engaging experience for players.
The Batman: Arkham series, at the time of writing, spans across four different games:
- Arkham Asylum (2009) – PC, PS3, Xbox 360
- Arkham City (2011) – PC, PS3, Xbox 360, WiiU
- Arkham Origins (2013) – PC, PS3, Xbox 360, WiiU – (developed by Warner Bros. Montreal and Splash Damage).
- Arkham Knight (2014) – PC, PS4, Xbox One.
While the games allow you to experience many of the different aspects of being the Batman, such as conducting investigations and solving crimes, the focus is on two particular types of gameplay: combat and stealth.
During combat sequences, all, or at least the vast majority, of enemy characters are either unarmed or at best carrying blunt objects such as metal pipes or wielding knives. The focus of these sequences is to eliminate what is often a large number of foes as quickly as possible while maintaining a rhythm to your movement. This requires an element of skill and precision from the player.
Meanwhile the stealth segments are focussed on dealing with large numbers of armed enemies. Given that Batman is not bullet-proof, he must focus on disabling enemies individually. This often requires use of the environment to catch enemy characters unaware and leave traps for those who will soon discover their unconscious allies.
As a player, one thing I have really enjoyed about these games is how seamlessly it all works. The subsequent games in the series have merely added to a formula that worked from the very first game. The player agency as well as the non-player character (NPC) behaviour maintain the immersion and the feeling of empowerment. However, this is often achieved without making the NPCs act as fools: they can quickly take you down in the combat modes if you are outnumbered and make too many mistakes. Meanwhile, in the stealth levels their guns give them a major advantage and they will take you out quickly if you leave yourself exposed.
One element that in hindsight is actually quite important is that the behaviours are often difficult to predict. Not only do characters not take predictable paths through the environment in the stealth segments, but their animations have been blended together rather well. You can’t figure out their subsequent behaviour simply by watching the animations either.
So the question this article attempts to answer is ‘how did they do it?’ Prior to investigating I concluded early on that the AI implementation would not be that advanced. That is not a slur against the developers, but rather an understanding that it doesn’t need to be that intelligent to facilitate the needs of the game. What I was interested in was what it was doing and how it was developed. Given that in games much of the implementation detail is hidden in smoke and mirrors, I wouldn’t be surprised if the AI benefits from looking smarter than it actually is.
As intimated by Tim Hanagan at the Game/AI conference in 2012, the game is built off relatively simple methods that are iteratively tested and refined. In addition, one of the most interesting elements is how involved the AI team are.
Looking specifically at the development of Arkham Asylum and Arkham City at Rocksteady Studios, the AI team are focussed on rapid prototyping and iterative development: designing new concepts for the game, scrapping those that don’t work and then refining those that do. From a development perspective, one of the most interesting elements is that outside of the design leads making the final decision on what stays and what is cut from the game, the AI developers are primarily responsible for combat and stealth mechanics. This is quite an interesting approach, given that it allows for a more streamlined development of the opponents next to the mechanics designed to take them down. It also ties back to the idea of iterative development and rapid prototyping: new ideas for AI behaviour can be adopted in the game and tested quickly, since the programmers responsible for the mechanics are ultimately the same team. The AI team (which grew from 5 to 16 developers between Arkham Asylum and Arkham City) were largely responsible for ‘finding the fun’, repeatedly tweaking parameters in order to create the most engaging experience. This ranged from the timing or strength of individual mechanics to the timing needed to take down bosses such as Bane, Solomon Grundy or Ra’s Al Ghul.
Of course developing the game mechanics often requires more than just the programmers, given that the visual aesthetic must be maintained. As mentioned earlier, the animations of the characters are implemented well and make it difficult to determine when a NPC will conduct a behaviour. It seems that this is a direct result not only of the animation team sitting directly across from the AI team at Rocksteady, but also of how new features are implemented (using an agile development approach, I suspect): an animator and an AI programmer will often take charge of a new feature and see it through to its conclusion.
As discussed in the Game/AI conference talk (Hanagan, 2012), Rocksteady do not use behaviour trees, which have previously been adopted in Halo (Isla, 2005). In addition they do not utilise more advanced methods such as automated planning, which has been seen in games such as F.E.A.R. and Transformers: Fall of Cybertron. Instead, the AI is built entirely on finite state machines.
An Introduction to Finite State Machines
The Finite State Machine (FSM), or Finite State Automaton (FSA) is a popular method for building simple behaviours for NPCs in video games. The original concept is a mathematical model for the theory of computation, applicable not only to software but also logic circuits. A simple (deterministic) FSM is reliant upon two key elements:
- A finite number of states, which the FSM can find itself in.
- A state transition system, which dictates that if we are in a given state S and a particular signal is passed to that state, then a particular successor state will be reached.
NOTE: This is a simplification of the mathematical model of FSMs. As such, if you wish to read about this in further detail I would consult the Wikipedia article (which is rather useful for a Wiki). However, if you are interested in this at a more scholarly level, I would consider looking at (Hopcroft et al., 2013) as an introduction to automata theory.
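For reference, the standard textbook formulation of a deterministic finite automaton, of which the two elements above are a simplification, is usually written as a 5-tuple. The symbols follow the common convention in automata theory texts and are not taken from any of the Arkham material:

```latex
M = (Q, \Sigma, \delta, q_0, F), \qquad
\delta : Q \times \Sigma \to Q, \qquad
q_0 \in Q, \qquad
F \subseteq Q
```

Here Q is the finite set of states, Σ is the set of input signals, δ is the transition function, q_0 is the start state and F is the set of accepting states. FSMs used for game characters typically have no use for accepting states, so in practice only Q, Σ, δ and q_0 matter.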
While at present this all seems rather bland, what it gives is a method of dictating how we model the behaviour of a system, in this case a NPC for a game. We can observe what states the NPC will be in and how those states will change over time. We often visualise a FSM using something akin to a flowchart diagram. A very simple example is given in the diagram below, with two states and signals coming from events that trigger transitions.
When creating a FSM for a game, the state dictates what the character will do at any point in time. While this can seem immediately applicable for Move or Attack states, there are even issues to consider when the NPC is in an Idle state. If idle, the character will typically need to run through one or more animations. This is to indicate that the character is doing ‘nothing’ in terms of behaviour, but is still active in the game: a static NPC may break the immersion the developers sought since it may look inhuman in nature. Otherwise, the state will dictate what movement, animations and interactions with the game world must be conducted.
As for moving between states, the reasons for change can vary, but it is typically due to an event occurring within the game. The simplest of these can be the result of a random number generator or a number of frames passing in the game. In more complex cases, this can be the result of a NPC seeing an event occur, a sensor returning a value, or a combination of events occurring within a particular time-frame. Now the important part to remember is that the state transition system dictates whether a transition will occur based upon the incoming stimulus. So when in a given state, a character may well ignore some signals being received, but act upon others.
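To make the idea of a transition system concrete, here is a minimal sketch of a table-driven FSM in Python. The class name, the state names and the signal names are all my own illustrative assumptions rather than anything taken from a shipped game. Note how a signal with no entry for the current state is simply ignored, as described above.

```python
class StateMachine:
    """A deterministic FSM: a finite set of states plus a transition table."""

    def __init__(self, initial, transitions):
        # transitions maps (current_state, signal) -> successor_state
        self.state = initial
        self.transitions = transitions

    def handle(self, signal):
        """Apply a signal; signals with no entry for the current state are ignored."""
        self.state = self.transitions.get((self.state, signal), self.state)
        return self.state


# Hypothetical two-state guard: state and signal names are illustrative only.
guard = StateMachine(
    initial="idle",
    transitions={
        ("idle", "heard_noise"): "move",
        ("move", "reached_target"): "idle",
    },
)

guard.handle("reached_target")  # ignored while idle
guard.handle("heard_noise")     # idle -> move
guard.handle("heard_noise")     # ignored while moving
guard.handle("reached_target")  # move -> idle
```

Keeping the transitions in a plain dictionary means the behaviour can be retuned by editing data rather than code, which suits the kind of rapid iteration discussed earlier.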
An Example – The Ghosts in Pac-Man
To put this into context, let’s look at our old favourite (Ms.) Pac-Man. We can effectively model each ghost NPC using a Finite State Machine as shown in the diagram below.
In this instance, we have separated the behaviour into two unique states:
- Chase: Hunting Pac-Man with the intent to kill him.
- Evade: Running away from Pac-Man who has consumed a power pill, meaning the ghost is open to attack.
All of the ghosts from both Pac-Man and Ms. Pac-Man conform to this model, since they will attempt to minimise their distance to the player by default, then attempt to maximise their distance if they are under threat of being consumed. Each of the ghosts behaves differently, but if we consider them more abstractly, all they do is chase and evade. How each of these ghosts acts in those states differs, but they are ultimately doing the same thing.
From an implementation perspective, this is often useful, given we can create hierarchies of NPCs for games that adopt the same FSM, but their corresponding actions within a given state can differ.
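A rough sketch of that idea follows: one base class owns the Chase/Evade state machine, and a subclass overrides only how chasing is carried out. The class names, signal names and targeting rules are simplified assumptions for illustration and are not a reproduction of the original arcade logic.

```python
from enum import Enum, auto


class GhostState(Enum):
    CHASE = auto()
    EVADE = auto()


def manhattan(a, b):
    """Grid distance between two (x, y) tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


class Ghost:
    """Shared two-state FSM; subclasses only change how a state is acted out."""

    def __init__(self):
        self.state = GhostState.CHASE

    def on_signal(self, signal):
        # The transition rules are identical for every ghost.
        if self.state is GhostState.CHASE and signal == "power_pill_eaten":
            self.state = GhostState.EVADE
        elif self.state is GhostState.EVADE and signal == "power_pill_expired":
            self.state = GhostState.CHASE

    def chase_target(self, pacman_pos, pacman_dir):
        """Default chase behaviour: head straight for the player's tile."""
        return pacman_pos

    def choose_move(self, options, pacman_pos, pacman_dir):
        """Pick one of the reachable tiles according to the current state."""
        if self.state is GhostState.CHASE:
            target = self.chase_target(pacman_pos, pacman_dir)
            return min(options, key=lambda tile: manhattan(tile, target))
        # EVADE: maximise distance from the player.
        return max(options, key=lambda tile: manhattan(tile, pacman_pos))


class AmbushGhost(Ghost):
    """Same FSM, different chase behaviour: aim ahead of the player."""

    def chase_target(self, pacman_pos, pacman_dir):
        return (pacman_pos[0] + 4 * pacman_dir[0],
                pacman_pos[1] + 4 * pacman_dir[1])
```

Both classes respond to exactly the same signals, so adding a new ghost personality never requires touching the state machine itself.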
It is often wise to keep your FSM implementation as abstract as possible. This not only prevents the model being bogged down with very context-specific states, but allows for greater flexibility in how states are interpreted in runtime behaviour. Our article discussing the AI of Monolith’s F.E.A.R. is a good example of this.
Note: Further reading on FSMs applied to games can be found in Section 5 of (Rabin, 2002).
The AI of Arkham
As mentioned earlier, the vast majority of the AI implementation found in the Arkham games comes directly from use of FSMs. This is in fact a very common practice, given that FSMs, or something akin to them, have been adopted in video games for many years and – until arguably the mid-2000s – were still the default approach to creating behaviour in NPCs. So how exactly does it work? The truth is we don’t have a complete overview of what these FSMs may look like, but we can approach it by looking not only at individual cases, but also at what they ultimately boil down to.
Looking at the stealth segments, Batman is trying to stay hidden from anywhere between four and eight thugs who are patrolling a particular area. From observing their behaviour, thugs either patrol the map freely, often moving to particular areas of interest that may leave them exposed, or they stay fixed at a certain position. The latter is a rarity and is driven primarily to aid the player in becoming accustomed to using a particular tool or feature. For example, as shown in the accompanying video, some characters will stay near walls that can be destroyed with explosive gel to give the player the opportunity to experiment with this technique.
It is when we assess non-player characters in particular scenarios that the implementation is evident. One of the most common cases that we will see is when you, as Batman, take out one of the thugs unnoticed. The other thugs will recognise that something has occurred and begin to search for the source of the disturbance. So if we take it from the position of the NPC who discovers the injured thug first, often we see a very finely tuned behaviour occur:
- The thug will start moving around the map: often to an area within proximity of where the player attacked the thug.
- Once line of sight is established, the thug will call to his team to announce that he has found someone who is injured. The thug will then move towards the injured character.
- Upon examining the unconscious thug, often a statement is made to the others: instructing them to fan out and search for fear of their own fate.
- Finally, the thug conducts a search pattern of the local area.
Now if we were to distill this down to a FSM that encapsulates the core behaviours here, we may create something akin to the diagram below.
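Expressed in code, a hedged reconstruction of such an FSM might look like the following. The state and signal names are my own guesses based purely on the observed steps above, and the final transition back to patrolling is an assumption; none of this is taken from Rocksteady’s implementation.

```python
# Hypothetical state and signal names, reconstructed from the observed steps.
TRANSITIONS = {
    ("patrol", "ally_missing"): "move_to_disturbance",
    ("move_to_disturbance", "body_in_sight"): "announce_discovery",
    ("announce_discovery", "animation_finished"): "move_to_body",
    ("move_to_body", "arrived"): "examine_body",
    ("examine_body", "animation_finished"): "search_area",
    ("search_area", "search_complete"): "patrol",  # assumed, not observed
}

# States that simply play a pre-built animation or dialogue line.
ANIMATION_STATES = {"announce_discovery", "examine_body"}

# Every other state is about navigating the environment.
PATHFINDING_STATES = {state for state, _ in TRANSITIONS} - ANIMATION_STATES


def run(signals, state="patrol"):
    """Feed a sequence of signals through the table and return every state visited."""
    visited = [state]
    for signal in signals:
        # Signals that do not apply to the current state leave it unchanged.
        state = TRANSITIONS.get((state, signal), state)
        visited.append(state)
    return visited
```

Calling `run` with the signals in the order they would fire during the scenario walks the thug from patrolling, through the discovery, and into the search.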
If we look at this FSM, there are some interesting issues that need to be addressed:
- The FSM is very specific and only focusses on this particular use case of the thug responding to a recent attack.
- Many of the transitions that occur between states are the result of a concluded animation sequence. Many of these states are utilising a pre-built animation in order to lend credibility to what is happening.
- Outside of these animation-driven states, the rest of the states are actually reliant upon path-finding: as the bot in question attempts to navigate the environment.
The point being made here is that ultimately we could reduce the existing FSM, which is really not that large, into something much smaller, as shown below.
This seems ridiculously small, but is actually practical. All of what we see in the stealth segments can be distilled down to movement and animations. Even attacking is an animation, given it is the firing of a weapon. Some of the animations seen in the game are context-dependent, such as leaning or jumping over a railing, or climbing a ladder. However, this still fits given that these will be identified as areas of importance in the environment to move towards. Hence the bot will know to move towards a node in the map to allow it to run this animation.
This is a great example of creating AI for games that pretends to be clever and – by and large – fools the player into thinking the NPCs are more intelligent than they are. This is achieved largely through dialogue, since it appears that the NPCs are talking to one another to guide their behaviour. In addition, the animation aids this significantly as once again it reinforces what the character is ‘feeling’ or is intent on achieving. In one sense these mask the implementation of the AI from the player. On the other hand, it is a great example of ‘showing your working’ as it makes it clear to the player how the AI actually works.
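To illustrate how far a two-state machine can stretch, here is a small sketch in which a character alternates between moving to a node and playing whatever animation is bound to that node. The class, the looping script of (node, animation) pairs and the signal names are all hypothetical.

```python
from collections import deque


class MoveAnimateAgent:
    """Two-state FSM: walk to a node, then play the animation bound to it."""

    def __init__(self, script):
        # script: non-empty, ordered (node, animation) pairs; names are hypothetical.
        self.script = deque(script)
        self.state = "move"
        self.log = []

    @property
    def destination(self):
        """The node the character is currently heading for (or standing at)."""
        return self.script[0][0]

    def on_signal(self, signal):
        if self.state == "move" and signal == "arrived":
            node, animation = self.script[0]
            self.log.append(f"play {animation} at {node}")
            self.state = "animate"
        elif self.state == "animate" and signal == "animation_finished":
            self.script.rotate(-1)  # advance to the next step, looping at the end
            self.state = "move"
        return self.state


thug = MoveAnimateAgent([("railing", "lean_over"), ("ladder", "climb")])
thug.on_signal("arrived")             # move -> animate, plays lean_over
thug.on_signal("animation_finished")  # animate -> move, now heading to the ladder
```

All of the apparent variety lives in the script data and the animations themselves, while the state machine stays trivially small.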
Next is the AI in the combat mode, which is arguably less interesting, since it is more a product of repeatedly polished features. None of the individual characters are uniquely intelligent; instead, there is a mixture of small behaviours driven by a global system that oversees the combat. There is one key mandate that is managed at this higher level, and was even intimated at the Game/AI conference (Hanagan, 2012): only one character is selected to attack Batman at a given time.
In the original Arkham Asylum, this was far stricter, with only one character attacking Batman for a certain amount of time. In Arkham City this has become more granular, since multiple thugs can attack Batman given his ability to parry up to three enemies at once. The key part is that attacks are not designed to be unfair: you can always find a way out of a situation unless it is one you have created by gambling for a higher score. For example, you will not be overwhelmed with enemies attempting to attack you at once if you cannot parry all of them. However, if you attempt a ground takedown, you can be left exposed to attack and may suffer if you do not have sufficient time to complete the move. In this instance, it is the failings of the player, not the AI, that have led to damage being taken.
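One plausible way to picture this global oversight is as a manager that hands out a limited number of attack slots. The sketch below is purely an assumption on my part: the class and method names are invented, and the limits of one and three simply mirror the behaviour described above for Arkham Asylum and Arkham City respectively.

```python
class CombatDirector:
    """Global overseer that limits how many thugs may attack at once."""

    def __init__(self, max_attackers=1):
        # 1 mirrors the stricter rule; 3 mirrors the parry limit described above.
        self.max_attackers = max_attackers
        self.attackers = set()

    def request_attack(self, thug_id):
        """Grant an attack slot if one is free; otherwise the thug keeps waiting."""
        if thug_id in self.attackers:
            return True
        if len(self.attackers) < self.max_attackers:
            self.attackers.add(thug_id)
            return True
        return False

    def finish_attack(self, thug_id):
        """Release the slot once the attack animation has played out."""
        self.attackers.discard(thug_id)


director = CombatDirector(max_attackers=3)
granted = [director.request_attack(thug) for thug in ("a", "b", "c", "d")]
# The fourth thug is refused until one of the first three finishes.
```

Because the individual thugs only ever ask for permission, the fairness rule lives in one place and can be retuned without touching any character’s own state machine.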
Outside of basic movement towards Batman and attempting an attack, NPCs also have the option of grabbing nearby weapons. This is either by picking up a weapon that has been knocked aside as Batman took out the original owner or by retrieving them from an available location, such as a weapon locker. Ultimately it all boils down to what we have already described: moving to locations and playing animations.
More than State Machines
It’s important to appreciate the efficiency of the implementation behind these state machines. Make no mistake: this article does not demean this work by pointing out that it is ‘just’ a finite state machine. If anything, it’s praise of what is ultimately a finely tuned system. As mentioned earlier, Rocksteady are serious about their rapid prototyping and iterative development process and the final experience, particularly in stealth sequences, is a testament to that effort. However, it is not all about the FSM implementation: a lot of the experience has been refined through repeated playtesting, both internally (where the worst gamers on the team were used to gauge baseline difficulty) and by the Quality Assurance (QA) teams that Warner Bros. provided. As a result, there are some interesting design decisions and implementation features mentioned in (Hanagan, 2012) that have helped ensure that the experience is as fun as possible:
- Pathfinding Does Not Backtrack: During stealth sequences the characters will seldom turn around to pathfind towards a location they recently visited. This is to ensure that the player does not become frustrated having crept up behind an enemy, only for them to turn and shoot you in the head.
- Stealth Mechanics Trimmed Down: The sheer number of mechanics originally implemented meant that almost every stealth encounter in Arkham City required something new to be learned by the player. This was then revamped to try to streamline the gameplay.
- Detective Vision Derived from Debug Mode: The stealth segments were originally tested using a debug mode in the developer build. However, the team felt that the game seemed less fun when all of the debug information about each of the NPCs was removed. As a result, they introduced it as a mechanic which proved intrinsic to the game.
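The first of these decisions, avoiding backtracking, can be approximated quite cheaply. The sketch below is one guess at how it might be done, by having a patrolling character remember its last few nodes and prefer any neighbour it has not just visited. The class, the graph format and the `memory` parameter are all hypothetical and not drawn from the talk.

```python
import random
from collections import deque


class Patroller:
    """Wanders a node graph while avoiding recently visited nodes."""

    def __init__(self, start, graph, memory=3, rng=None):
        self.node = start
        self.graph = graph  # node -> list of adjacent nodes
        self.recent = deque([start], maxlen=memory)
        self.rng = rng or random.Random()

    def step(self):
        """Move to a neighbour, avoiding recently visited nodes where possible."""
        options = self.graph[self.node]
        fresh = [n for n in options if n not in self.recent]
        # "Seldom", not "never": only backtrack when there is no fresh option.
        self.node = self.rng.choice(fresh or options)
        self.recent.append(self.node)
        return self.node


corridor = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
thug = Patroller("a", corridor, memory=2)
route = [thug.step() for _ in range(3)]
# The thug walks a -> b -> c and only turns back at the dead end.
```

A larger `memory` makes the character less likely to double back on a player who is following close behind, at the cost of more predictable routes.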
The Arkham series, and the work largely developed by Rocksteady, is a strong example of how to achieve intelligent characters using Finite State Machines. While FSMs have become less popular in more recent years, the Arkham series is a great example of how finely tuned, challenging and fair AI can be built in a game through iterative development and continual tweaking, creating something that is both scalable for the developer and interesting for the player.
(Hanagan, T., 2012) How to Have Fun in an Asylum. Game/AI Conference, Vienna. AIGameDev.com.
(Hopcroft, J.E., Motwani, R. and Ullman, J.D., 2013) Introduction to Automata Theory, Languages and Computation (3rd ed.). Pearson. ISBN 1292039051.
(Isla, D., 2005) Handling Complexity in the Halo 2 AI. Proceedings of the Game Developers Conference (GDC) 2005.
(Rabin, S. (Ed.), 2002) AI Game Programming Wisdom. Cengage Learning.