This piece acts as the conclusion to the first series of AI 101. Throughout this series, we have moved from the first principles of agent-based theory to the ideas of state spaces, search trees and uninformed algorithms. In part 10 we conclude the series by turning to informed search and, specifically, the A* search algorithm. As detailed at the end of this article, future AI 101 series will focus on specific AI topics.
Introduction & Recap
Throughout the AI 101 series, we have worked towards establishing the notion of rewards and how these can guide, or inform, an autonomous agent in making rational decisions. To make rational decisions, we need to be able to conceptualise the current state of the problem we are trying to solve and imagine one or more goal states we would rather achieve. With this, we then search through the space of all potential states (i.e. the state space) by generating the successor states that arise from executing specific actions. Back in part 7, we identified a number of possible search algorithms. However, these were largely unintelligent, given that they explore the state space in a rigid and prescribed manner. To resolve this, we needed to return to the idea of reward, by identifying the potential of a given successor state through some form of metric.
In part 8, we started to look at the idea of costs: a resource that is spent whenever we execute a specific action. Through this, we can begin to prioritise which states to explore. However, prioritising exploration solely through cost can prove a hasty proposition: while we may be minimising costs in the short term, this can lead to sub-optimal solutions in the long term, because we are not considering the long-term effects of our short-term optimisations.
Lastly, in part 9 we discussed the idea of heuristics: estimated metrics that tell us roughly how far away we are from the intended goal state. These can once again help us make more intelligent decisions, given that the algorithm can prioritise states (and actions) based upon how much closer they will bring us to the goal. Despite this, relying solely on heuristics is still ill-advised: we can get a lot closer to the best solution out there, but it is still going to be sub-optimal in some cases.
With all of this, we bring ourselves to the final part of the foundation series of AI 101. So far we have two intelligent processes that do not yield the best solution: one near-sighted, the other far-sighted. We conclude the series by bringing these together and introducing the A* search algorithm.
In many respects we have already discussed many of the core elements of A*, but just haven’t put all of the pieces together. A* is considered one of the simplest, yet most effective, search algorithms in artificial intelligence. It is used in a variety of contexts within video game development, typically for problems such as path-finding: the task of having a character navigate from one place in the game world to another. Hopefully, you may now recognise that we have repeatedly shown examples of our cost- and heuristic-driven search algorithms in navigation problems (Kakariko Village and Romania respectively). This is for two reasons: to help visualise the problem, but also to make it easy to understand how this can be used for path-finding purposes.
In order for our algorithm to work, we adopt the same framework as our other informed search algorithms. The only difference is in how we evaluate each state of the problem. In part 8, we adopted the cost-driven method. If we evaluate the value of a given state ‘x’, which we will call f(x), and denote our costs as g(x), then the original cost-driven greedy search could be written as:
f(x) = g(x)
Meaning we would use this equation to order each state based on the cost it takes to reach it. Meanwhile, if we were to denote our heuristic calculation of a given state ‘x’ as h(x), then the value of a state in the heuristic-driven search could be denoted as:
f(x) = h(x)
So for A*, all we do is glue these two parts together: identifying the value of a state as:
f(x) = g(x) + h(x)
i.e. ValueOfState(x) = CostToReach(x) + HeuristicValue(x)
We continue to adopt the same principle as before:
- Order States to Visit By ‘Value’: This time using the f(x)=g(x)+h(x) calculation.
- Ignore States Already Visited: Simply add them to our closed list. If we find them again on our travels, do not add them to the agenda.
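The two rules above can be sketched in code. Below is a minimal Python illustration (the function and parameter names, such as `a_star`, `successors` and `heuristic`, are my own for this example): the agenda is a priority queue ordered by f(x) = g(x) + h(x), and a closed set records states we have already expanded.

```python
import heapq

def a_star(start, goal, successors, heuristic):
    """A* search: order the agenda by f(x) = g(x) + h(x).

    successors(state) yields (next_state, step_cost) pairs;
    heuristic(state) estimates the remaining cost to the goal.
    """
    # The agenda is a priority queue of (f, g, state, path) entries,
    # so the state with the smallest f value is always popped first.
    agenda = [(heuristic(start), 0, start, [start])]
    closed = set()  # states we have already expanded

    while agenda:
        f, g, state, path = heapq.heappop(agenda)
        if state in closed:
            continue  # already expanded via a cheaper route
        if state == goal:
            return path, g
        closed.add(state)
        for next_state, step_cost in successors(state):
            if next_state not in closed:
                new_g = g + step_cost
                heapq.heappush(
                    agenda,
                    (new_g + heuristic(next_state), new_g,
                     next_state, path + [next_state]),
                )
    return None, float("inf")  # no route to the goal exists
```

Note that a state may sit in the agenda more than once with different costs; the cheapest entry is popped first, and any later, more expensive entries are skipped via the closed set.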
So let’s see how this works in practice.
Returning to the Romania Problem
Back in part 9, we introduced the Romanian map problem: a common example of the use of intelligent search found in modern AI textbooks. One of the crucial aspects of this problem is that the optimal solution cannot be found by relying solely on either the cost- or heuristic-driven methods. As a result, we need to use the A* search algorithm to find the optimal route. Let’s start by taking a look at the route that A* will actually take compared to the other two methods. As we can see in the diagram below, the three routes vary, with the heuristic-driven and A* routes only slightly different from one another.
The three search algorithms vary in performance, with the cost-driven method giving a very poor effort: it wanders through the south-west corner of the map. Given that it aims to minimise cost throughout the search process, it is rather ironic that it generates the longest route. Meanwhile, the heuristic-driven method finds a much closer solution, because the heuristic identifies states that are potentially closer to the final goal. So why doesn’t it yield the optimal solution? Its route from Sibiu to Bucharest differs because the heuristic value of Fagaras is better than that of Rimnicu Vilcea. However, it fails to account for the cost of travelling from Sibiu to Bucharest via Fagaras, which winds up at a total distance of 310km. When we account not only for the heuristic value of a given state but also its cost, we discover that travelling via Rimnicu Vilcea is only 278km, which is 32km shorter! So ultimately, using both cost and heuristic value yields the optimal solution to this problem. The two metrics complement one another rather nicely: the heuristic ignores the cost of travelling along the edges of the graph (i.e. the roads between cities), while the cost metric pays attention to the actual cost of reaching one city from another.
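To make this concrete, here is a small sketch that runs the A* recipe over the corner of the Romania map relevant to this route, with road distances and straight-line heuristics taken from the Russell and Norvig textbook example (city names are simplified to plain ASCII, and only a fragment of the full map is included):

```python
import heapq

# Road distances (km) for the fragment of the Romania map relevant to the
# Arad -> Bucharest route, plus straight-line-distance heuristics to
# Bucharest, both taken from the Russell & Norvig example.
ROADS = {
    ("Arad", "Sibiu"): 140, ("Arad", "Timisoara"): 118, ("Arad", "Zerind"): 75,
    ("Sibiu", "Fagaras"): 99, ("Sibiu", "Rimnicu Vilcea"): 80,
    ("Fagaras", "Bucharest"): 211,
    ("Rimnicu Vilcea", "Pitesti"): 97, ("Rimnicu Vilcea", "Craiova"): 146,
    ("Pitesti", "Bucharest"): 101, ("Pitesti", "Craiova"): 138,
}
H = {"Arad": 366, "Sibiu": 253, "Fagaras": 176, "Rimnicu Vilcea": 193,
     "Pitesti": 100, "Craiova": 160, "Timisoara": 329, "Zerind": 374,
     "Bucharest": 0}

def neighbours(city):
    # Roads are two-way: yield (next_city, distance) in both directions.
    for (a, b), dist in ROADS.items():
        if a == city:
            yield b, dist
        elif b == city:
            yield a, dist

def a_star(start, goal):
    agenda = [(H[start], 0, start, [start])]  # entries are (f, g, city, route)
    closed = set()
    while agenda:
        f, g, city, route = heapq.heappop(agenda)
        if city in closed:
            continue
        if city == goal:
            return route, g
        closed.add(city)
        for nxt, dist in neighbours(city):
            if nxt not in closed:
                heapq.heappush(agenda,
                               (g + dist + H[nxt], g + dist, nxt, route + [nxt]))
    return None, float("inf")

route, total = a_star("Arad", "Bucharest")
# route: Arad -> Sibiu -> Rimnicu Vilcea -> Pitesti -> Bucharest, 418km total
```

The returned route takes the Rimnicu Vilcea and Pitesti road: 278km from Sibiu to Bucharest, rather than the 310km via Fagaras.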
It is this balance between cost and heuristics that allows A* to find the optimal route via Rimnicu Vilcea and Pitesti. So what is happening here that allows it to spot that this is, in fact, the better route? If we once again refer to the example from the Russell and Norvig textbook, we can see in the image below that at one point the algorithm seriously considers Fagaras as the optimal route. The diagram shows how we travel from Arad to Sibiu, then first expand Rimnicu Vilcea, given that the total value of that state is measured as 413, which is 2km shorter than Fagaras. As a result, the algorithm expands the children of Rimnicu Vilcea and assesses the successor states, including Pitesti. However, the value of Pitesti is deemed more expensive than Fagaras, so Fagaras will be visited next, given that it now sits first in the A* agenda.
But this is where things get interesting and, more importantly, the trade-off between cost and heuristic comes into full effect. Upon expanding the successors of Fagaras, the algorithm actually finds Bucharest. However, instead of simply calling it a day because the goal has been found, it realises that the total cost to this point is more expensive than the estimate from Pitesti. Thanks to the priority queue agenda, it expands Pitesti next and discovers a cheaper successor of Pitesti that is, in fact, Bucharest. It then selects that state and terminates upon realising it is now in the goal state.
The key thing here is that, despite finding the goal, A* backtracked down another route of the search space because that route appeared less expensive. It could easily have expanded that route only to discover it was more expensive in the long run, returning it to the first encounter with Bucharest. But in this case, it found a cheaper route and exploited it.
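We can watch this backtracking happen by logging each city as A* takes it off the agenda. The sketch below (again using the textbook road distances and straight-line heuristics, restricted to the relevant fragment of the map) records the expansion order:

```python
import heapq

# Road distances (km) and straight-line heuristics to Bucharest, from the
# Russell & Norvig example, for the fragment of the map we care about.
ROADS = {
    ("Arad", "Sibiu"): 140, ("Arad", "Timisoara"): 118, ("Arad", "Zerind"): 75,
    ("Sibiu", "Fagaras"): 99, ("Sibiu", "Rimnicu Vilcea"): 80,
    ("Fagaras", "Bucharest"): 211,
    ("Rimnicu Vilcea", "Pitesti"): 97, ("Rimnicu Vilcea", "Craiova"): 146,
    ("Pitesti", "Bucharest"): 101, ("Pitesti", "Craiova"): 138,
}
H = {"Arad": 366, "Sibiu": 253, "Fagaras": 176, "Rimnicu Vilcea": 193,
     "Pitesti": 100, "Craiova": 160, "Timisoara": 329, "Zerind": 374,
     "Bucharest": 0}

def neighbours(city):
    for (a, b), dist in ROADS.items():
        if a == city:
            yield b, dist
        elif b == city:
            yield a, dist

expansions = []  # (city, f) in the order states come off the agenda
agenda = [(H["Arad"], 0, "Arad")]
closed = set()
while agenda:
    f, g, city = heapq.heappop(agenda)
    if city in closed:
        continue  # a cheaper route to this city was already expanded
    expansions.append((city, f))
    if city == "Bucharest":
        break  # goal popped with the cheapest f on the agenda
    closed.add(city)
    for nxt, dist in neighbours(city):
        if nxt not in closed:
            heapq.heappush(agenda, (g + dist + H[nxt], g + dist, nxt))

# Expansion order: Arad (366), Sibiu (393), Rimnicu Vilcea (413),
# Fagaras (415), Pitesti (417), Bucharest (418).
```

Notice that Fagaras (f = 415) is expanded before Pitesti (f = 417), putting Bucharest on the agenda at f = 450; but A* then backtracks to Pitesti, and its route to Bucharest wins at f = 418.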
Series Conclusion
In this first series of AI 101, we have examined a number of key elements that are crucial to the basics of artificial intelligence:
- The idea of reward: how maximising the potential value of a given outcome is useful in making intelligent decisions. We see this both in the use of costs and heuristics, and in the need to balance the two in order to maximise potential in the long run.
- We have discussed how AI is built around the notion of agents, more specifically what the literature refers to as rational agents: software that will always make the most intelligent decision it can at that particular point in time.
- The idea of states: snapshots of the world that represent all of the information we are interested in at a given point in time. This is easily represented in video games courtesy of the individual frames of the game that are calculated. From each state, there is the potential for a large number of successor states, based upon the range of actions that can be executed by any and all agents in the world. We can calculate the states that can arise through the use of a state-transition function, which determines the successor states based upon the actions executed.
- We have looked at how features of a given game or problem can make things significantly more challenging. This ranges from a lack of complete information to the non-determinism that can arise from random elements in gameplay.
- With a state transition system in place, we can begin to search for solutions to problems. This can be achieved using uninformed search algorithms such as breadth-first and depth-first search. However, these methods are not particularly well-suited to large and highly-connected search spaces.
- We can adopt intelligent search by using the costs of actions as well as the expected (heuristic) value of a state. These can be used individually, but are best used in combination with one another. This combination is precisely what the A* search algorithm adopts.