This is a continuation of the series discussing the GVG-AI competition, which was first discussed across two parts earlier in 2014. In part 3, we look back on the first iteration of the competition now that it has completed.
Late August of 2014 saw the conclusion of the first General Video Game AI (GVG-AI) Competition. The competition was held at the IEEE Conference on Computational Intelligence and Games (IEEE CIG), which took place in Dortmund, Germany. As discussed in Part 1 and Part 2 of this series, the competition is focussed on creating AI agents that can play any video game that is presented to them. To be frank, this is an immensely challenging problem: an issue which I discussed at length in Part 1.
In addition to the previous parts of the GVG-AI series, a summary of the IEEE CIG 2014 conference as well as a brief discussion of GVG-AI appears in a round-up report.
It has been a couple of months since the completion of the competition and I have let it sit for a while in order to let my thoughts on it germinate: its successes, failures and future. As such, I felt it was important to wait before completing this trilogy of articles on GVG-AI. Not to mention that I also participated in the competition and can compare and contrast my own submission with those of others.
So with all this in mind, let’s talk about the competition results, my own submission and my overall feelings on the event now that it has reached the end of its first main cycle.
An Overview of the Competition
For a competition that was on its first pass, it can largely be considered a success. Any research competition lives largely on the back of two things:
- How many people submit entries to the competition.
- Whether the problem maintains any real interest from a research standpoint.
The second issue can arise too early if the competition proves not to be that taxing. Conversely, we may never know whether that issue will arise if we do not see a sufficient number of entries. That said, I would argue the first problem is the bigger one: if you cannot entice enough people to submit, a competition will typically fade away after its first year.
It was pleasing to note that GVG-AI received 19 submissions, which is a reasonable number for a competition that was only advertised for a couple of months prior to the first event. The overall rankings contain 23 entries, though 4 of these were sample controllers created by the competition organisers. This was an interesting addition, given that it pits the work of the human entrants against (simple) sample solutions, showing whether particular practices prove more fruitful than others. This is of particular importance later in this piece.
The final rankings are listed on the GVG-AI page, with the winner of the 2014 competition being Adrien Couetoux of the Academia Sinica in Taiwan. Adrien’s submission completed 256 of the 500 test games that each contestant was required to play. Congratulations to Adrien, whose submission had a comfortable lead over the others.
The final competition had a number of unique games that were introduced just for the occasion. The original competition framework required submissions to be tested against a validation set, but the final assessment was against a list of ‘new’ games that were never revealed to the public. In the final presentation, these games were finally revealed. Some of the notable games from that collection are listed below:
- Camel Race: A deceptively simple game where the focus is to run from one side of the screen to the other.
- Dig Dug: A clone of the old-school 8-bit game.
- Infection: Players must infect nearby entities and ‘spread’ their disease across the map.
- Pac-Man: As is to be expected from any AI competition, a clone of Pac-Man was provided in the final test set.
- Seaquest: A hybrid of Missile Command and Space Invaders, with enemies moving under the sea in particular directions while the player needs to take them out. All the while, the player’s air supply is draining and must be replenished by returning to the surface.
While other games existed, the key element is that each requires different kinds of interaction between the player’s avatar and other objects, typically resulting in gradual score increases for different purposes.
While these games were announced (and can be seen in the Prezi presentation linked to above), we have still not seen the actual Video Game Description Language (VGDL) definitions that were used to craft them. It was stated at the competition that these may be made public for next year’s iteration as part of the public test set.
As mentioned earlier, there were 19 submissions to the competition. Once submitted, each entry is graded as part of the competition. We first talk about how that grading takes place, followed by what types of submissions were received.
The Scoring Process
Each submission is evaluated on each of the 10 games, across 5 levels per game for 10 separate iterations: i.e. each bot plays a given game 50 times, for a total of 500 games played. The final scoring of each bot is based upon three factors of decreasing importance:
- The number of victories.
- The total score in a given round.
- The total time required to finish the level.
These three factors help establish a ranking, with points awarded to agents based upon that ranking. An example can be seen in the screencapture below, where, based upon the number of matches won and timesteps taken, each agent is given a particular score reflecting its overall performance.
It is interesting to note in this particular instance that the aforementioned winning bot (adrienctx) is the third best player in this ranking. However, the trick is to be consistently good overall, which is what adrienctx achieved.
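The decreasing-importance ordering described above amounts to a lexicographic tie-break: victories first, then score, then time. A minimal sketch of that ranking logic, using made-up agent names and numbers (this is illustrative, not the official competition code):

```python
# Rank agents on one game by (1) victories, (2) total score,
# (3) total timesteps taken, in decreasing importance.
# All values below are invented for the example.
results = [
    # (agent, victories, total_score, total_timesteps)
    ("sampleMCTS", 32, 410.0, 9800),
    ("adrienctx", 35, 390.0, 9100),
    ("randomAgent", 12, 150.0, 12000),
]

# More victories and a higher score are better; fewer timesteps is
# better, hence the negated time in the sort key.
ranking = sorted(results, key=lambda r: (r[1], r[2], -r[3]), reverse=True)

for rank, (agent, wins, score, time) in enumerate(ranking, start=1):
    print(rank, agent)
```

Note that a lower score on a more important factor cannot be compensated by the less important ones, which is why consistency across games matters so much in the overall table.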
The final rankings are shown below, with adrienctx being a clear winner with a comfortable 10 point gap. The second place submission JinJerry – while 10 points behind – maintains a massive gap over the third place submission which, interestingly, is the sample MCTS bot!
It is rather evident from this that Monte Carlo Tree Search (MCTS) is the dominant approach. As intimated in previous posts, not to mention the recent report on the conference itself, MCTS has had a significant impact upon the AI community and its impact is just as pervasive in games-based research.
According to the organisational team, MCTS accounts for at least 6 of the total number of submissions – in some cases the author did not reveal the method which they used. Perhaps more interestingly, the sample MCTS bot provided by the competition organisers performed better than most of these submissions.
Meanwhile, other learning approaches such as Q-Learning and Genetic Algorithms were adopted but were far less successful. On this note, it may be time to discuss my own submission to the competition.
I opted to enter the competition given that it all sounded rather exciting. Well, I’m a little biased to be honest, given I co-authored the papers on both general video game playing (Levine et al., 2013) and the Video Game Description Language (Ebner et al., 2013). However, the one thing I was interested in pursuing was how well a ‘simple’ or ‘classical’ approach would perform in this competition in comparison to the rest. As a result, my final submission relies solely on three simple components:
- A simple random move system: which makes a random move provided it will not kill the player.
- A steepest-ascent hill climber: which repeatedly takes the best action based on looking one step ahead in the search space.
- A* search: the vanilla informed search algorithm, which is still very powerful. This slightly modified version would only plan a pre-determined number of frames ahead.
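The steepest-ascent component above is the simplest of the three to illustrate. A minimal sketch, under assumed interfaces: `actions(state)` lists legal moves and `evaluate(state, action)` is a heuristic score for the resulting state (both hypothetical names, not part of the GVG-AI API or my actual submission code):

```python
def hill_climb_step(state, actions, evaluate):
    """Pick the single best action one step ahead (steepest ascent)."""
    best_action, best_value = None, float("-inf")
    for action in actions(state):
        value = evaluate(state, action)
        if value > best_value:
            best_action, best_value = action, value
    return best_action

# Toy usage: on a 1-D line, move toward position 10.
state = 3
step = hill_climb_step(
    state,
    actions=lambda s: [-1, +1],
    evaluate=lambda s, a: -abs((s + a) - 10),
)
print(step)  # → 1
```

The appeal is the negligible per-frame cost; the obvious weakness is that a one-step lookahead cannot see past local maxima, which is where the depth-limited A* component takes over.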
The reasons for this decision are numerous, though I lay the blame on two particular points:
- I knew straight away that a large number of people would adopt MCTS as their means to solve the problem. This open-ended problem specification – complete with undefined problem domains – looked indicative of the sort of problem where MCTS can thrive. In fact, I actually started working on my own MCTS bot based upon the UCT variant of the algorithm (Kocsis & Szepesvári, 2006).
- Looking back at previous competitions, notably the Mario AI Competition (Togelius et al., 2010), it’s always interesting to see how well a simple informed search approach compares to other methods more interested in learning or adapting to the problem domain. Of course I was not expecting performance akin to Robin Baumgarten’s famous Mario AI submission given the nature of the competition. However, I was keen to see how well it would stack up.
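For context on the UCT variant mentioned above: at each tree node it selects the child maximising the UCB1 value, balancing the mean reward observed so far against an exploration bonus for under-visited children. A minimal sketch (the exploration constant `c` below is a common default, not a value specified by the competition):

```python
import math

def uct_select(children, c=1.0 / math.sqrt(2)):
    """UCB1 child selection as used in UCT (Kocsis & Szepesvári, 2006).

    children: list of (total_reward, visit_count) pairs for one node.
    Returns the index of the child to descend into.
    """
    parent_visits = sum(n for _, n in children)

    def ucb1(child):
        reward, visits = child
        if visits == 0:
            return float("inf")  # always try unvisited children first
        # Exploitation (mean reward) + exploration bonus.
        return reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

    return max(range(len(children)), key=lambda i: ucb1(children[i]))

# A well-explored mediocre child vs. a promising, less-visited one:
print(uct_select([(5.0, 50), (3.0, 10)]))  # → 1
```

It is exactly this exploration term that makes UCT attractive for undefined domains: it needs no domain knowledge beyond a reward signal, which the GVG-AI score provides for free.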
With this in mind, my approach relies upon either the hill climber or A* depending on the problem being tackled, oscillating between them as certain conditions arise in the performance of the bot. One of the biggest problems is identifying what problem the agent is actually trying to solve. In each case the algorithms rely upon cost metrics and heuristics, which allow us to establish whether any potential state is more viable than the current one.
To achieve this, I crafted a handful of specific heuristics based upon the kinds of problems the agent was likely to discover in the game world. These largely fell into specific problems such as:
- Reaching goal doors or portals.
- Destroying enemy characters using projectiles.
- Colliding with objects (either to kill or collect).
These could be layered on top of a simple win/lose heuristic to calculate each state’s value. In preliminary testing, this approach proved remarkably effective in some games (Space Invaders and Frogger) and appalling in others (Sokoban and Zelda). It is undoubtedly the biggest limitation that any heuristic-driven approach would face when compared to the MCTS method. However, I am pleased to say it performed slightly better than expected.
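The layering idea can be sketched as follows: terminal win/lose checks dominate everything, and only when neither applies does a task-specific term shape the value. The state fields, weights and the portal heuristic below are illustrative assumptions, not my actual submission's code:

```python
WIN_BONUS, LOSS_PENALTY = 1_000_000, -1_000_000

def portal_heuristic(state):
    """Task-specific term: prefer states closer to a goal portal."""
    return -state["distance_to_portal"]

def evaluate(state, task_heuristic):
    """Win/lose terms dominate; otherwise score plus the task term."""
    if state.get("won"):
        return WIN_BONUS
    if state.get("lost"):
        return LOSS_PENALTY
    return state.get("score", 0) + task_heuristic(state)

near = {"distance_to_portal": 2, "score": 0}
far = {"distance_to_portal": 9, "score": 0}
print(evaluate(near, portal_heuristic) > evaluate(far, portal_heuristic))  # → True
```

The failure mode is clear from the structure: in games like Sokoban, where the right move can temporarily look worse by every available metric, a hand-picked distance or collision term actively misleads the search, while MCTS's rollouts can still stumble onto the delayed reward.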
As can be seen from the table posted above, my submission (T2Thompson) finished 10th in the final rankings, which I felt was rather impressive for such a simple approach. Not only did it (thankfully) outperform the other sample scripts, but it also defeated several other human-written submissions that relied upon more intelligent methods.
The Past, Present and Future of GVG-AI
As mentioned at the beginning of this piece, I felt it best to let the competition sit in my mind for a while – there was of course the reality of the new teaching semester starting, but let’s gloss over that. The GVG-AI competition had a rather successful first iteration and it was encouraging to see not only a decent number of participants, but more importantly, that we have gained a decent foothold in this problem domain.
While I didn’t think it would be an easy problem to solve, I didn’t doubt that as a community we could perform well in these problem areas. Rather, the interesting thing was to see just how much could be achieved in this period of time. What has proved interesting is how quickly the community at large (myself included) felt it best to tackle this problem with Monte Carlo Tree Search algorithms. This is perhaps unsurprising given we already know just how effective this approach can be across a variety of domains. However, it was rather fascinating to see such a large gap in overall performance. While it may come down to the quality of this person’s implementation, consideration could be given to whether the type of MCTS algorithm had an impact. This could hinge on a number of unique properties of the algorithm; the parameters set to manage the search process could very well prove key here. Or it may be that one MCTS version adopted certain techniques the others did not.
While there are winners and losers, there is still much to be done with this problem domain if we are to say that this challenge has been addressed. I have it on good authority that it will continue in 2015 and no doubt will include new and more challenging games. What is perhaps exciting about this is that the longer the competition goes on, more games can be introduced using the VGDL and engine that present new challenges to the researchers involved. I look forward to seeing what comes of it in the future!
A sincere thank you to Diego Perez, who was responsible for the co-ordination and management of the GVG-AI competition. Diego was kind enough to provide me with the videos of gameplay footage used in the conference presentation that can be seen in the final video shown above.
- GVG-AI Part 1 – In which I discussed what the competition is and its relevance from a research perspective.
- GVG-AI Part 2 – In which I deconstructed the public API that was released to the public for submitting a controller to the competition.
- Kocsis, L. and Szepesvári, C. (2006). “Bandit Based Monte-Carlo Planning.” In Machine Learning: ECML 2006 (pp. 282-293). Springer Berlin Heidelberg.
- Togelius, J., Karakovskiy, S. and Baumgarten, R. (2010). “The 2009 Mario AI Competition.” In Proceedings of the IEEE Congress on Evolutionary Computation (CEC).
- Levine, J., Congdon, C.B., Bida, M., Ebner, M., Kendall, G., Lucas, S.M., Miikkulainen, R., Schaul, T. and Thompson, T. (2013). “General Video Game Playing.” Artificial and Computational Intelligence in Games, Dagstuhl Follow-Ups, 6(1). Dagstuhl Publishing.
- Ebner, M., Levine, J., Lucas, S.M., Schaul, T., Thompson, T. and Togelius, J. (2013). “Towards a Video Game Description Language.” Artificial and Computational Intelligence in Games, Dagstuhl Follow-Ups, 6(1). Dagstuhl Publishing, pp. 85-100.