The Mario AI Competition

This article focusses on the ‘Mario AI Competition’, which ran from 2009 to 2012, providing an introduction to the problem area and the challenges it presented. We take time to explore one of the most interesting areas to have arisen in Artificial Intelligence for games in recent years: procedural content generation. In particular, we look at the challenges faced in developing AI that creates content which is assessed primarily by the fun it generates for the player. This is the focus of one particular track of the competition.

Finding the Fun

When it comes to either developing or researching AI for games, the idea of ‘fun’ is equally challenging for both parties.  When researching AI techniques that can adapt to play existing games, we are typically focussed on optimality: striving to create an agent that can play Go, Chess or Pac-Man better than anything previously implemented.  However, when we are developing AI to be utilised within a game, with the intent of releasing it as a commercial product, we are in one of the few problem domains where optimality is not the sole metric.  We are also interested in whether the AI is fun to play against.

In commercial games, we are often focussed not on the AI being optimal, but being fun to play.

This can be at odds with how AI researchers think, since what is fun is, in many cases, certainly not optimal.  There is also the perennial problem that plagues not just AI developers but game designers even more so: how do we define fun?

Now let me stress I am not asking you to conceptualise fun as a metric: some kind of numeric form that allows us to place recorded incidents of ‘fun’ in graphs or tables for some supposed deep analysis.  I don’t believe in that and sadly, if you were to take a wander around the internet (go on, I’ll wait), you would find some people consider this valid.  Rather, could you at least express what fun is to a point where you could discuss it with another human being?  Now this article isn’t an argument for what is fun or what isn’t, because we humans cannot truly agree on it.  Fun as a term is non-specific and subjective: what I find fun may not be your cup of tea and indeed vice versa.

When it comes to game design, making something fun is a challenging task.  Both my colleagues and I stress to students the importance of ‘finding the fun’ in the games they play, but of course what achieves that is never the same thing.  When we tell someone that a game is fun, we are often expressing our enjoyment that comes from a melding of game mechanics that is often more complicated than it looks.

The Challenge of Procedural Content Generation

Many of the issues discussed above are prevalent when devising a means of procedural content generation (PCG).  For those unfamiliar,  PCG is the process of creating some type of content for a game, such as levels, environments and artefacts, through some algorithmic process.

While early video games often relied on procedural generation as a means to circumvent a lack of resources (e.g. disk space and memory), modern games often utilise it as a means to create unique content, either to reinforce a visual aesthetic or style or as a key component of gameplay.  Arguably one of the most well-known examples is the Borderlands series developed by Gearbox Software.  Borderlands utilises a PCG system in order to generate the weapons that the player will use.  This can affect not only the type of gun but also the damage per second, clip size and many more attributes.  Now how do we know whether that gun is of any use before placing it in the hands of the player?

A sampling of the guns from Borderlands, each of which was built using PCG methods.

Quite often the answer is simply ‘we don’t know’.  We cannot determine the effectiveness of procedurally generated content unless it is evaluated within the context of its purpose.  Of course we could make some broad generalisations (not powerful enough, fires too fast for the small clip size, accuracy poor etc.), but that gun may prove useful to a player other than the original recipient.  At the end of the day, when generating weapons the answer is to let the player use the gun.  If they don’t like it?  They’ll get rid of it: drop it, sell it, give it away.  However if they like it, they’ll keep using it.
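
To make the weapon discussion a little more concrete, here is a minimal sketch of attribute-driven generation followed by the kind of broad generalisations mentioned above. Everything in it is an assumption made for illustration: the `Weapon` fields, the value ranges and the thresholds in `broad_concerns` are invented, and none of it reflects how Gearbox actually builds guns in Borderlands.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Weapon:
    kind: str
    damage: float      # damage per shot
    fire_rate: float   # shots per second
    clip_size: int     # shots before a reload
    accuracy: float    # 0.0 (wild) to 1.0 (pinpoint)

    @property
    def dps(self) -> float:
        """Damage per second while firing."""
        return self.damage * self.fire_rate

    @property
    def seconds_per_clip(self) -> float:
        """How long one clip lasts with the trigger held down."""
        return self.clip_size / self.fire_rate


def generate_weapon(rng: random.Random) -> Weapon:
    """Roll every attribute independently from a hypothetical range."""
    return Weapon(
        kind=rng.choice(["pistol", "shotgun", "rifle", "smg"]),
        damage=round(rng.uniform(5.0, 60.0), 1),
        fire_rate=round(rng.uniform(0.5, 12.0), 1),
        clip_size=rng.randint(2, 60),
        accuracy=round(rng.uniform(0.3, 1.0), 2),
    )


def broad_concerns(weapon: Weapon) -> List[str]:
    """Apply crude, made-up thresholds; an empty list proves nothing about fun."""
    concerns = []
    if weapon.dps < 40.0:
        concerns.append("not powerful enough")
    if weapon.seconds_per_clip < 1.5:
        concerns.append("fires too fast for the clip size")
    if weapon.accuracy < 0.5:
        concerns.append("accuracy poor")
    return concerns


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so the run is repeatable
    for _ in range(3):
        gun = generate_weapon(rng)
        print(gun, broad_concerns(gun) or "no obvious concerns")
```

Note that the checks only flag suspicious numbers. Whether a flagged gun is actually worthless is still something only a player holding it can tell us.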

Of course not all content can be assessed in the same context or as easily as a weapon, given that you can determine whether you like a weapon pretty quickly.  As a result, there are some interesting problem areas emerging in AI for us to explore.

The Mario AI Competition

The Mario AI competition ran in many forms from its inception in 2009 until 2012.  The competition was overseen by Julian Togelius, who at the time of writing is an associate professor at the IT University of Copenhagen’s ‘Centre for Games Research’.  However, full credit must be given to Sergey Karakovskiy, a Ph.D. student at St. Petersburg State University, who ran many of the competition tracks with Julian. Noor Shaker, a postdoctoral researcher at ITU, was responsible for the track that focusses on procedural generation.  Lastly, Georgios Yannakakis, who will be familiar to those who read our earlier EvoTanks article, was involved in later years of the competition.

A Little Bit of History

As noted earlier, the competition ran from 2009.  However our area of interest, the procedural generation track, did not run until 2010.  The original emphasis of the competition, much like Ms. Pac-Man before it, was to create characters that could learn to play the game.  The classic Super Mario game is one that is almost ingrained within modern culture.  It provided an alternative to Ms. Pac-Man given that it is a completely different style of game: a two-dimensional platformer.

Outside of the gameplay mechanics, it also acted in much the same way that Ms. Pac-Man had done previously.  Super Mario Bros. is as recognisable a game as Ms. Pac-Man, if not more so.  So it acted on two fronts: a legitimate problem to explore and an iconic game to rally interest.  The team pushed the competition heavily through social media in order to get the message out there and encourage submissions not only from researchers but from hobbyists too.

The Mario AI competition provides a legitimate AI problem to explore in an iconic game to rally interest.

The open-source Infinite Mario Bros. by Markus Persson.

The competition itself is reliant upon the Infinite Mario Bros. clone that was developed by Markus ‘Notch’ Persson, who is more famously known for that little project he developed called Minecraft. The game is a Java-written clone that successfully mimics many of the original game’s mechanics.  It is one of the most accurate representations of Mario outside of any software released by Nintendo themselves.  The one thing that separates this from anything that Nintendo produces is that the source code is available.  This has since been re-purposed by the organisers to suit the needs of the competition.

The Gameplay Track

The gameplay track was the inaugural track of the competition in 2009 and was aimed at researchers adopting any technique they wanted to play Mario levels.  This resulted in a range of implementations (15 to be precise) that varied from hand-scripted to the use of genetic algorithms, neural networks and reinforcement learning.  Given this is not the focus of the article, I would refer readers to (Togelius et al., 2010), which gives an overview of the different solutions implemented.

When the results were in, there was some surprise to note that Robin Baumgarten’s A* implementation trumped all!  The video below is of Robin’s A* Mario in action.  As you can see it is frighteningly efficient and is capable of some moves that make it remarkably ‘unhuman’ in nature.

The Learning Track

As discussed in (Togelius et al., 2010) there was a gulf in the performance of the A* and handcrafted submissions versus those that were reliant upon machine learning algorithms.  As a result a separate track was introduced in 2010.  However, it didn’t seem to be as popular and only ran that year.  I suspect that the A* implementation made people think that the problem was ‘solved’ and did not merit further consideration.

The Turing Track

As you’ll have observed from the video above, the behaviour is not remotely human.  As a result, the team also introduced a Turing track, whereby AI players must behave in as human-like a manner as possible.  This was in many ways inspired by a similar competition running in Unreal Tournament 2004: the 2K BotPrize (Hingston, 2010).  The Turing track only ran in the 2012 iteration of the competition and, as noted in (Togelius et al., 2013a), bots were not capable of fooling the judges.  Interestingly, the moments they appeared more human were when they were standing still or second-guessing their actions.

The Level Generation Track

The level generation track in the Mario competition is (to our knowledge) the first of its kind.  Unlike the other tracks, which were focussed on producing software agents that could act in the player’s stead as Mario, the focus here was to create a level generator which would create levels to play.  As mentioned in (Togelius et al., 2013a), when the first PCG track ran back in 2010, there were no real benchmarks that could be used to argue the validity of a generated piece of terrain.


Of course in order to evaluate the generated content, it needs to be played!  So the traditional method of simply allowing the AI to run against a problem doesn’t work.  For judging to take place, human players would play the generated levels.  However, as mentioned in (Shaker et al., 2011), each judge was first asked to play a test level that acted solely to accrue metrics from the player’s performance.  This information was passed on to the level generators to inform their procedural generation.  In addition, a number of constraints were passed to each generator that dictated some expectations of the generated level.  These included things such as the number of pits, the number of coin/power-up blocks and the number of Koopa Troopas.
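
As a rough illustration of what checking such constraints might look like, here is a minimal sketch. The tile characters, the `Constraints` class and the decision to treat each number as an exact target are all my own assumptions; the competition's real interface is Java and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Constraints:
    pits: int
    blocks: int   # coin / power-up blocks
    koopas: int


def count_pits(ground_row: str) -> int:
    """Count maximal runs of missing ground ('.') along the bottom row."""
    pits = 0
    in_pit = False
    for tile in ground_row:
        if tile == "." and not in_pit:
            pits += 1
        in_pit = tile == "."
    return pits


def satisfies(level: List[str], wanted: Constraints) -> bool:
    """True when the level hits every requested count exactly."""
    tiles = "".join(level)
    return (
        count_pits(level[-1]) == wanted.pits
        and tiles.count("?") == wanted.blocks
        and tiles.count("K") == wanted.koopas
    )


# Hypothetical encoding: '#' ground, '.' empty, '?' block, 'K' Koopa Troopa.
LEVEL = [
    "....?....?......",
    "......K.....K...",
    "####..####.#####",
]

if __name__ == "__main__":
    print(satisfies(LEVEL, Constraints(pits=2, blocks=2, koopas=2)))
```

A generator that only stitched together pre-made levels would struggle to pass a check like this for arbitrary counts, which is the point of imposing the constraints in the first place.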

This set of constraints was introduced for a reason: it would be naive to assume that these level generators will create something algorithmically without any constraints being imposed upon them. This might seem cynical, but there’s a good chance that an unchallenged level generator is randomly sampling from collections of hand-crafted, human-designed solutions.  That isn’t really the focus of the competition.  In fact, it would be cheating!  The constraints allow the competition to focus solely on content crafted procedurally.

NOTE: One of the key things to remember about this competition is that the levels are not being designed to mimic ACTUAL Mario levels. Such a problem is arguably even more challenging, given that what constitutes a Mario level adheres to a rule book that (to my knowledge) has never been released to the public domain and is locked in the minds of Nintendo’s finest designers.  While some generators hope to invoke some of these principles, they are often inferred from experience of playing Mario games. The rules on what makes a level a “Mario level” may not actually exist.

There are a number of different implementations that were developed for the 2010 competition.  For the sake of the article, I want to focus on a handful and show the variety of approaches employed by different teams.  Some are driven by AI algorithms, whereas others have a more prescribed nature.

A ‘Flow’-Driven Generator

The untitled submission by Tomonori Hashiyama (University of Electro-Communications, Tokyo) and Tomoyuki Shimizu (formerly Uni. Electro-Com, Tokyo) is a system designed to make experiences ‘flow’ between one another.  This is achieved by using what is known as an Interactive Evolutionary Computation algorithm.  In short, an evolutionary algorithm is used to build the solutions, but the fitness criteria are first determined by the player’s performance in the test level.

An overview of the architecture of the Shimizu and Hashiyama generator. (Shaker et al., 2011)

A skill and preference estimator derives key information from the player’s data from the test level, identifying their preference for collecting coins, killing enemies and smashing blocks.  The system is then reliant upon a collection of randomly crafted level ‘chunks’, which are assessed with respect to those same metrics.  The system then creates levels by matching chunks first to the skill level and then to the preferences of the player.
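
The matching step described above can be sketched roughly as follows. This is not the authors' implementation: the `Chunk` and `PlayerProfile` classes, the difficulty scale, the tolerance and the weighted-sum scoring are assumptions of mine, and the interactive evolutionary part of the system is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    name: str
    difficulty: float            # 0.0 (easy) to 1.0 (hard)
    features: Dict[str, int]     # counts of coins, enemies and blocks


@dataclass
class PlayerProfile:
    skill: float                 # estimated from the test level, same scale
    preference: Dict[str, float] # weight given to each feature


def appeal(chunk: Chunk, player: PlayerProfile) -> float:
    """Weighted sum of a chunk's features using the player's preferences."""
    return sum(player.preference.get(name, 0.0) * count
               for name, count in chunk.features.items())


def build_level(chunks: List[Chunk], player: PlayerProfile,
                length: int, tolerance: float = 0.2) -> List[Chunk]:
    """Filter chunks by skill first, then rank the survivors by preference."""
    suitable = [c for c in chunks
                if abs(c.difficulty - player.skill) <= tolerance]
    ranked = sorted(suitable, key=lambda c: appeal(c, player), reverse=True)
    return ranked[:length]


CHUNKS = [
    Chunk("coin_run", 0.30, {"coins": 8, "enemies": 0, "blocks": 1}),
    Chunk("goomba_gauntlet", 0.40, {"coins": 1, "enemies": 5, "blocks": 0}),
    Chunk("brick_wall", 0.35, {"coins": 2, "enemies": 1, "blocks": 6}),
    Chunk("pit_maze", 0.90, {"coins": 9, "enemies": 9, "blocks": 9}),
]
COIN_LOVER = PlayerProfile(
    skill=0.35,
    preference={"coins": 0.7, "enemies": 0.1, "blocks": 0.2},
)

if __name__ == "__main__":
    for chunk in build_level(CHUNKS, COIN_LOVER, length=2):
        print(chunk.name)
```

Filtering on skill before ranking on preference mirrors the order given in the description: a chunk the player would love is still no use if they cannot get through it.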

The ‘Hopper’ Level Generator

The ‘Hopper’ generator was developed by Glen Takahashi, a University of California undergraduate, and Gillian Smith, now an assistant professor at Northeastern University in Boston, MA.  This generator used a rule-driven approach to place tiles across the map from left to right, utilising probabilities that were hand-crafted for things such as widths of gaps and enemy frequency.  Depending on the incoming user-stats, different probabilities would be applied.

Screenshots showcasing the special ‘zones’ that the ‘Hopper’ generator by Glen Takahashi and Gillian Smith would implement. Clockwise from top-left: hidden coin zone, fire zone, super jump zone and shell zone. (Shaker et al., 2011)

After the bulk of the map (roughly 85%) is built, the system then adds “special zones” which are reminiscent of features one would expect in Mario levels such as hidden coin platforms and large jump segments.
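
A left-to-right, probability-driven generator of this kind could be sketched as below. The probability tables, the `pick_profile` rule and the column labels are hypothetical, and filling the final 15% of columns with a single special zone is just one reading of the description, not a claim about how ‘Hopper’ really places its zones.

```python
import random
from typing import List

# Hypothetical hand-tuned probabilities, keyed by a coarse skill label.
PROFILES = {
    "novice": {"gap": 0.05, "enemy": 0.05, "max_gap": 2},
    "expert": {"gap": 0.15, "enemy": 0.25, "max_gap": 4},
}
SPECIAL_ZONES = ["hidden coin zone", "fire zone", "super jump zone", "shell zone"]


def pick_profile(deaths_in_test_level: int) -> str:
    """Made-up rule: players who died a lot get the gentler table."""
    return "novice" if deaths_in_test_level >= 3 else "expert"


def generate(width: int, deaths_in_test_level: int,
             rng: random.Random) -> List[str]:
    p = PROFILES[pick_profile(deaths_in_test_level)]
    bulk = width * 85 // 100          # roughly 85% of the map
    columns: List[str] = []
    while len(columns) < bulk:
        roll = rng.random()
        on_ground = bool(columns) and columns[-1] == "ground"
        if on_ground and roll < p["gap"]:
            # Gaps only start after solid ground and never overrun the bulk.
            gap_width = rng.randint(1, p["max_gap"])
            columns.extend(["gap"] * min(gap_width, bulk - len(columns)))
        elif roll < p["gap"] + p["enemy"]:
            # A low roll that could not start a gap also lands here.
            columns.append("enemy")
        else:
            columns.append("ground")
    # Once the bulk is built, fill the remainder with one special zone.
    zone = rng.choice(SPECIAL_ZONES)
    columns.extend([zone] * (width - bulk))
    return columns


if __name__ == "__main__":
    level = generate(width=40, deaths_in_test_level=5, rng=random.Random(7))
    print(level)
```

Swapping the probability table is the only thing that changes between players here, which is what makes a rule-driven approach like this cheap to tune by hand.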

While slightly tangential: Gillian Smith has a fun article/rant on her site dedicated to what she perceives to be “The Seven Deadly Sins of PCG Research”. It’s worth a read!


Meanwhile Ben Weber, a Ph.D. student at the University of California, devised the Probabilistic Multi-Pass Generator (ProMP), which adds content to the level in stages.  For each level generated, the system would make multiple passes in a specific order: main terrain, hills, pipes, enemies, blocks, then coins. This is shown in the example below.
An overview of the level generation process in Ben Weber’s implementation. Note the multiple passes taken to add new features over time. Taken from Ben’s web post on the submission.

This process is also parameterised, allowing the levels to be tailored to suit players’ skill levels.  One example is the video below from Weber’s YouTube channel, highlighting the levels it would generate to challenge Baumgarten’s A* Mario player.  These levels are pretty intense and would be insurmountable for many human players.  Interestingly, they also expose some limitations of the A* player.  An article summarising the implementation can be found on Ben’s blog at UCSC.
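
To give a feel for the multi-pass idea, here is a minimal sketch in which each pass walks the level once and may add one kind of feature, respecting what earlier passes left behind. Only the pass order comes from the description above. The column representation, the blocking rules in `RULES` and the parameter values are assumptions of mine, not Weber's actual ProMP code.

```python
import random
from typing import Dict, List, Set

# (feature, needs ground beneath it, earlier features that rule it out).
# The order follows the passes after terrain; the rules are hypothetical.
RULES = [
    ("hill", True, set()),
    ("pipe", True, {"hill"}),
    ("enemy", True, {"pipe"}),
    ("block", False, {"hill", "pipe"}),
    ("coin", False, {"pipe", "block"}),
]

# Made-up parameter sets: probability of each feature per column.
EASY = {"gap": 0.05, "hill": 0.10, "pipe": 0.05,
        "enemy": 0.05, "block": 0.10, "coin": 0.20}
HARD = {"gap": 0.25, "hill": 0.10, "pipe": 0.15,
        "enemy": 0.40, "block": 0.05, "coin": 0.05}


def generate(width: int, params: Dict[str, float],
             rng: random.Random) -> List[Set[str]]:
    """Return one set of feature names per column."""
    level: List[Set[str]] = [set() for _ in range(width)]

    # Pass 1: main terrain. A column without "ground" is a gap.
    for column in level:
        if rng.random() >= params["gap"]:
            column.add("ground")

    # Passes 2 to 6: each later pass sees the results of earlier ones.
    for feature, needs_ground, blocked_by in RULES:
        for column in level:
            if needs_ground and "ground" not in column:
                continue
            if column & blocked_by:
                continue
            if rng.random() < params[feature]:
                column.add(feature)
    return level


if __name__ == "__main__":
    for column in generate(12, HARD, random.Random(42)):
        print(sorted(column))
```

Because every probability lives in one dictionary, moving from `EASY` to `HARD` changes the character of the level without touching the passes themselves, which is the appeal of parameterising the process.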

For more information on all of these implementations, please consult (Shaker et al., 2011), which summarises each of the submissions.


While the Mario AI competition has concluded, it is by no means the end of the line for this vein of research.  In fact this couldn’t be farther from the truth: the level generation competition is now running in a new format.  The team responsible for the competition are wary of being reliant upon an art style derived from Mario and the potential legal issues this could raise with Nintendo. As a result the team has since moved away from relying on the well-known plumber for the visual aesthetic and adopted graphics and sounds from the open-source Linux game SuperTux.  However, as discussed in (Togelius et al., 2013a), the game has received more than a fresh lick of paint: many tweaks have been made to the original source as well as changes to the competition model, such that future competitions perform better than before.

The new competition, succinctly titled the “Platformer AI Competition”, ran both a Turing Track and a Level Generation Track last year.  In a future article, many months from now, we shall revisit this problem to discuss the output of this competition.


Outside of the Mario AI competition, the idea of what Procedural Content Generation is and how best to further explore PCG as a research area is argued in (Togelius et al., 2013b).  This focusses not only on facing challenges raised in the Mario competition, but also discusses areas such as in-game animation, music, quests and story-lines that could be addressed with procedural generation, many of which already are in some fashion. Some of the more unconventional and exciting ideas are focussed not only on procedurally generating entire games, but on the challenge of subsequently developing AI that could play these games interchangeably.  This refers to two growing areas of research and development: standardising video games, courtesy of a video game description language (Ebner et al., 2013), as well as general video game playing (Levine et al., 2013).  The latter refers to developing AI that can adapt to any video game it is given.  At present, this is a trait reserved for humans, but we can always hope to change that.

Useful Links

Links to both Mario AI and Platformer competitions can be found below.

Note: Mario 2010 competition URLs appear to point to the 2012 site, which may have replaced the original in the domain.

  • Platformer AI Competition


(Ebner, M., Levine, J., Lucas, S.M., Schaul, T., Thompson, T. and Togelius, J., 2013) Towards a Video Game Description Language. Dagstuhl Follow-ups volume 6: Artificial and Computational Intelligence in Games.

(Hingston, P., 2010) A New Design for a Turing Test for Bots. Computational Intelligence and Games (CIG), 2010 IEEE Symposium on, 345-350.

(Levine, J., Congdon, C.B., Ebner, M., Kendall, G., Lucas, S.M., Miikkulainen, R., Schaul, T. and Thompson, T., 2013) General Video Game Playing. Dagstuhl Follow-ups volume 6: Artificial and Computational Intelligence in Games.

(Shaker, N., Togelius, J., Yannakakis, G.N., Weber, B., Shimizu, T., Hashiyama, T., Sorenson, N., Pasquier, P., Mawhorter, P., Takahashi, G., Smith, G. and Baumgarten, R., 2011) The 2010 Mario AI Championship: Level Generation Track. IEEE Transactions on Computational Intelligence and Games.

(Togelius, J., Karakovskiy, S. and Baumgarten, R., 2010) The 2009 Mario AI Competition. Proceedings of the IEEE Congress on Evolutionary Computation (CEC).

(Togelius et al., 2013a) – (Togelius, J., Shaker, N., Karakovskiy, S. and Yannakakis, G.N., 2013) The Mario AI Championship 2009-2012. AI Magazine, 34(3), 89-92.

(Togelius et al., 2013b) – (Togelius, J., Champandard, A.J., Lanzi, P.L., Mateas, M., Paiva, A., Preuss, M. and Stanley, K.O. 2013)  Procedural Content Generation: Goals, Challenges and Actionable Steps. Dagstuhl Follow-ups volume 6: Artificial and Computational Intelligence in Games.

Written by: Tommy Thompson

Tommy is the writer and producer of AI and Games. He's a senior lecturer in computer science and researcher in artificial intelligence with applications in video games. He's also an indie video game developer with Table Flip Games. Because y'know... fella's gotta keep himself busy.