"Conclusion: Pokellmon"

Introduction

Because the project is no longer online, here is a short introduction to what PokeLLMon did.

Basically, I wanted to show people that LLMs are not capable of performing tasks that actually require thinking. Most people who know me are aware that my relationship with LLMs is somewhat strained. I believe they stifle creativity and are dangerous tools for programming beginners. I will probably post more about this in the future because, sadly, it is an ever-growing threat.

I hooked up a self-hosted LLM (LLaVA) to a running copy of Pokemon Emerald via the Game Boy Advance emulator SkyEmu, which is awesome, by the way. I then pulled a frame from the emulator and passed it to the LLM's image recognition, asking it for the best button press in the current situation. The LLM responded with a JSON string containing the button it wanted to press and its reasoning for pressing it. The prompt I provided looked like this:

"You are the best Pokemon Emerald player in the world and you want to beat the game as fast as possible. Here is a screenshot of the current game state. Give me the best possible button press to achieve the previously stated goal. Also give the reason for the given button press. [... technical stuff to make the LLM return a well-formed JSON ...]"

Then I let it run for a few months.
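For the curious, the loop described above can be sketched roughly like this. It is a minimal sketch, not the code I actually ran: the three callables (grab_frame, ask_model, press_button) are hypothetical stand-ins for the emulator and LLM calls, and the JSON keys "button" and "reason" as well as the button names are assumptions.

```python
import json
from typing import Callable, List, Optional

# Assumed button names; the names used in the real project may differ.
VALID_BUTTONS = {"A", "B", "L", "R", "START", "SELECT",
                 "UP", "DOWN", "LEFT", "RIGHT"}


def parse_reply(reply: str) -> Optional[dict]:
    """Return {"button", "reason"} or None if the reply is unusable."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None  # the model did not return valid JSON
    if not isinstance(data, dict):
        return None
    # Assumed keys: "button" and "reason".
    button = str(data.get("button", "")).strip().upper()
    if button not in VALID_BUTTONS:
        return None  # the model asked for a button that does not exist
    return {"button": button, "reason": str(data.get("reason", ""))}


def play(grab_frame: Callable[[], bytes],
         ask_model: Callable[[bytes], str],
         press_button: Callable[[str], None],
         steps: int) -> List[str]:
    """Run the screenshot -> model -> button loop for a fixed number of steps."""
    pressed = []
    for _ in range(steps):
        frame = grab_frame()              # hypothetical: screenshot from the emulator
        decision = parse_reply(ask_model(frame))
        if decision is None:
            continue                      # skip malformed replies
        press_button(decision["button"])  # hypothetical: send input to the emulator
        pressed.append(decision["button"])
    return pressed
```

Validating the reply before pressing anything matters here, because nothing guarantees that the model answers with valid JSON or with a button that exists.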

How did it play out?

It went about as well as you might expect. I actually thought it would perform much better; you can see the final results in the video below. I had to end the project after 3 months because the LLM couldn't do anything interesting. It was not only not entertaining, it also consumed a lot of my server's power just to run. In my opinion, this is a really good analogy for the real-world usage of LLMs: the energy spent on them is in no way proportional to their usefulness. In comparison, there is a Twitch.tv channel called WinningSequence that does something similar, only much more entertaining, because things actually happen over there. But see for yourself. The video shows only every 10th frame, played back at 600 FPS, and it's still close to 4 minutes long.

Conclusion

It went worse than I expected, and I expected basically nothing. So the saying "Expect nothing, still get disappointed" hits hard on this one. But it was a fun little experiment. Maybe in the future someone will find a legal, non-destructive, sensible use for an LLM. But I don't think that time is near.

Have a good one!