
OpenAI’s o3 Crushes Grok 4 In Final, Wins Kaggle’s AI Chess Exhibition Tournament

OpenAI’s o3 took the crown after steamrolling Grok 4 on the final day of the AI chess exhibition in Google’s Kaggle Game Arena. In the third-place match, Gemini 2.5 Pro defeated o4-mini 3.5-0.5 to take the bronze medal.

Image: Kaggle Game Arena Chess Exhibition Tournament bracket, finals.


Watch IM Levy Rozman’s video recap below, or read the full report:


o3 4-0 Grok 4

Up until the semifinals, it seemed like nothing would be able to stop Grok 4 on its way to winning the event. Despite a few moments of weakness, X’s AI looked like by far the strongest chess player. Its silence about most of its moves even felt like an ominous sign: Grok was a precise and deadly beast that would answer to no one.

But the illusion fell apart on the last day of the tournament. The chatty o3 simply dismantled its mysterious opponent with four convincing wins. Grok’s play was unrecognizable, blundering early and often. And for the most part, o3 showed no mercy.

In game one, Grok dropped a bishop in the opening for no apparent reason. A piece down, Grok then offered trades, a surprising choice given that chess literature universally advises against simplifying when down material. A few more blunders later, o3 delivered checkmate to open the score.

The second game featured the Poisoned Pawn Variation of the Sicilian Defense. And if the b2-pawn is poisonous for human players, the a2-pawn seems to carry a deadly virus for artificial ones. Black played 12…Qxa2??, grabbing a pawn that was protected by White’s c3-knight. Victory came easily for o3 after that.

For the first time in this event, an AI reached the Maroczy Bind structure of the Sicilian. That happened in game three, when Grok had the white pieces. With a comfortable position, it seemed like Grok could be returning to its old form. Was the AI so advanced that it had merely been toying with its opponent in the first couple of games?

All hope was lost, though, after White’s 11.Nd5??, which simply dropped the knight. Moments later, Grok dropped the queen, an exchange, a rook, and then the game.

The last game of the match was the closest one, and for a moment it seemed like it was o3’s turn to throw a game away. After an early queen blunder, o3’s position was much worse. But as commentator GM Hikaru Nakamura pointed out, the position still held a few tricks. o3 bounced back from its horrendous mistake and found a nice tactic to win the queen back.

It all boiled down to an endgame that should still have been drawn despite o3’s extra pawn. However, as we saw the previous day, when Grok failed to deliver checkmate with a rook, pawns, and king, X’s AI struggles with endgames. o3 clearly understood the endgame much better, found stronger moves, and eventually promoted a pawn and checkmated Grok.

This is our Game of the Day, analyzed by GM Rafael Leitao below:

With this victory, o3 became the first winner of the Kaggle Game Arena, while Grok 4 had to settle for the silver medal.

o4-mini 0.5-3.5 Gemini 2.5 Pro

The match for third place between Gemini 2.5 Pro and o4-mini was not as lopsided, but it wasn’t a close affair either. With three wins and a draw, Gemini stepped onto the podium for bronze.

Despite the dominant performance, Gemini’s games were a collection of messy affairs, nowhere near the quality of those played by the event’s winner, o3. At first, it seemed like Gemini had a pretty good idea of what it was doing, even playing a reasonably good attacking game in game one:

But the draw in game three was much closer to the reality of the rest of the match. Both players had very little idea of what was going on and, overall, played poor chess. In the game in question, the evaluation swings show how badly both sides were blundering, and how difficult it was for either to convert an easily winning position:

Image: Evaluation swings in the Gemini 2.5 Pro vs. o4-mini game review.

Below you can see the game in full:

But despite this dubious draw, Gemini’s performance was enough. By winning the other three games, Google’s Gemini 2.5 Pro secured a podium finish that could make the event’s organizers proud. It will be interesting to see how Google uses its findings from this project to improve its AI.

With this, we wrap up Kaggle’s AI chess exhibition tournament. You can visit Kaggle’s website to access the source code and more information about the event, and draw your own conclusions about AIs and how close they truly are to Artificial General Intelligence.

The Kaggle Game Arena AI Chess Exhibition Tournament, which took place August 5-7, was an event organized by Google in its new Kaggle Game Arena, where some of the world’s leading Large Language Models (LLMs) competed in a series of chess games. The LLMs competed in a single-elimination bracket.
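
For readers curious how a single-elimination bracket reduces a field to a champion, here is a minimal Python sketch. The run_bracket helper and the pick_winner stub are hypothetical illustrations, not Kaggle’s actual matchmaking code, and the field is trimmed to the four semifinalists for brevity; the real pairings and results came from the games themselves.

    # Minimal sketch of a single-elimination bracket (illustrative only;
    # not Kaggle's matchmaking code). Entrants are paired off each round,
    # winners advance, and play continues until one entrant remains.
    def run_bracket(entrants, pick_winner):
        round_num = 1
        while len(entrants) > 1:
            next_round = []
            for a, b in zip(entrants[::2], entrants[1::2]):
                winner = pick_winner(a, b)  # decide the match somehow
                print(f"Round {round_num}: {a} vs {b} -> {winner}")
                next_round.append(winner)
            entrants = next_round
            round_num += 1
        return entrants[0]

    # Stand-in winner function: always picks the first name listed.
    # It does not reflect how the real matches were decided.
    semifinalists = ["o3", "o4-mini", "Grok 4", "Gemini 2.5 Pro"]
    print("Champion:", run_bracket(semifinalists, lambda a, b: a))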

