Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus
Posted by ykhli 2 days ago
Comments
Comment by bubblesorting 2 days ago
Some feedback:
- Knowing the scoring system is helpful when going 1v1 on high score
- Use a different randomization system; I kept getting starved for pieces like I. True random is fine; throwing a copy of every piece into a bag and then drawing them one by one is better (the 7-bag); nearly random with some lookbehind to prevent a string like ZSZS is solid too (the TGM randomizer)
- Piece rotation feels left-biased and keeps making me mis-drop; for example, T pieces shift to the left if you spin 4 times. Check out https://tetris.wiki/images/thumb/3/3d/SRS-pieces.png/300px-S... or https://tetris.wiki/images/b/b5/Tgm_basic_ars_description.pn... for examples of how other games do it.
- Clockwise and counter-clockwise rotation is important for human players, we can only hit so many keys per second
- re-mappable keys are also appreciated
Nice work, I'm going to keep watching.
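The 7-bag scheme described above can be sketched in a few lines of Python (the piece names are the standard tetromino letters; the function name is my own):

```python
import random

PIECES = "IJLOSTZ"  # the seven standard tetromino letters

def seven_bag(rng=random):
    """Endlessly yield pieces using the 7-bag scheme: put one copy of
    every piece in a bag, shuffle, deal them all out, then refill."""
    while True:
        bag = list(PIECES)
        rng.shuffle(bag)
        yield from bag
```

With this scheme the worst-case drought for any piece is 12 draws (drawn first from one bag, last from the next), versus an unbounded drought under true random.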
Comment by vunderba 2 days ago
Comment by qsort 2 days ago
I don't think the goal is to make a PvP simulator; it would be too easy to cheese or exploit with weird strategies. It's mostly for LLMs to play.
Comment by bubblesorting 2 days ago
On the topic of reflexes decaying (I'm getting there, in my late 30s): Have you played Stackflow? It's a number go up roguelite disguised as an arcade brick stacking game, but the gravity is low enough that it is effectively turn based. More about 'deck' building, less about chaining PCs and C-Spins.
Comment by kinduff 2 days ago
By the way, kudos on your feedback. If I were OP, I'd be honored to get that kind of fine-tuning comment.
Comment by Keyframe 2 days ago
One of my dream goals was to make a licensed low-lag competitive game kind of like TGM, but I heard licensing is extremely cost-prohibitive, so I kind of gave up on that goal. I remember telling someone I was ready to pony up a few tens of thousands for a license plus a cut, but reportedly it starts an order of magnitude higher.
Comment by ykhli 2 days ago
- Each model starts with an initial optimization function for evaluating Tetris moves.
- As the game progresses, the model sees the current board state and updates its algorithm—adapting its strategy based on how the game is evolving.
- The model continuously refines its optimizer, deciding when it needs to re-evaluate and when it should implement the next optimization function.
- The model generates updated code, executes it to score all placements, and picks the best move.
- The reason I reframed this as a coding problem is that Tetris is, by nature, an optimization game. At first I tried asking LLMs where to place each piece at every turn, but models are just terrible at visual reasoning. What LLMs are great at, though, is coding.
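To illustrate what such a generated placement-scoring function might look like, here is my own sketch using standard Tetris heuristics (holes, stack height, surface bumpiness, lines cleared) — not ykhli's actual code, and the weights are placeholders a model would tune mid-game:

```python
def score_placement(board, lines_cleared):
    """Score a candidate board after a placement. `board` is a list of
    columns, each column a list of booleans from bottom to top
    (True = filled). Higher scores are better; the weights are
    illustrative placeholders, not tuned values."""
    # Column heights: index of the topmost filled cell, plus one.
    heights = [max((i + 1 for i, filled in enumerate(col) if filled), default=0)
               for col in board]
    # Holes: empty cells buried below a column's top.
    holes = sum(1 for col, h in zip(board, heights)
                for i in range(h) if not col[i])
    # Bumpiness: how jagged the surface is, summed over adjacent columns.
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return (10 * lines_cleared   # reward clearing lines
            - 5 * holes          # buried empty cells are costly
            - sum(heights)       # keep the stack low
            - bumpiness)         # keep the surface flat
```

The loop described above would call something like this on every legal placement of the current piece and pick the argmax.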
Comment by dakom 2 days ago
Comment by bityard 2 days ago
Also, if the creator is reading this, you should know that Tetris Holdings is extremely aggressive with their trademark enforcement.
Comment by OGEnthusiast 2 days ago
Comment by vunderba 2 days ago
Comment by storystarling 2 days ago
Comment by ykhli 2 days ago
Comment by storystarling 2 days ago
Comment by vunderba 2 days ago
Comment by ykhli 2 days ago
Comment by mhh__ 2 days ago
Comment by burkaman 2 days ago
Comment by augusteo 2 days ago
Curious what the latency looks like per move. That seems like the actual bottleneck here.
Comment by p0w3n3d 2 days ago
Comment by brookman64k 2 days ago
Comment by p0w3n3d 2 days ago
The YouTuber sentdex has a whole series on parsing the game's image and playing GTA: https://www.youtube.com/playlist?list=PLQVvvaa0QuDeETZEOy4Vd...
Comment by gpm 2 days ago
Comment by esafak 2 days ago
Comment by bogtog 2 days ago
.....
l....
l....
l.ttt
l..t.
Comment by akomtu 2 days ago
Comment by vunderba 2 days ago
Comment by gpm 2 days ago
It will lose so badly there will be no point in the comparison.
Besides, you could compare models (and harnesses) directly against each other.
Comment by akomtu 2 days ago
Comment by mikkupikku 2 days ago
Maybe that's the result you want for some sort of rhetorical reason, but it would nonetheless not be an informative test.
Comment by ykhli 2 days ago
Comment by arendtio 2 days ago
I mean, if you let the LLM build a Tetris bot, it would be 1000x better than what the LLMs are doing here. So yes, it is fun to win against an AI, but against that much processing power you really shouldn't be able to win. It is only possible because LLMs are not built for such tasks.
Comment by westurner 2 days ago
Task: write and optimize a tetris bot
Task: write and safely online optimize a tetris bot with consideration for cost to converge
openai/baselines (7 years ago) was leading on RL, and then came AlphaZero and self-attention Transformer networks.
LLMs are trained with RL, but they aren't general-purpose game-theoretic RL agents?
Comment by westurner 1 day ago
"Outsmarting algorithms: A comparative battle between Reinforcement Learning and heuristics in Atari Tetris" (2025) https://dl.acm.org/doi/10.1016/j.eswa.2025.127251
Comment by i_cannot_hack 2 days ago
Comment by tiahura 2 days ago
Comment by segmondy 2 days ago
Comment by indigodaddy 2 days ago
Comment by ykhli 2 days ago
That difference in objective bias shows up very clearly in Tetris but is much harder to notice in typical coding benchmarks. Just a theory, though, based on reviewing game results and logs.
Comment by purplecats 2 days ago