GPT Image 1.5
Posted by charlierguo 1 day ago
Comments
Comment by vunderba 1 day ago
https://genai-showdown.specr.net/image-editing
Conclusions
- OpenAI has always had some of the strongest prompt understanding alongside the weakest image fidelity. This update goes some way towards addressing this weakness.
- It's leagues better than gpt-image-1 at making localized edits without altering the entire image's aesthetic, doubling the previous score from 4/12 to 8/12, and it's the only model that legitimately passed the Giraffe prompt.
- It's one of the most steerable models with a 90% compliance rate
Updates to GenAI Showdown
- Added outtakes sections to each model's detailed report in the Text-to-Image category, showcasing notable failures and unexpected behaviors.
- New models have been added, including REVE and Flux.2 Dev (a new locally hostable model).
- Finally got around to implementing a weighted scoring mechanism which considers pass/fail, quality, and compliance for a more holistic model evaluation (click pass/fail icon to toggle between scoring methods).
If you just want to compare gpt-image-1, gpt-image-1.5, and NB Pro at the same time:
https://genai-showdown.specr.net/image-editing?models=o4,nbp...
Comment by quietbritishjim 20 hours ago
Ludicrously unnecessary nitpick for "Remove all the brown pieces of candy from the glass bowl":
> Gemini 2.5 Flash - 18 attempts - No matter what we tried, Gemini 2.5 Flash always seemed to just generate an entirely new assortment of candies rather than just removing the brown ones.
The way I read the prompt, it demands that the candies change arrangement. You didn't say "change the brown candies to a different color", you said "remove them". From the few brown ones you can see, you can infer that there are even more underneath - surely if you removed them all (even just by magically disappearing them), the others would tumble down into new positions? The level of the candies is lower than before you started, which is what you'd expect if you remove some. Maybe it's just coincidence, but maybe this really was its reasoning. (It did unnecessarily remove the red candy from the hand, though.)
I don't think any of the "passes" did as well as this, including Gemini 3.0 Pro Image. Qwen-Image-Edit did at least literally remove one of the three visible brown candies, but just recolored the other two.
Comment by vunderba 14 hours ago
You will note that the Minimum Passing Criteria allows for a color change in order to pass the prompt, but with the rapid improvements in generative models, I may revise this test to be stricter, counting only actual removal as a pass rather than a simple color swap.
Comment by pierrec 1 day ago
Maybe everyone has a different dose of skepticism. Personally, I'm not even looking at results for models that were released after the benchmark; for all this tells us, they might as well be one-trick ponies that only do well on the benchmark.
It might be too much work, but one possible "correct" approach for this kind of benchmark would be to periodically release new benchmarks with new tests (broadly in the same categories) and only include models that predate each benchmark.
Comment by vunderba 1 day ago
I don't have any captcha systems in place, but I wonder if it might be worth putting up at least a few nominal roadblocks (such as Anubis [1]) to slow down the scrapers.
A few weeks ago I actually added some new, more challenging tests to the GenAI Text-to-Image section of the site (the “angelic forge” and “overcrowded flat earth”) just to keep pace with the latest SOTA models.
In the next few weeks, I’ll be adding some new benchmarks to the Image Editing section as well.
Comment by echelon 1 day ago
Generate a novel previz scene programmatically in Blender or some 3D engine, then task the image model with rendering it in a style (or with style transfer to a given image, e.g. something novel and unseen from Midjourney). Another test would be to replace stand-in mannequins with the identities of characters in reference images and make sure the poses and set blocking match.
Throw in a 250-object asset pack and some skeletal meshes that can conform to novel poses, and you've got a fairly robust test framework.
Furthermore, anything that succeeds at the previz rendering task can then be fed into another company's model and given a normal editing task, making it doubly useful for two entirely separate benchmarks. That is, successful previz generations can be reused as image-edit test cases - and you know the subject matter a priori without needing to label a bunch of images or run a VLM, so you can create a large set of unseen tests (a sketch of the scene-generation half follows below).
[1] https://imgur.com/gallery/previz-to-image-gpt-image-1-x8t1ij...
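A minimal sketch of the scene-generation half, assuming Blender's bpy API (the object counts, placements, and output path here are arbitrary choices, not part of the original proposal):

    import random
    import bpy

    # Block out a previz scene: a ground plane plus primitive stand-ins
    # at random positions, then render the blockout for the image model.
    bpy.ops.object.select_all(action="SELECT")
    bpy.ops.object.delete()

    bpy.ops.mesh.primitive_plane_add(size=20)
    for _ in range(5):
        x, y = random.uniform(-5, 5), random.uniform(-5, 5)
        bpy.ops.mesh.primitive_cube_add(size=1, location=(x, y, 0.5))

    bpy.ops.object.light_add(type="SUN", location=(0, 0, 10))
    bpy.ops.object.camera_add(location=(8, -8, 5), rotation=(1.1, 0, 0.8))
    bpy.context.scene.camera = bpy.context.object
    bpy.context.scene.render.filepath = "/tmp/previz_blockout.png"
    bpy.ops.render.render(write_still=True)

Because the script knows every object's position a priori, grading the model's render against the blockout needs no manual labeling.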
Comment by somenameforme 1 day ago
So I don't think there's even a question of whether or not newer models are going to be maximizing for benchmarks - they 100% are. The skepticism would be in how it's done. If something's not being run locally, then there's an endless array of ways to cheat - like dynamically loading certain LoRAs in response to certain queries, with some LoRAs trained precisely to maximize benchmark performance. Basically taking a page out of the car company playbook in response to emissions testing.
But I think optimizing the general model itself to perform well on benchmarks isn't really unethical or cheating at all. All you're really doing there is 'outsourcing' part of your quality-control tests. But it simultaneously greatly devalues any benchmark, because that benchmark is now the goal.
Comment by smusamashah 1 day ago
Comment by 8n4vidtmkvmk 1 day ago
And if that still doesn't get you there, hash the image inputs to detect if it's one of these test photos and then run your special test-passer algo.
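That detection step is trivial to implement; a sketch (the hash set and its contents are hypothetical):

    import hashlib

    # Hypothetical digests of the benchmark's known test images.
    KNOWN_TEST_HASHES = {
        "9f2c51a0e0b7...",  # placeholder value, not a real digest
    }

    def is_benchmark_input(image_bytes: bytes) -> bool:
        # Exact-match fingerprint of the incoming image.
        return hashlib.sha256(image_bytes).hexdigest() in KNOWN_TEST_HASHES

An exact hash breaks as soon as the harness recompresses or resizes the image, so a determined cheater would more likely reach for a perceptual hash.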
Comment by KeplerBoy 22 hours ago
What a prompt and image.
Comment by __alexs 20 hours ago
Comment by nisegami 18 hours ago
Comment by walrus01 22 hours ago
Comment by imdsm 21 hours ago
Comment by smusamashah 1 day ago
Comment by vunderba 1 day ago
Comment by nicpottier 13 hours ago
Out of curiosity why does gemini get gold for the poker example but gpt-image 1.5 does not? I couldn't see a difference between the two.
Comment by Bombthecat 9 hours ago
Comment by vunderba 41 minutes ago
Comment by boredhedgehog 21 hours ago
Comment by vunderba 14 hours ago
If you look at the ones that passed (Flux.2 Pro, Gemini 2.5 Flash, Reve), you'll see that they did not add/subtract/move any of the pockmarks from the original image.
Comment by leumon 18 hours ago
Comment by singhkays 1 day ago
I edited the original "Lord of War" poster with a reference image of Jensen and replaced bullets with GPU dies, silicon wafers and electronic components.
Comment by llmthrow0827 1 day ago
Comment by heystefan 1 day ago
Comment by vunderba 1 day ago
In addition to giving models multiple attempts to generate an image, we also write several variations of each prompt. This helps prevent models from getting stuck on particular keywords or phrases, which can happen depending on their training data. For example, while “hippity hop” is a relatively common name for the ball-riding toy, it’s also known as a “space hopper.” In some cases, we may even elaborate and provide the model with a dictionary-style definition of more esoteric terms.
This is why providing an “X Attempts” metric is so important. It serves as a rough measure of how “steerable” a given model is - or, put another way, how much we had to fight with the model to get it to consistently follow the prompt’s directives.
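A rough sketch of how that metric reduces to code (generate and judge are stand-ins for the actual generation pipeline and human grading, not part of the site's published methodology):

    def attempts_until_pass(generate, judge, prompt_variants, max_attempts=10):
        # The "X Attempts" steerability metric: how many generations it
        # takes before one satisfies the prompt's directives.
        for attempt in range(1, max_attempts + 1):
            # Rotate through phrasings so a single unlucky keyword
            # ("hippity hop" vs. "space hopper") can't stall the run.
            prompt = prompt_variants[(attempt - 1) % len(prompt_variants)]
            if judge(generate(prompt)):
                return attempt
        return None  # the model never passed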
Comment by mvkel 1 day ago
Comment by lobochrome 1 day ago
Comment by BoredPositron 1 day ago
Comment by echelon 1 day ago
Personal request: could you also advocate for "image previz rendering"? I feel it's an extremely compelling use case for these companies to develop: basically any 2D/3D compositor that lets you visually block out a scene, then rely on the model to render it while precisely preserving the set, set pieces, and character poses.
If we got this task onto benchmarks, the companies would absolutely start training their models to perform well at it.
Here are some examples:
gpt-image-1 absolutely excels at this, though you don't have much control over the style and aesthetic:
https://imgur.com/gallery/previz-to-image-gpt-image-1-x8t1ij...
Nano Banana (Pro) fails at this task:
https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd
Flux Kontext, Qwen, etc. have mixed results.
I'm going to re-run these under gpt-image-1.5 and report back.
Edit:
gpt-image-1.5 :
https://imgur.com/a/previz-to-image-gpt-image-1-5-3fq042U
And just as I finish this, Imgur deletes my original gpt-image-1 post.
Old link (broken): https://imgur.com/a/previz-to-image-gpt-image-1-Jq5M2Mh
Hopefully imgur doesn't break these. I'll have to start blogging and keep these somewhere I control.
Comment by vunderba 1 day ago
With additions like structured prompts (introduced in BFL Flux 2), maybe we'll see something like this in the near future.
Comment by irishcoffee 1 day ago
10 years ago I would have considered that sentence satire. Now it allegedly means something.
Somehow it feels like we’re moving backwards.
Comment by echelon 1 day ago
I don't understand why everyone isn't in awe of this. This is legitimately magical technology.
We've had 60+ years of being able to express our ideas with keyboards. Steve Jobs' "bicycle of the mind". But in all this time we've had a really tough time visually expressing ourselves. Only highly trained people can use Blender, Photoshop, Illustrator, etc., whereas almost everyone on earth can use a keyboard.
Now we're turning the tide and letting everyone visually articulate themselves. This genuinely feels like computing all over again for the first time. I'm so unbelievably happy. And it only gets better from here.
Every human should have the ability to visually articulate themselves. And it's finally happening. This is a major win for the world.
I'm not the biggest fan of LLMs, but image and video models are a creator's dream come true.
In the near future, the exact visions in our head will be shareable. We'll be able to iterate on concepts visually, collaboratively. And that's going to be magical.
We're going to look back at pre-AI times as primitive. How did people ever express themselves?
Comment by concats 22 hours ago
"1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.
2. Anything that's invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.
3. Anything invented after you're thirty-five is against the natural order of things.”
― Douglas Adams
Comment by vintermann 19 hours ago
* I'm into genealogy. Naturally, most of my fellow genealogists are retired, often many years ago, though probably also above average in mental acuity and tech-savviness for their age. They LOVE generative AI.
* My nieces, and my cousin's kids of the same age, are deeply into visual art. Especially animation, and cutesy Pokemon-like stuff. They take it very seriously. They absolutely DON'T like AI art.
Comment by Rodeoclash 1 day ago
Comment by scrollaway 1 day ago
The internet is an amazing technology, yet its biggest consumption is a mix of ads, porn and brain rot.
We all have cameras in our pockets yet most people use them for selfies.
But if you look closely enough, the incredible value that comes from these examples more than makes up for all the people using them in a “boring” way.
And anyway who’s the arbiter of boring?
Comment by irishcoffee 1 day ago
It’s just a tool. It’s not a world-changing tech. It’s a tool.
Comment by SchemaLoad 1 day ago
Comment by minimaxir 1 day ago
One curious case demoed here in the docs is the grid use case. Nano Banana Pro can also generate grids, but for NBP grid adherence to the prompt collapses above 4x4 (there's only a finite number of output tokens to correspond to each subimage), so I'm curious that OpenAI started with a 6x6 case, albeit with a test prompt that's not that nuanced.
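The budget math makes the collapse intuitive; a toy calculation (the total output-token figure is an assumption for illustration, not a published number):

    # A fixed output-token budget spread across grid cells shrinks fast.
    TOTAL_IMAGE_TOKENS = 4096  # illustrative budget
    for n in (2, 4, 6):
        cells = n * n
        print(f"{n}x{n}: {cells} cells, ~{TOTAL_IMAGE_TOKENS // cells} tokens per subimage")

At 6x6 that leaves roughly a hundred tokens per subimage, an order of magnitude less than at 2x2.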
Comment by vunderba 1 day ago
https://mordenstar.com/blog/edits-with-nanobanana
In particular, NB Pro successfully assembled a jigsaw puzzle it had never seen before, generated semi-accurate 3D topographical extrapolations, and even swapped a window out for a mirror.
Comment by jngiam1 1 day ago
Comment by IgorPartola 1 day ago
Comment by niklassheth 1 day ago
Comment by qingcharles 1 day ago
Also: SUPER ANNOYING. It seems every time you give it a modification prompt it erases the whole conversation leading up to the new pic? Like.. all the old edits vanish??
I added "shaky amateur badly composed crappy smartphone photo of ____" to the start of my prompts to make them look more natural.
Counterpoint from someone on the Musk site: https://x.com/flowersslop/status/2001007971292332520
Comment by vunderba 1 day ago
Comparing NB Pro, GPT Image 1, and GPT Image 1.5
Comment by abadar 1 day ago
Comment by echelon 1 day ago
One thing that gpt-image-1 does exceptionally well that Nano Banana (Pro) can't is previz-to-render. This is actually an incredibly useful capability.
The Nano Banana models take the low-fidelity previz elements/stand-ins and unfortunately keep the elements in place without attempting to "upscale" them. The model tries to preserve every mistake and detail verbatim.
Gpt-image-1, on the other hand, understands the layout and blocking of the scene, the pose of human characters, and will literally repair and upscale everything.
Here's a few examples:
- 3D + Posing + Blocking: https://youtu.be/QYVgNNJP6Vc
- Again, but with more set re-use: https://youtu.be/QMyueowqfhg
- Gaussian splats: https://youtu.be/iD999naQq9A
- Gaussians again: https://youtu.be/IxmjzRm1xHI
We need models that can do what gpt-image-1 does above, but with higher quality, better stylistic control, faster speed, and the ability to take style references (e.g. glossy Midjourney images).
Nano Banana team: please grow these capabilities.
Adobe is testing and building some really cool capabilities:
- Relighting scenes: https://youtu.be/YqAAFX1XXY8?si=DG6ODYZXInb0Ckvc&t=211
- Image -> 3D editing: https://youtu.be/BLxFn_BFB5c?si=GJg12gU5gFU9ZpVc&t=185 (payoff is at 3:54)
- Image -> Gaussian -> Gaussian editing: https://youtu.be/z3lHAahgpRk?si=XwSouqEJUFhC44TP&t=285
- 3D -> image with semantic tags: https://youtu.be/z275i_6jDPc?si=2HaatjXOEk3lHeW-&t=443
I'm trying to build the exact same things that they are, except as open source / source available local desktop tools that we can own. Gives me an outlet to write Rust, too.
Comment by pablonaj 1 day ago
Comment by echelon 1 day ago
gpt-image-1: https://imgur.com/gallery/previz-to-image-gpt-image-1-x8t1ij... (fixed link - imgur deleted the last post for some reason)
gpt-image-1.5: https://imgur.com/a/previz-to-image-gpt-image-1-5-3fq042U
nano banana / pro: https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd
gpt-image-1 excels in these cases, despite being stylistically monotone.
I hope that Google, OpenAI, and the various Chinese teams lean in on this visual editing and blocking use case. It's much better than text prompting for a lot of workflows, especially if you need to move the camera and maintain a consistent scene.
While some image editing will be in the form of "remove the object"-style prompts, a lot will be molding images like clay. Grabbing arms and legs and moving them into new poses. Picking up objects and replacing them. Rotating scenes around.
When this gets fast, it's going to be magical. We're already getting close.
Comment by oxag3n 1 day ago
Question: with copyright and authorship dead with respect to AI, how do I at least get new content protected?
Anecdotal: I had a hobby of taking photos in a quite rare style, and lived in a place you'd get quite a few pictures of. When I asked GPT to generate a picture of that area in that style, it returned a highly modified but recognizable copy of a photo I'd published years ago.
Comment by mortenjorck 1 day ago
Air gap. If you don’t want content to be used without your permission, it never leaves your computer. This is the only protection that works.
If you want others to see your content, however, you have to accept some degree of trade-off: it may be misappropriated. Blatant cases can be addressed the same way they always were, but a model overfitting to your original work poses an interesting question for which I’m not aware of any legal precedent having been set yet.
Comment by echelon 1 day ago
Big IP holders will go nuclear on IP licensing to an extent we've never seen before.
Right now, there are thousands of images and videos of Star Wars, Pokemon, Superman, Sonic, etc. being posted across social media. All it takes is for the biggest IP conglomerates to do what the linear TV and sports networks of the past did and treat social media like cable.
Disney: "Gee {Google,Meta,Reddit,TikTok}, we see you have a lot of Star Wars and Marvel content. We think that's a violation of our rights. If you want your users to continue to be able to post our media, you need to pay us $5B/yr."
I would not be surprised if this happens now that every user on the internet can soon create high-fidelity content.
This could be a new $20-30B/yr business for Disney. Nintendo, WBD, and lots of other giant IP holders could easily follow suit.
Comment by empressplay 1 day ago
https://arstechnica.com/ai/2025/12/disney-invests-1-billion-...
Comment by echelon 1 day ago
https://www.engadget.com/ai/google-pulls-ai-generated-videos...
The next step is to take this beyond AI generations and to license rights to characters and IP on social media directly.
The next salvo will be where YouTube has to take down all major IP-related content if they don't pay a licensing fee. Regardless of how it was created. Movie reviews, fan animations, video game let's plays.
I've got a strong feeling that day is coming soon.
Comment by margorczynski 1 day ago
Comment by rafram 1 day ago
Comment by realharo 19 hours ago
Comment by panopticon 10 hours ago
Comment by LudwigNagasena 1 day ago
Comment by pfortuny 22 hours ago
There seems to be no other way (apart from air-gapping everything, as others say).
Comment by ur-whale 1 day ago
Question: Now that the steamboats have been invented, how do I keep my clipper business afloat?
Answer: Good riddance to the broken idea of IP, Schumpeter's Gale is around the corner, time for a new business model.
Comment by 999900000999 1 day ago
Back in reality, you can get in line to sue. Since they have more money than you, you can't really win though.
So it goes.
Comment by nobody_r_knows 1 day ago
Comment by swatcoder 1 day ago
If someone were on vacation and came home to learn that their neighbor had allowed some friends to stay in the empty house, we would often expect some kind of outrage regardless of whether there had been specific damage or wear to the home.
Culturally, people have deeply set ideas about what's theirs, and feel like they deserve some say over how their things are used and by whom. Even those that are very generous and want their things to be widely shared usually want to have some voice in making that come to be.
Comment by visarga 1 day ago
Comment by oxag3n 1 day ago
Comment by netule 1 day ago
Comment by ragequittah 1 day ago
Comment by netule 1 day ago
Comment by BoorishBears 1 day ago
(to clarify, OpenAI stops refining the image if a classifier detects your image as potentially violating certain copyrights. Although the gulf in resolution is not caused by that.)
Comment by CamperBob2 1 day ago
Comment by whywhywhywhy 8 hours ago
Comment by illwrks 1 day ago
Comment by Forgeties79 1 day ago
How do you feel about entities taking your face off of your personal website and plastering it on billboards smiling happily next to their product? What if it’s for a gun? Or condoms? Or a candidate for a party you don’t support? Pick your own example if none of those bother you. I’m sure there are things you do not want to be associated with/don’t want to contribute to.
At the end of the day it’s very gross when we are exploited without our knowledge or permission so rich groups can get richer. I don’t care if my visual work is only partially contributing to some mashed up final image. I don’t want to be a part of it.
Comment by vintermann 18 hours ago
That would be misrepresentation. Even Stallman isn't OK with that. You can take one of his opinion pieces and publish it as your own. Or you can attach his name to it.
However, if you're editing it and releasing it under his name, clearly you're simply lying, and nobody is OK with that. People have the right to be recognized as authors of things they did author (if they so desire) and they have a right to NOT be associated with things they didn't.
> At the end of the day it’s very gross when we are exploited without our knowledge or permission so rich groups can get richer.
The second part is the entirety of the problem. If I'm "exploited" in a way where I can't even notice it, and I'm not worse off for it, how is it even exploitation? But people amassing great power is a problem no matter if they do it with "legitimate" means or not.
Comment by Forgeties79 18 hours ago
Stallman has his opinions on software; I have my opinions on my visual work. I don’t really get how that applies here or why it settles this matter.
Comment by vintermann 18 hours ago
That's such a bad straw man I wonder if you're really supporting the position you claim to be supporting. Maybe you're just trying to give it a bad name.
Your opinion isn't on visual work, but visual property. You don't demand to be paid for your work - your labor. Rather you traded that for the dream of being paid rent on a capital object, in perpetuity (or close enough). Artists lost to the power-mongers when we bit at that bait.
Comment by Forgeties79 14 hours ago
I don’t really know where all the hostility came from in this conversation but I think it’s best if we move on.
Comment by CamperBob2 1 day ago
Apart from the 'newspaper' anachronism, that's pretty much still my take.
Sorry, but you'll just have to deal with it and get over it.
Comment by Forgeties79 1 day ago
You were fine until this bit.
Comment by onraglanroad 1 day ago
You got to play the copyright game when the big corps were on your side.
Now they're on the other side. Deal with it and get over it.
Comment by Forgeties79 18 hours ago
Comment by CamperBob2 14 hours ago
Meanwhile, the next generation of great artists is already at work down the street from you. Some kids you've never heard of, playing around in a basement or garage you've probably driven past a hundred times. They're learning to make the most of the tools at hand, just like the old masters did. Except the tools at hand this time are little short of godlike.
It's an exciting time. If you wanted things to stay the same, you shouldn't have gone into technology or art.
Comment by Forgeties79 8 hours ago
If you want me to hand some of my work over to artists so they can learn and grow and experiment, send them my way. Happy to help.
Comment by CamperBob2 7 hours ago
Agreed there, which is why it's important to work for open access to the results. The resulting regime won't look much like present-day copyright law, but if we do it right, it will be better for us all.
In other words, instead of insisting that "No one can have this," or "Only a few can have this," which (again) will not be options for works that you release commercially, it's better IMHO to insist that "Everyone can have this."
Comment by smileson2 1 day ago
Comment by Forgeties79 14 hours ago
Comment by jibal 1 day ago
[I won't bother responding to the rest of your appalling comment]
Comment by huflungdung 1 day ago
Comment by huflungdung 1 day ago
Comment by agentifysh 1 day ago
Noticed it captured a Mega Man Legends vibe...
https://x.com/AgentifySH/status/2001037332770615302
and here it generated a texture map from a 3d character
https://x.com/AgentifySH/status/2001038516067672390/photo/1
However, I'm not sure if these are true UV maps that are accurate, as I don't have the 3D models themselves.
But I tried this in Nano Banana when it first came out and it couldn't do it.
Comment by gs17 1 day ago
I can tell you with 100% certainty they are not. For example, Crash doesn't have a backside for his torso. You could definitely make a model that uses these as textures, but you'd really have to force it and a lot of it would be stretched or look weird. If you want to go this approach, it would make a lot more sense to make a model, unwrap it, and use the wireframe UV map as input.
Here's the original Crash model: https://models.spriters-resource.com/pc_computer/crashbandic... , its actual texture is nothing like the generated one, because the real one was designed for efficiency.
Comment by Nition 1 day ago
Most of Crash in the first game was not textured; just vertex colours. Only the fur on his back and his shoelaces were textures at all.
Comment by gs17 1 day ago
Comment by agentifysh 1 day ago
Tried your suggested approach with the unwrapped wireframe UV as input and I'm impressed:
https://x.com/AgentifySH/status/2001057153235222867
Obviously it's not going to be 1:1 accurate, but with more 3D spatial awareness I think it could definitely improve.
Comment by 101008 1 day ago
Also in the tweet:
> GPT Image 1.5 is **ing crazy
and
> holy shit lol
What's impressive about that if you don't know whether it's right or not? (As the other comment pointed out, it is not right.)
Comment by blurbleblurble 1 day ago
Comment by kingstnap 1 day ago
My own main use cases are entirely textual: Programming, Wiki, and Mathematics.
I almost never use image generation for anything. However, it's objectively extremely popular.
This has strong parallels for me to when Snapchat filters became super popular. I know lots of people loved editing and filtering pictures, but I always left everything on auto mode; in fact, I'd turn off a lot of the default beauty filters. It just never appealed to me.
Comment by nurettin 1 day ago
Comment by impjohn 22 hours ago
Comment by 999900000999 1 day ago
In late-stage capitalism you pay for fake photos of yourself with someone. You have ChatGPT write about how you dated for a summer, and have it end with them leaving for grad school to explain why you aren't together.
Eventually we'll all just pay to live in the matrix. When your credit card is declined you'll be logged out, to awaken in a shared studio apartment. To eat your rations.
Comment by ares623 1 day ago
But at some point it'll hit saturation. The novelty will wear off since everyone has access to it. Who cares if you have a fake photo with a celebrity if everyone knows it's fake?
Comment by sharkjacobs 1 day ago
Comment by minimaxir 1 day ago
At the least, it's not present in these new images.
Comment by swyx 1 day ago
Comment by minimaxir 14 hours ago
Comment by ineedasername 19 hours ago
SWE: "Seriously? import PIL \ read file \ == (c + 10%, m = m, y = y, k = k) \ save file done!"
Exec: "Yeah, and first blogger get's a hold of image #1 they generate, starts saying 'Hey! This thing's been color corrected w/o AI! lol lame'"
Or not, no idea. i've not understood the choice either, besides very intelligent AI-driven auto-touch up for lighting/color correction has been a thing for a while. It's just, for those I end up finding an answer for, maybe 25% of head scratcher decisions do end of having a reasonable, if non intuitive answer for. Here? haven't been able to figure one yet though, or find a reason/mention by someone who appears to have an inside line on it.
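For what it's worth, the fix the quip gestures at really is about that short; a sketch with Pillow (the 10% cyan bump is just the joke's number, and the filenames are placeholders):

    from PIL import Image

    # Nudge cyan up ~10% in CMYK space to counteract a warm cast,
    # leaving the other channels untouched.
    img = Image.open("generated.png").convert("CMYK")
    c, m, y, k = img.split()
    c = c.point(lambda v: min(255, int(v * 1.1)))
    Image.merge("CMYK", (c, m, y, k)).convert("RGB").save("corrected.png")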
Comment by BoorishBears 1 day ago
(although I get what you mean, not easily since you already trained)
I'm guessing when they get a clean slate we'll have Image 2 instead of 1.5. In LMArena it was immediately apparent it was an OpenAI model based on visuals.
Comment by KaiserPro 1 day ago
They forgot to calibrate the cameras, so everything had a green tint.
Meanwhile all the other teams had a billion Macbeth charts lying around just in case.
Comment by jiggawatts 1 day ago
Comment by ACCount37 1 day ago
And I say "subtle" - but because that model would always "regenerate" an image when editing, it would introduce more and more of this yellow tint with each tweak or edit. Which has a way of making a "subtle" bias anything but.
Comment by amoursy 1 day ago
Comment by danielbln 1 day ago
Comment by viraptor 1 day ago
Comment by vunderba 1 day ago
Comment by onoesworkacct 1 day ago
Comment by dvngnt_ 1 day ago
Comment by efilife 1 day ago
Comment by wahnfrieden 1 day ago
Comment by kingkawn 1 day ago
Comment by jebronie 1 day ago
Comment by varjag 1 day ago
Comment by minimaxir 1 day ago
Comment by weird-eye-issue 1 day ago
Comment by encroach 1 day ago
I like this benchmark because it's based on user votes, so overfitting is not as easy (after all, if users prefer your result, you've won).
Comment by ygouzerh 22 hours ago
Comment by nycdatasci 1 day ago
Comment by encroach 1 day ago
Comment by nycdatasci 19 hours ago
Comment by nycdatasci 15 hours ago
Comment by encroach 12 hours ago
They also control for style: https://news.lmarena.ai/sentiment-control/
Comment by mingabunga 1 day ago
- Gemini/Nano did a pretty average job, only applying some grey to some of the panels. I tried a few different examples and got similar output.
- GPT did a great job and themed the whole app and made it look great. I think I'd still need a designer to finesse some things though.
Comment by rw2 23 hours ago
- The latency is still too high: under 10 seconds for Nano Banana, but around 25 seconds for GPT Image 1.5.
- The quality is higher, but it's not a jump like the one from previous Google models to Nano Banana Pro. Nano Banana Pro is still at least as good or better, in my opinion.
Comment by abbycurtis33 1 day ago
Comment by ianbicking 1 day ago
That is, it's nice to make a pretty stand-alone image, but without tools to maintain consistency and place them in context you can't make a project that is more than just one image, or one video, or a scattered and disconnected sequence of pieces.
Comment by Sohcahtoa82 14 hours ago
Comment by FergusArgyll 1 day ago
Comment by xnx 1 day ago
Comment by takoid 1 day ago
Comment by throwthrowuknow 1 day ago
Comment by empressplay 1 day ago
Comment by kingkawn 1 day ago
Comment by doctorpangloss 1 day ago
Better to read that particular story in the context of, "It would be very difficult to make a seed fund that is an index of all avant garde culture making because [whatever]."
Comment by password-app 1 day ago
We're seeing AI get better at both creative tasks (images) and operational tasks (clicking through websites).
For anyone building AI agents: the security model is still the hard part. Prompt injection remains unsolved even with dedicated security LLMs.
Comment by yuni_aigc 13 hours ago
Some models are very strong at sharp details and localized edits, but they can break global lighting consistency — shadows, reflections, or overall scene illumination drift in subtle ways. GPT-Image seems to trade a bit of micro-detail for better global coherence, especially in lighting, which makes composites feel more believable even if they’re not pixel-perfect.
It’s hard to capture this in benchmarks, but for real-world editing workflows it ends up mattering more than I initially expected.
Comment by chakintosh 23 hours ago
Comment by bunnybomb2 1 hour ago
Comment by aziis98 1 day ago
[1]: https://chatgpt.com/share/6941c96c-c160-8005-bea6-c809e58591...
Comment by alasano 1 day ago
They even linked to their Image Playground, where it's also not available.
I updated my local playground to support it, and I'm just handling the 404 on the model gracefully.
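The graceful handling only takes a few lines; a sketch using the OpenAI Python SDK (the "gpt-image-1.5" model string is a guess at what the rollout will expose, and the error types to catch may differ):

    import openai

    client = openai.OpenAI()

    def generate_image(prompt: str):
        # Prefer the new model, but fall back while the rollout is incomplete.
        for model in ("gpt-image-1.5", "gpt-image-1"):
            try:
                return client.images.generate(model=model, prompt=prompt)
            except (openai.NotFoundError, openai.BadRequestError):
                continue  # model not available on this account yet
        raise RuntimeError("no usable image model")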
Comment by anonfunction 1 day ago
POST "https://api.openai.com/v1/responses": 500 Internal Server Error {
"message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
"type": "server_error",
"param": null,
"code": "server_error"
}
Interestingly if you change to request the model foobar you get an error showing this:
POST "https://api.openai.com/v1/responses": 400 Bad Request {
"message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
"type": "invalid_request_error",
"param": "tools[0].model",
"code": "invalid_value"
}
Comment by weird-eye-issue 1 day ago
Comment by minimaxir 1 day ago
Comment by joshstrange 1 day ago
It's too bad no OpenAI Engineers (or Marketers?) know that term exists. /s
I do not understand why it's so hard for them to just tell the truth. So many announcements "Available today for Plus/Pro/etc" really means "Sometime this week at best, maybe multiple weeks". I'm not asking for them to roll out faster, just communicate better.
Comment by xnx 1 day ago
What angle is there for second-tier models? Could the future for OpenAI be providing a cheaper option for when you don't need the best? It seems like that segment would also be dominated by the leading models.
I would imagine the future shakes out as: first class hosted models, hosted uncensored models, local models.
Comment by fock 18 hours ago
Comment by smlavine 1 day ago
Comment by WhyOhWhyQ 1 day ago
Comment by teaearlgraycold 19 hours ago
Comment by zkmon 1 day ago
So, let's simulate that future. Since no one trusts your talent in coding, art, or writing, you wouldn't care to do any of these. But the economy is built on products and services which get their value from how much human talent and effort is required to produce them.
So, the value of these services and products goes down as demand and trust go down. No one knows or cares who is a good programmer on the team, who is a great thinker and writer, and who is a modern Picasso.
So, the motivation disappears for humans. There are no achievements to target; there is no way to impress others with your talent. This should lead to a uniform workforce without much difference in talents. Pretty much a robot army.
Comment by arnz-arnz 1 day ago
Comment by zkmon 1 day ago
Comment by arnz-arnz 19 hours ago
Comment by KaiserPro 1 day ago
Comment by mmh0000 1 day ago
Now it means whoever has access to uncensored/non-watermarking models can pass off their faked images as real and claim, "Look! There's no watermark, of course, it's not fake!"
Whereas, if none of the image models did watermarking, then people (should) inherently know nothing can be trusted by default.
Comment by pbmonster 21 hours ago
Add an anonymizing scheme (blind signatures or group signatures), done.
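For the curious, a toy sketch of the RSA blind-signature flow (textbook RSA with tiny numbers, nothing like production crypto): the signer attests to the image hash without ever learning which photographer submitted it.

    # Textbook RSA blind signature: the signer (e.g. a camera vendor's
    # attestation service) signs a hash without seeing it.
    n, e, d = 3233, 17, 2753          # toy RSA key (p=61, q=53)
    m = 1234                          # message hash as an integer < n
    r = 7                             # blinding factor, coprime with n

    blinded = (m * pow(r, e, n)) % n         # photographer blinds the hash
    blind_sig = pow(blinded, d, n)           # signer signs blindly
    sig = (blind_sig * pow(r, -1, n)) % n    # photographer unblinds

    assert pow(sig, e, n) == m               # anyone can verify the signature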
Comment by laurent123456 1 day ago
Comment by ewoodrich 1 day ago
Comment by qingcharles 23 hours ago
Comment by PhilippGille 1 day ago
It doesn't mention the new model, but it's likely the same or similar.
Comment by adrian17 1 day ago
Comment by wavemode 1 day ago
Comment by mnorris 1 day ago
$ exiftool chatgpt_image.png
...
Actions Software Agent Name : GPT-4o
Actions Digital Source Type : http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgori...
Name : jumbf manifest
Alg : sha256
Hash : (Binary data 32 bytes, use -b option to extract)
Pad : (Binary data 8 bytes, use -b option to extract)
Claim Generator Info Name : ChatGPT
...
Comment by KaiserPro 1 day ago
I suppose I'm going to have to bite the bullet and actually train an AI detector that works roughly in real time.
Comment by neom 1 day ago
Comment by sroussey 1 day ago
Still fails. Every photo of a man with half gray hair will have the other half black.
Comment by sfmike 1 day ago
Comment by gs17 1 day ago
That's still dangerously bad for the use-case they're proposing. We don't need better looking but completely wrong infographics.
Comment by astrange 1 day ago
I'd especially say like 100% of amateur political infographics/memes are wrong. ("climate change is caused by 100 companies" for instance)
Comment by rcarmo 1 day ago
Comment by surrTurr 1 day ago
Comment by ezero 1 day ago
Comment by almosthere 1 day ago
This tool is keeping my look the same.
Comment by gundmc 1 day ago
Comment by almosthere 1 day ago
Comment by mortenjorck 1 day ago
My own profile picture? Can’t edit some public figures. A famous Norman Rockwell painting from 80 years ago? Can’t edit some public figures.
Safety’d into oblivion.
Comment by andai 1 day ago
Comment by Garlef 22 hours ago
Comment by ge96 1 day ago
Comment by onoesworkacct 1 day ago
Hopefully this leads to greater importance being placed on seeing things with your own wetware.
Comment by ge96 1 day ago
Comment by ge96 15 hours ago
I'm not saying this as a critique of image generation, since you can make these fake images manually, but yeah.
Ultimately I think it's good; it makes people be real.
Comment by celeryd 1 day ago
Comment by sipsi 19 hours ago
Comment by eterm 1 day ago
> In the style of a 1970s book sci-fi novel cover: A spacer walks towards the frame. In the background his spaceship crashed on an icy remote planet. The sky behind is dark and full of stars.
Nano banana pro via gemini did really well, although still way too detailed, and it then made a mess of different decades when I asked it to follow up: https://gemini.google.com/share/1902c11fd755
It's therefore really disappointing that GPT-image 1.5 did this:
https://chatgpt.com/share/6941ed28-ed80-8000-b817-b174daa922...
Completely generic, not at all like a book cover; it completely ignored that part of the prompt while focusing on the other elements.
Did it get the other details right? Sure, maybe even better, but the important part it just ignored completely.
And it's doing even worse when I try to get it to correct the mistake. It's just repeating the same thing with more "weathering".
Comment by bongodongobob 1 day ago
A professional artist wouldn't know what you want.
You didn't even specify an art style. 1970s sci-fi novel cover isn't a style. You'll find vastly different art styles from the 70s. If you're disappointed, it's because you're doing a shitty job describing what's in your head. If your prompt isn't at least a paragraph, you're going to just get random generic results.
Comment by eterm 1 day ago
Look again at Gemini's output: it looks like an actual book cover, an illustration that could be found on a real book.
It takes on board corrections (albeit hilariously literally).
Look at GPT Image's output: it doesn't look anything like a book cover, and when told it got it wrong, it just doubles down on what it was doing.
Comment by bongodongobob 23 hours ago
Comment by eterm 15 hours ago
Comment by bongodongobob 13 hours ago
Comment by eterm 12 hours ago
GPT Image bombed notably worse than the others: not the original picture itself, but the complete lack of recognition of my feedback that it hadn't got it right. It just doubled down on the image it had generated.
Comment by raw_anon_1111 1 day ago
Comment by fellowniusmonk 11 hours ago
Comment by GaryBluto 1 day ago
Comment by dzonga 1 day ago
Impressive stuff though, as you can give it a base image + prompt.
Comment by drawnwren 1 day ago
We have the capability; we just stopped making power more abundant.
Comment by iknowstuff 1 day ago
Comment by astrange 1 day ago
Comment by v9v 1 day ago
Comment by mohsen1 1 day ago
Comment by hexage1814 1 day ago
Comment by r053bud 1 day ago
Comment by BoorishBears 1 day ago
I'm honestly surprised they're still on this post-Sora 2: let the consumer of the API determine their risk appetite. If a copyright holder comes knocking, "the API did it" isn't going to be a defense either way.
Comment by pdevr 1 day ago
Where is the image given along with the prompt? Unless I missed it, it would have been nice to show the attached image.
Comment by taytus 1 day ago
Comment by 0dayman 1 day ago
Comment by nightshift1 1 day ago
Comment by BrokenCogs 1 day ago
Comment by bdangubic 1 day ago
Comment by BrokenCogs 1 day ago
Comment by StarterPro 1 day ago
I really hope everyone is starting to get disillusioned with OpenAI. They're just charging you more and more for what? Shitty images that are easy to sniff out?
In that case, I have a startup for you to invest in. It's a bridge-selling app.
Comment by czhu12 1 day ago
Comment by wahnfrieden 1 day ago
Comment by cheema33 1 day ago
Comment by wahnfrieden 1 day ago
Where switching will be easier is with casual chat users, plus API consumers that are already using substandard models for cost efficiency. But there will also always be a market for state-of-the-art quality.
Comment by wahnfrieden 13 hours ago
As Gemini has gained competitiveness (higher confidence in its output, better reputation), its prices have steadily risen.
Comment by enigma101 1 day ago
Comment by randall 1 day ago
Comment by catigula 1 day ago
Comment by Jonovono 1 day ago
Comment by BoorishBears 1 day ago
(Realistically, Seedream 4 is the best at aesthetically pleasing generation, Nano Banana Pro is the best at realism and editing, and Seedream 4.5 is a very strong middleground between the two with great pricing)
gpt-image-1.5 feels like OpenAI doing the bare minimum to keep people from switching to Gemini every time they want an image.
Comment by ChrisArchitect 1 day ago
Comment by dang 1 day ago
Comment by ares623 1 day ago
Comment by famahar 1 day ago
Comment by Forgeties79 1 day ago
Comment by ares623 1 day ago
Comment by Forgeties79 18 hours ago
Comment by gostsamo 1 day ago
Comment by thumbsup-_- 1 day ago
Comment by augustk 18 hours ago
Comment by brador 1 day ago
Not even one. And no one on the team said anything?
Come on Sam, do better.
Comment by hamonrye 1 day ago
Comment by kitsune1 1 day ago
Comment by gpt-image 1 day ago
Comment by youknow123 1 day ago
Comment by animanoir 18 hours ago
Comment by nycdatasci 1 day ago
Comment by rvz 1 day ago
Comment by moralestapia 1 day ago
Comment by koakuma-chan 1 day ago
Comment by adammarples 1 day ago
Two women walking in single file
Although it tried very hard and had them staggered slightly
Comment by weird-eye-issue 23 hours ago
Comment by jdthedisciple 1 day ago
Aren't we plagued enough by all the fake bullshit out there?
Ffs!
/rant
Sorry gotta be honest and blunt every one of those times...