ChatGPT Images 2.0

Posted by meetpateltech 3 hours ago

Comments

Comment by ea016 2 hours ago

Price comparison:

GPT Image 2

  Low     : 1024×1024 $0.006 | 1024×1536 $0.005 | 1536×1024 $0.005

  Medium  : 1024×1024 $0.053 | 1024×1536 $0.041 | 1536×1024 $0.041

  High    : 1024×1024 $0.211 | 1024×1536 $0.165 | 1536×1024 $0.165

GPT Image 1

  Low     : 1024×1024 $0.011 | 1024×1536 $0.016 | 1536×1024 $0.016

  Medium  : 1024×1024 $0.042 | 1024×1536 $0.063 | 1536×1024 $0.063

  High    : 1024×1024 $0.167 | 1024×1536 $0.25  | 1536×1024 $0.25
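
Further down the thread, commenters find it odd that a square image can cost more than a larger portrait or landscape one. Normalising by pixel count makes that easier to see. The sketch below is a minimal Python example that uses only the figures quoted in this comment, which are not independently checked. The dictionary layout and helper names are illustrative and are not part of any OpenAI API.

```python
# Per-image API prices in USD, as quoted in the comment above (not
# independently verified). Keys mirror the comment's own labels.
PRICES = {
    "GPT Image 2": {
        "low":    {"1024x1024": 0.006, "1024x1536": 0.005, "1536x1024": 0.005},
        "medium": {"1024x1024": 0.053, "1024x1536": 0.041, "1536x1024": 0.041},
        "high":   {"1024x1024": 0.211, "1024x1536": 0.165, "1536x1024": 0.165},
    },
    "GPT Image 1": {
        "low":    {"1024x1024": 0.011, "1024x1536": 0.016, "1536x1024": 0.016},
        "medium": {"1024x1024": 0.042, "1024x1536": 0.063, "1536x1024": 0.063},
        "high":   {"1024x1024": 0.167, "1024x1536": 0.25,  "1536x1024": 0.25},
    },
}


def megapixels(size: str) -> float:
    """Convert a 'WIDTHxHEIGHT' string to millions of pixels."""
    width, height = (int(n) for n in size.split("x"))
    return width * height / 1_000_000


def price_per_megapixel(model: str, quality: str, size: str) -> float:
    """Normalise the per-image price by output area so sizes are comparable."""
    return PRICES[model][quality][size] / megapixels(size)


def pct_change(quality: str, size: str) -> float:
    """Percentage change in per-image price from GPT Image 1 to GPT Image 2."""
    old = PRICES["GPT Image 1"][quality][size]
    new = PRICES["GPT Image 2"][quality][size]
    return (new - old) / old * 100


if __name__ == "__main__":
    for quality in ("low", "medium", "high"):
        for size in ("1024x1024", "1024x1536"):
            print(
                f"{quality:<6} {size}: "
                f"v2 ${price_per_megapixel('GPT Image 2', quality, size):.3f}/MP, "
                f"v1 ${price_per_megapixel('GPT Image 1', quality, size):.3f}/MP, "
                f"change {pct_change(quality, size):+.0f}%"
            )
```

With these numbers, a high-quality GPT Image 2 square works out to about $0.20 per megapixel. A 1024x1536 image costs about $0.10 per megapixel. GPT Image 1's two sizes both sit near $0.16 per megapixel.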

Comment by Melatonic 1 hour ago

Weird that they restrict the resolution so much. Does it fall apart with more detail (when zoomed in) or does the cost just skyrocket?

Comment by vunderba 1 hour ago

It's usually based on what they've been trained on. There aren't many models that handle higher resolutions apart from Seedream, and its prompt adherence is worse.

Comment by kibibu 42 minutes ago

Genuine question: what positive use cases are sufficient to accept the harm from image generators?

One that I can think of:

- replacing photography of people who may be unable to consent or for whom it may be traumatic to revisit photographs and suitable models may not be available, e.g. dementia patients, babies, examples of medical conditions.

Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society."

On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.

Comment by throwaway2027 2 hours ago

I know people like to dunk on ChatGPT and Gemini and say Claude is (or used to be) better, but with those subscriptions you can still fall back to weaker models when you're out of usage, AND use Nano Banana and ChatGPT image generation, which have separate limits. I think that could make them a more complete package for some people (non-programmers). I like having the option and am curious what improvements they've made to ChatGPT image generation. In the past it had that yellow "piss filter"; 1.5 sort of fixed it but made things really generic, and Nano Banana beat it (although Gemini also had overly aggressive racial-bias tuning, which they fixed). It seems the images ChatGPT generates have gotten better.

Comment by joegibbs 54 minutes ago

The quality of the text is really impressive and I can’t seem to see any artefacts at all. The fake desktop is particularly good: Nano Banana would definitely slip up with at least a few bits of the background.

Comment by louiereederson 2 hours ago

The image of the messy desktop with the ASCII art is so impressive: the text renders, the date is consistent, it actually generated ASCII art in "ChatGPT", etc. I suspected it was cherry-picked, but I was able to generate something very similar and then edit particular parts of the desktop (e.g. fixing content in the browser window and making the ASCII dog "more dog-like"). It's honestly astounding, to me at least.

Comment by 6thbit 2 hours ago

System card link with safety details https://deploymentsafety.openai.com/chatgpt-images-2-0

direct pdf https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...

Comment by samiwami 2 hours ago

Do they have anything similar to SynthID, or are they just pretending that problem doesn't exist?

I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.

Comment by alextheparrot 2 hours ago

> Integrating an imperceptible, robust, and content-specific watermark

From the system card someone linked elsewhere in the discussion

Comment by Legend2440 2 hours ago

I think we are just going to have to accept that realistic images can be easily fabricated now.

Seeing is not believing anymore, and I don't think SynthID or anything like it can restore that trust in images.

Comment by vunderba 1 hour ago

OpenAI’s gpt-image-1.5 and Google’s NB2 have been pretty much neck and neck on my comparison site, which focuses heavily on prompt adherence. Both hover around a 70% success rate on the prompts for generative and editing capabilities. The caveat is that Gemini has always had the edge in visual fidelity.

That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”

I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.

Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts, especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.

For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:

https://genai-showdown.specr.net/image-editing?models=nbp3,s...

And here’s the same comparison for generative performance:

https://genai-showdown.specr.net/?models=s4,nbp3,g15

UPDATES:

gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.

Results are in for the generative (text-to-image) capabilities: gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:

- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.

- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.

- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.
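
Expressed as percentages, the counts quoted in this comment work out as follows. This is plain arithmetic on those counts. A one-prompt lead on a 15-prompt suite is worth about 6.7 percentage points.

```latex
\frac{11}{15} \approx 73.3\%, \qquad \frac{12}{15} = 80\%, \qquad \frac{1}{15} \approx 6.7\%
```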

All Models:

https://genai-showdown.specr.net

Just Gpt-Image-1.5, Gpt-Image-2, Nano-Banana 2, and Seedream 4.0

https://genai-showdown.specr.net?models=s4,nbp3,g15,g2

Comment by Melatonic 1 hour ago

We were afraid it would be Skynet and instead we got the ultimate meme generator!

Comment by throw310822 1 hour ago

Ok, I can hear the sound of entire industries crumbling right now.

Comment by thevinter 2 hours ago

Every time a new image gen comes out, I keep saying that it won't get better, only to be surprised again and again. Some of the examples are incredible (and incredibly scary; I feel like this is truly the point where telling whether something is AI becomes impossible).

Comment by lehmacdj 2 hours ago

So do you think there will be a better image model in a year?

Comment by throw310822 1 hour ago

I'll bite: no, I don't think so. If the examples are not cherry-picked and by "image model" we mean just the ability to generate pictures, this looks like parity with human excellence; there isn't much room for further improvement. The images don't just look real, they look tasteful: the model is not just generating a credible image, it's generating one that shows the talent of a good photographer/designer/artist.

Comment by Vachyas 1 hour ago

I'm honestly unsure what could be improved at this point.

Consistency? So it fails less often?

Based on the released images (especially the "screenshot" of the Mac desktop), I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (e.g. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96, so this image is probably fake").

Comment by RobinL 1 hour ago

I've been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between them, to help teach fractions to a child.

It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
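
The geometry behind the request is simple. Ten equal slices means each wedge spans 360°/10 = 36°. "Space in between" can be created by pushing each wedge outward along its bisector. The sketch below is a minimal Python example that draws this target image deterministically as an SVG. It illustrates what a correct output looks like and is not a fix for any model. The function name, colours, label placement, and gap size are arbitrary choices.

```python
import math


def pizza_svg(slices: int = 10, radius: float = 100.0, gap: float = 8.0) -> str:
    """Return an SVG of a pizza cut into `slices` equal, numbered wedges,
    each pushed outward along its bisector by `gap` so they are separated."""
    if slices < 2:
        raise ValueError("need at least two slices")
    step = 2 * math.pi / slices          # 36 degrees when slices == 10
    size = 2 * (radius + gap) + 20       # canvas large enough for the exploded pie
    c = size / 2                         # canvas centre
    large = 1 if step > math.pi else 0   # SVG large-arc flag
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{size:.0f}" height="{size:.0f}">']
    for i in range(slices):
        a0, a1 = i * step, (i + 1) * step
        mid = (a0 + a1) / 2
        # Move the wedge's apex away from the centre along its bisector.
        cx, cy = c + gap * math.cos(mid), c + gap * math.sin(mid)
        x0, y0 = cx + radius * math.cos(a0), cy + radius * math.sin(a0)
        x1, y1 = cx + radius * math.cos(a1), cy + radius * math.sin(a1)
        # Straight edge out, arc (sweep flag 1 = increasing angle), close.
        parts.append(
            f'<path d="M {cx:.2f} {cy:.2f} L {x0:.2f} {y0:.2f} '
            f'A {radius} {radius} 0 {large} 1 {x1:.2f} {y1:.2f} Z" '
            f'fill="#f4c542" stroke="#b5651d" stroke-width="2"/>'
        )
        # Number each slice, placed 60% of the way out along its bisector.
        tx, ty = cx + 0.6 * radius * math.cos(mid), cy + 0.6 * radius * math.sin(mid)
        parts.append(f'<text x="{tx:.2f}" y="{ty:.2f}" text-anchor="middle" '
                     f'dominant-baseline="middle">{i + 1}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

For example, `open("pizza.svg", "w").write(pizza_svg())` writes a file that any browser can display. Changing `slices` produces other fractions.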

Comment by thevinter 1 hour ago

There is definitely room for improvement: https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...

Especially when it comes to detailed outputs or non-standard prompts.

I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.

Comment by vunderba 1 hour ago

Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.

I experimented with procedural generation of Waldo-style scavenger images using Flux models, with rather disappointing results (unsurprisingly).

Comment by throw310822 1 hour ago

I wonder if at this point you could just ask the agent to iteratively refine the image in smaller portions.

Comment by jinushaun 48 minutes ago

Cost? Speed?

Comment by minimaxir 3 hours ago

Model card for the API endpoint gpt-image-2 (which may or may not reflect the output from ChatGPT Images 2): https://developers.openai.com/api/docs/models/gpt-image-2

API pricing is mostly unchanged from gpt-image-1.5; the output price is slightly lower: https://developers.openai.com/api/docs/pricing

...buuuuuuuuut the price per image has changed. For a high quality image generation the 1024x1024 price has increased? That doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, so assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...

The submitted page is annoyingly uninformative, but judging from the livestream it purports to have the exact same features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.

Comment by strongpigeon 1 hour ago

> That doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, [...]

I think you meant more expensive, right? It would make sense for it to be cheaper, as there are fewer pixels.

Comment by ieie3366 1 hour ago

It's great. Also doesn't seem to have any "slop" standard look, the images it produces are quite diverse.

I would imagine this will hit illustrators / graphics designers / similar people very hard, now that anyone can just generate professional looking graphical content for pennies on the dollar.

Comment by retrac98 1 hour ago

The page keeps crashing on my iPhone 17 Pro.

Comment by Bennettheyn 1 hour ago

fal has the endpoint under openai/gpt-image-2

Comment by ChrisArchitect 1 hour ago

Fake layouts, fake handwritten kid story, fake drunk photos? All from training on real things people did.

As with anything AI, we are not ready for the scale of impact. And for what? Like, why are you proud of this?