<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Image Generation on Uncle Xiang&#39;s Notebook</title>
        <link>https://ttf248.life/en/tags/image-generation/</link>
        <description>Recent content in Image Generation on Uncle Xiang&#39;s Notebook</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Sat, 25 Apr 2026 07:39:34 +0800</lastBuildDate><atom:link href="https://ttf248.life/en/tags/image-generation/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>ChatGPT Images 2.0 Is Very Powerful, but Can We Still Trust a Screenshot?</title>
        <link>https://ttf248.life/en/p/chatgpt-images-2-screenshot-trust/</link>
        <pubDate>Sat, 25 Apr 2026 00:38:42 +0800</pubDate>
        
        <guid>https://ttf248.life/en/p/chatgpt-images-2-screenshot-trust/</guid>
        <description>&lt;p&gt;Initially, I didn&amp;rsquo;t actually plan on testing it. When I came across the news that OpenAI was releasing &lt;code&gt;ChatGPT Images 2.0&lt;/code&gt; on April 21, 2026, my first reaction was just &amp;ldquo;another image version update.&amp;rdquo; However, when I checked the Artificial Analysis leaderboard and saw that &lt;code&gt;GPT Image 2 (high)&lt;/code&gt; ranked first for text-to-image generation with an Elo of 1332, I felt a bit compelled to test it anyway.&lt;/p&gt;
&lt;p&gt;The results are quite impressive; the Chinese output is excellent, it can handle comics, and character/narrative consistency across multiple continuous images has also improved. However, as I tested it further, I felt that what is truly worth discussing this time isn&amp;rsquo;t &amp;ldquo;it draws better,&amp;rdquo; but rather &amp;ldquo;it starts making things that were previously taken as default truths seem unreliable.&amp;rdquo; This subject matter is more complicated than a simple leaderboard ranking.&lt;/p&gt;
&lt;h2 id=&#34;its-key-strengths&#34;&gt;Its Key Strengths
&lt;/h2&gt;
&lt;p&gt;OpenAI&amp;rsquo;s official name this time is &lt;code&gt;ChatGPT Images 2.0&lt;/code&gt;. The one corresponding in the leaderboard/list is &lt;code&gt;GPT Image 2 (high)&lt;/code&gt;. Don&amp;rsquo;t get them mixed up. The official introduction states very clearly that this generation isn&amp;rsquo;t just about improved image quality; it also incorporates a &lt;code&gt;thinking mode&lt;/code&gt;, allowing it to connect to live web search, generate multiple images from a single prompt, and even refine a very rough prompt into a final image that looks thoroughly researched and well-thought-out.&lt;/p&gt;
&lt;p&gt;This change is extremely noticeable when applied to Chinese scenarios. Previously, generated images with Chinese weren&amp;rsquo;t always unusable, but often required extensive and repeated retouching. The characters would be misplaced, the intended meaning would drift, and the style would be inconsistent throughout—comic panel layouts were particularly troublesome. After testing it, I found that the compliance of Chinese prompts has improved significantly. When creating comic pages, character concept sheets, or continuous narrative content, we are no longer dealing with &amp;ldquo;occasional successes,&amp;rdquo; but rather reaching a genuinely usable range.&lt;/p&gt;
&lt;p&gt;The official examples are also very straightforward, showcasing everything from Japanese comic pages and multilingual layouts to continuous multi-page narratives and dense textual infographics. To be honest, this isn&amp;rsquo;t a version that offers &amp;ldquo;small incremental improvements&amp;rdquo;; it&amp;rsquo;s a version where the workflow itself transforms. Many things that previously required ten rounds of revision can now achieve 70-80% completion in just one round.&lt;/p&gt;
&lt;h2 id=&#34;the-trouble-isnt-that-it-cant-draw-its-that-it-draws-too-well&#34;&gt;The trouble isn&amp;rsquo;t that it can&amp;rsquo;t draw; it&amp;rsquo;s that it draws too well
&lt;/h2&gt;&lt;p&gt;But the limitations have not disappeared.&lt;/p&gt;
&lt;p&gt;Through my own experiments, one very noticeable feeling is that the platform still has boundaries. If you try to prompt for specific styles, IPs, or sensitive scenarios involving real people, the system won&amp;rsquo;t just provide everything readily. OpenAI&amp;rsquo;s system card actually explains the reasons clearly: this generation, because of its stronger sense of realism, would more easily produce highly convincing deepfake content without additional safeguards, especially concerning real people, real locations, or real events.&lt;/p&gt;
&lt;p&gt;So the most contradictory part of this generation is right here.&lt;/p&gt;
&lt;p&gt;On one hand, it is significantly better than before&amp;mdash;so much so that you can&amp;rsquo;t help but keep testing it, nitpicking the details, and retrying things that used to be impossible.&lt;/p&gt;
&lt;p&gt;On the other hand, it also needs to be stricter. So strict that for certain requirements, if you are still thinking of &amp;ldquo;bypassing&amp;rdquo; them through prompts, fundamentally, you are no longer using a tool for creation; you are wrestling with the content governance system itself.&lt;/p&gt;
&lt;p&gt;I won&amp;rsquo;t elaborate on the copyright legal details for this part; it&amp;rsquo;s too easy to get sidetracked. My judgment is simple: the boundary still exists, and in the future, it will only become more like boundaries in real society—it won&amp;rsquo;t automatically vanish just because the model gets stronger.&lt;/p&gt;
&lt;h2 id=&#34;will-screenshots-be-trustworthy-in-the-future&#34;&gt;Will Screenshots Be Trustworthy in the Future?
&lt;/h2&gt;&lt;p&gt;This is the part that genuinely unsettled me later on.&lt;/p&gt;
&lt;p&gt;Previously, people were more concerned that AI images had too strong an &amp;ldquo;AI feel&amp;rdquo;—that they looked fake at first glance and could only fool those who weren&amp;rsquo;t paying attention. Now, the issue is different: many images are no longer questioned based on whether they look like AI drawings, but rather whether they look like screenshots you actually saw in a group chat yesterday.&lt;/p&gt;
&lt;p&gt;Receipts, chat logs, transfer pages, product dashboards, order interfaces&amp;mdash;these are not artistic creations; their visual structure is rigid and their information density is low, which ironically makes them ideal targets for fabrication by an image model. As long as the text is stable, the interface is consistent, and the local details look real, the bar for producing a screenshot that &amp;ldquo;looks authentic enough&amp;rdquo; has dropped dramatically.&lt;/p&gt;
&lt;p&gt;OpenAI mentioned two things in the system card: continuing to include &lt;code&gt;C2PA&lt;/code&gt; source metadata, and adding an invisible watermark. This direction is certainly correct; neglecting it would be insufficient. However, the problem remains very real: the images that ordinary people actually encounter are often not the original ones.&lt;/p&gt;
&lt;p&gt;An image gets forwarded through WeChat, Weibo, Xiaohongshu, and Moments, shared in group chats, cropped, compressed, and re-saved. After one such cycle, whether the metadata survives, who would bother to check it, and whether the platform even exposes it are all open questions.&lt;/p&gt;
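&lt;p&gt;To make that fragility concrete, here is a stdlib-only Python sketch. It is not real &lt;code&gt;C2PA&lt;/code&gt; (which embeds signed manifests, typically in JUMBF boxes); a PNG &lt;code&gt;tEXt&lt;/code&gt; chunk simply stands in for provenance metadata, and a platform-style re-encode that keeps only the critical chunks silently drops it.&lt;/p&gt;

```python
# Sketch: provenance metadata rarely survives a re-encode cycle.
# A PNG tEXt chunk stands in for real C2PA metadata (illustrative only).
import struct
import zlib

def chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def make_png() -> bytes:
    """Build a minimal 1x1 grayscale PNG carrying a tEXt metadata chunk."""
    sig = b"\x89PNG\r\n\x1a\n"
    ihdr = chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    text = chunk(b"tEXt", b"Source\x00camera-original")  # stand-in provenance
    idat = chunk(b"IDAT", zlib.compress(b"\x00\x00"))    # filter byte + pixel
    return sig + ihdr + text + idat + chunk(b"IEND", b"")

def chunk_types(png: bytes) -> list:
    """List the chunk types present in a PNG byte string."""
    out, i = [], 8
    while i < len(png):
        (length,) = struct.unpack(">I", png[i:i + 4])
        out.append(png[i + 4:i + 8].decode())
        i += 12 + length
    return out

def reencode(png: bytes) -> bytes:
    """Platform-style re-save: keep only critical chunks, drop the rest."""
    out, i = [png[:8]], 8
    while i < len(png):
        (length,) = struct.unpack(">I", png[i:i + 4])
        ctype, end = png[i + 4:i + 8], i + 12 + length
        if ctype in (b"IHDR", b"IDAT", b"IEND"):
            out.append(png[i:end])
        i = end
    return b"".join(out)

original = make_png()
print(chunk_types(original))            # ['IHDR', 'tEXt', 'IDAT', 'IEND']
print(chunk_types(reencode(original)))  # ['IHDR', 'IDAT', 'IEND'] - metadata gone
```

&lt;p&gt;The image still looks identical after the round trip; only the sidecar provenance is gone, which is exactly why the verification burden cannot rest on metadata alone.&lt;/p&gt;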
&lt;p&gt;In other words, while the technical sphere has begun to supplement &amp;ldquo;source proof,&amp;rdquo; the social sphere remains stuck in the outdated habit of &amp;ldquo;seeing is believing.&amp;rdquo; This gap is what I find most troublesome.&lt;/p&gt;
&lt;p&gt;Going forward, a single screenshot will likely only count as a clue, not as evidence in itself. To be truly rigorous, we must check the original files, export records, timelines, and context, and whether any of it can be cross-verified. The probative value of a single image is being steadily undermined by model capabilities.&lt;/p&gt;
&lt;h2 id=&#34;what-we-come-to-believe-in-the-future-may-not-be-pictures-but-rather-chains-of-evidence&#34;&gt;What we come to believe in the future may not be pictures, but rather chains of evidence
&lt;/h2&gt;&lt;p&gt;So, I feel a little conflicted about these types of models right now.&lt;/p&gt;
&lt;p&gt;It is genuinely powerful, and it&amp;rsquo;s not empty hype. Looking at the rankings as of April 25, 2026, &lt;code&gt;GPT Image 2 (high)&lt;/code&gt; is already in first place. The Chinese language support is good; it can handle comics/manga. Multi-image continuity and text control have all improved—these enhancements are real. For creators, operators, designers, and content makers, this is pure productivity.&lt;/p&gt;
&lt;p&gt;But the other side of the same coin is also true.&lt;/p&gt;
&lt;p&gt;As &amp;ldquo;looks very real&amp;rdquo; images become cheap and abundant, society can no longer rely on the old low-cost mechanisms of trust. Previously, we defaulted to assuming that screenshots were likely genuine; going forward, this default assumption must be lowered. This is especially true for items such as receipts, WeChat conversations, payment pages, and order records. I believe that in the future, we must always ask: What is the original source? And can it be independently verified?&lt;/p&gt;
&lt;p&gt;How should I put it? Previously, the problem with AI images was that they weren&amp;rsquo;t realistic enough. Now, the problem is that they are starting to become too real.&lt;/p&gt;
&lt;p&gt;This might be the threshold that &lt;code&gt;ChatGPT Images 2.0&lt;/code&gt; genuinely crossed. While its capability is undoubtedly powerful, what&amp;rsquo;s more problematic is that it has also brought into sharp focus the question of whether one can still trust a screenshot.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://openai.com/index/introducing-chatgpt-images-2-0/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Introducing ChatGPT Images 2.0 | OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://deploymentsafety.openai.com/chatgpt-images-2-0&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ChatGPT Images 2.0 System Card | OpenAI Deployment Safety Hub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://artificialanalysis.ai/image/leaderboard/text-to-image&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Text to Image Leaderboard | Artificial Analysis&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;writing-notes&#34;&gt;Writing Notes
&lt;/h2&gt;&lt;h3 id=&#34;original-prompt&#34;&gt;Original Prompt
&lt;/h3&gt;&lt;blockquote&gt;
&lt;p&gt;ChatGPT released Image 2. I saw related news while scrolling, so at first I didn&amp;rsquo;t plan to test it. But when I checked the leaderboard, I realized how truly far ahead it is. When I tried it out, the results for Chinese were very good. It can create comics and has added reasoning capabilities, allowing it to output multiple continuous images at once. There are still copyright issues; it cannot directly generate anime-style pictures, so techniques must be used to bypass this. At first, I was only impressed by how strong this version&amp;rsquo;s image generation ability is, but later I realized that if it is &lt;em&gt;too&lt;/em&gt; realistic, that doesn&amp;rsquo;t work either—it makes people lose trust in screenshots. Receipts or WeChat screenshots—can we still believe them in the future?&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;summary-of-writing-ideas&#34;&gt;Summary of Writing Ideas
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Focus the opening on a personal trigger point—&amp;ldquo;originally didn&amp;rsquo;t intend to test this, but tried it after seeing the ranking list&amp;rdquo;—rather than starting with a product announcement.&lt;/li&gt;
&lt;li&gt;In the main body, first confirm where this generation&amp;rsquo;s capabilities are strongest. Focus on Chinese language support, multi-image continuity, and&lt;/li&gt;
&lt;/ul&gt;</description>
        </item>
        
    </channel>
</rss>
