Tags

4 pages

Gemma

Google has released Gemma 4 this time (Part 3)

While browsing the forum this time, what struck me most wasn't which company topped yet another leaderboard, but a very plain statement: "Not enough VRAM, and no matter how large the parameters are, it's useless."

Previously, I always understood "slow model" as a compute problem. But the more I read, the clearer it became that often the GPU can compute it just fine; the real problem is that the data cannot live in the right place. Once the weights spill out of VRAM and take a slower memory path, token speed doesn't merely dip; it falls off a cliff.
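To make "falls off a cliff" concrete, here is a minimal back-of-envelope sketch. It assumes decode speed is bounded by how fast the weights can be streamed to the compute units, and the model size and bandwidth figures are illustrative assumptions, not measurements:

```python
# Rough upper bound on decode speed: if every generated token must read all
# the weights, tokens/sec cannot exceed bandwidth / model size.
# All numbers below are illustrative assumptions, not benchmarks.

def rough_tokens_per_sec(model_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Bandwidth-bound ceiling on decode tokens/sec."""
    return bandwidth_bytes_per_sec / model_bytes

GIB = 1024 ** 3
model = 13 * GIB     # assumed: a mid-size model quantized to ~13 GiB

vram_bw = 360e9      # on-card GDDR6, roughly 360 GB/s (RTX 3060 class)
pcie_bw = 32e9       # PCIe 4.0 x16, roughly 32 GB/s, if weights sit in system RAM

print(f"weights in VRAM:    ~{rough_tokens_per_sec(model, vram_bw):.1f} tok/s ceiling")
print(f"weights spilled:    ~{rough_tokens_per_sec(model, pcie_bw):.1f} tok/s ceiling")
```

Same GPU, same model; only the memory path changes, and the ceiling drops by roughly an order of magnitude.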

Google has released Gemma 4 this time (Part 2)

If you only look at the leaderboard, 31B is definitely the most eye-catching. But sit back down at the actual machine, that never-upgraded RTX 3060 12GB, and the judgment changes immediately. How should I put it? For local deployment, what matters in the end isn't which model looks the fanciest, but which one you can live with long-term. For me, the model truly worth running first this time isn't 31B, but 26B A4B.
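This judgment comes down to simple fit arithmetic. Here is a minimal sketch for a 12 GiB card; the 4-bit quantization and my reading of "A4B" as roughly 4B active parameters per token are both assumptions, not anything confirmed by the release:

```python
# Back-of-envelope fit check for a 12 GiB card. The quantization level and
# the "A4B = ~4B active parameters per token" reading are assumptions.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB at a given quantization."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

# 31B dense at 4-bit: well over the 12 GiB budget, before any KV cache.
print(f"31B @ 4-bit: ~{weight_gib(31, 4):.1f} GiB")

# 26B A4B at 4-bit: the full weights still don't fit on-card, but if only
# ~4B parameters are touched per token, partial CPU offload hurts far less.
print(f"26B @ 4-bit: ~{weight_gib(26, 4):.1f} GiB total, "
      f"~{weight_gib(4, 4):.1f} GiB active per token")
```

The dense 31B has to stream its full footprint for every token, while the sparse model's per-token working set is a fraction of that, which is exactly why it is the livable choice on a 3060.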

Google has released Gemma 4 this time (Part 1)

On the day of the initial release, what I originally wanted to do was simple: find an upgraded version corresponding to Gemma 3 and download it to run. However, after looking around, I was a bit stunned. The familiar naming convention of 4B / 12B / 27B is gone; instead, we have E4B, 26B A4B, and 31B. How should I put it? This time, what Google truly changed wasn’t just the model sizes, but even “how you should understand this batch of models.”

Don't force weak models onto hard tasks.

Recently, I've been migrating some peripheral tasks to MiniMax and local models. The more I use them, the more I feel we shouldn't measure everything against "the most powerful model."

My judgment is straightforward: don't force weak models into hard tasks. Models like MiniMax are indeed limited in capability; for complex coding, long-chain reasoning, or decomposing ambiguous requirements, they fall short. But hand them data cleaning, document writing, or digging up proposal material, and they handle it perfectly well. The same logic applies to local models around the 12B size: translation, format rewriting, and batch cleaning are exactly where they fit best.
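The routing rule above can be sketched as a plain lookup table. The task labels and model-tier names here are illustrative, not a real API:

```python
# A hedged sketch of the routing rule: match task difficulty to model tier
# instead of sending everything to the strongest model.
# Task labels and tier names are illustrative assumptions.

ROUTES = {
    "data_cleaning":   "minimax",      # bounded, mechanical -> cheap hosted model
    "doc_writing":     "minimax",
    "material_search": "minimax",
    "translation":     "local-12b",    # local ~12B models handle these well
    "format_rewrite":  "local-12b",
    "batch_cleaning":  "local-12b",
    "complex_coding":  "frontier",     # long-chain reasoning -> strongest tier
    "requirement_decomposition": "frontier",
}

def pick_model(task_type: str) -> str:
    # Unknown task types default to the strongest tier rather than
    # risk sending a hard task to a weak model.
    return ROUTES.get(task_type, "frontier")

print(pick_model("translation"))      # -> local-12b
print(pick_model("complex_coding"))   # -> frontier
```

The design choice is the default: misrouting a hard task to a weak model is costly, so anything unrecognized escalates upward.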

To put it plainly, it’s not that the models lack value; it’s just that we shouldn’t place them in the wrong roles.