Google has released Gemma 4 this time (III)
While browsing the forum this time, what struck me most wasn't which company topped yet another leaderboard, but a very basic statement: "Without enough VRAM, no matter how large the model is, it's useless."
I used to understand "slow model" purely as a compute problem. But the more I read, the clearer it became that often the problem isn't that the GPU can't do the math; it's that the data isn't resident in the right place. Change the memory path, and token speed doesn't just slow down: it falls off a cliff.
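The "memory path" point can be made concrete with a back-of-the-envelope estimate: during autoregressive decoding, generating each token streams (roughly) the full set of weights through memory once, so memory bandwidth, not FLOPs, sets the speed ceiling. The sketch below illustrates this; the bandwidth and model-size numbers are assumptions for illustration, not measurements of any particular GPU or model.

```python
# Back-of-the-envelope decode-speed ceiling:
# each generated token must read all model weights once, so
#   tokens/s <= memory_bandwidth / model_size_in_bytes.
# All concrete numbers below are illustrative assumptions.

def decode_ceiling_tokens_per_s(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode throughput set by memory bandwidth alone."""
    return bandwidth_bytes_per_s / model_bytes

GB = 1e9
model_bytes = 7 * GB  # e.g. a ~7B-parameter model at 8-bit weights (assumed)

# Hypothetical bandwidths: fast GPU VRAM vs. ordinary CPU DRAM.
vram_ceiling = decode_ceiling_tokens_per_s(model_bytes, 900 * GB)  # ~900 GB/s VRAM
dram_ceiling = decode_ceiling_tokens_per_s(model_bytes, 60 * GB)   # ~60 GB/s DRAM

print(f"VRAM-resident ceiling: ~{vram_ceiling:.0f} tok/s")
print(f"DRAM-resident ceiling: ~{dram_ceiling:.1f} tok/s")
```

With these assumed numbers, the same model goes from a ceiling of roughly 130 tok/s when the weights sit in VRAM to under 10 tok/s when they spill to system RAM, which is exactly the "drops drastically" effect: the GPU's compute is untouched, only the memory path changed.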