
Google has released Gemma 4 this time (Part 2)

If you only look at the leaderboard, 31B is clearly the most eye-catching. But the moment you put it next to the actual machine, in my case a still-un-upgraded RTX 3060 12GB, the judgment changes immediately. How should I put it? For local deployment, what matters in the end isn't which model looks the fanciest, but which one you can live with long-term. For me, the model truly worth running first this time isn't 31B, but 26B A4B.

Google has released Gemma 4 this time (Part 1)

On release day, what I originally wanted to do was simple: find the upgraded counterpart of Gemma 3 and download it to run. But after looking around, I was a bit stunned. The familiar 4B / 12B / 27B naming is gone; in its place are E4B, 26B A4B, and 31B. How should I put it? What Google truly changed this time isn't just the model sizes, but how you are supposed to understand this batch of models in the first place.

Codex defaults to medium, but I later switched to high.

During my time using Codex, one thing always felt a bit awkward: the default thinking level is medium, yet online chatter about GPT-5.4 is full of strong claims. When it comes to actual use, what exactly separates medium, high, and xhigh? The official documentation doesn't offer a particularly straightforward comparison. My current conclusion is clear: for daily coding, I prefer to start directly at high. Medium isn't unusable; it's fine for quick tasks, minor tweaks, or exploring a direction. But for multi-file changes, ambiguous requirements, or anything that requires judging while reading code, medium tends to spend its reasoning budget in the wrong places. I rarely use xhigh; I save it for genuinely hard problems where I get stuck.
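For reference, the default level can be pinned in the Codex CLI's config file instead of being chosen per session. This is a sketch based on recent versions of the CLI; the exact key name may differ in your build, so check your local docs:

```toml
# ~/.codex/config.toml
# Start every Codex session at "high" instead of the default "medium".
model_reasoning_effort = "high"
```

With this set, medium or xhigh can still be selected for an individual session when a quick tweak or a genuinely hard problem calls for it.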

Turning "AI writes my blog" into engineering (Part 3)

After going through all the configurations in the repository, I am even more certain about one thing: what matters in the end is not how strong any single model is, but rather who should bear the cost at each layer.

The most obvious signal: the currently active published.runtime.json is still the one generated on April 2, 2026 for minimax-m2, yet the entry from April 3, 2026 at 16:38, labeled 5f17088, has already switched the default provider for blog-style-suite to the local gemma-4-26b-a4b in LM Studio. That might look inconsistent, but it isn't; it is precisely a sign that this pipeline has begun to specialize.
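As a rough illustration only (the actual schema of published.runtime.json is not shown in the post, so every field name below is an assumption), the provider switch described above might look something like this:

```json
{
  "generated_at": "2026-04-03T16:38:00+08:00",
  "commit": "5f17088",
  "suites": {
    "blog-style-suite": {
      "default_provider": "lm-studio",
      "default_model": "gemma-4-26b-a4b"
    }
  }
}
```

The point of a layered file like this is that the active runtime snapshot and the per-suite defaults can diverge on purpose: each suite bears its own cost.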

Turning "AI writes my blog" into engineering (Part 2)

If tokens are plentiful, the lowest-effort method is actually quite crude: just feed the model your historical articles and let it figure things out on its own. The problem is that this only suits occasional writing, not continuous work. If you treat blogging as a long-term workflow, relying on raw historical articles alone quickly goes from "simple and direct" to "expensive and messy."

Turning "AI writes my blog" into engineering (Part 1)

I wrote quite a few AI articles last year. The most basic workflow back then was: first organize an outline or a list of questions myself, have the large model produce the body text, then copy the content into a local .md file, add frontmatter, tags, categories, and a title, and finally publish. The process isn't unusable, but it's tedious. The part that really wastes time isn't the body text; it's the repetitive labor around it. After using Codex heavily in recent weeks, that awkwardness has only grown. It can read repositories, modify files, pull in supporting material, and even write articles directly into the directory. If I still have to copy and paste by hand, it feels like I'm hobbling the tool.

In the era of AI, just getting people into an app is no longer enough.

Watching Chinese AI companies spend money this Lunar New Year, my first reaction wasn't excitement but familiarity. Tencent Yuanbao handed out 1 billion in cash red envelopes on February 1st; Baidu's Wenxin distributed red envelopes totaling 500 million from January 26th through mid-March; Alibaba's Qwen launched a 3 billion "treat plan" on February 6th; and Doubao leaned on the Spring Festival Gala's AI interactions to push its presence. My judgment is straightforward: this is inertia left over from the previous internet era. Step one, pull people into the app; step two, build usage frequency; everything else can wait. But AI is not quite a traffic-driven business.

Skill is not a new kind of prompt; it is a job manual for the agent.

Over the past few days of reading about AI coding, the discussion jumped from MCP straight to Skill. Many people seeing the term for the first time instinctively treat it as yet another new protocol, or yet another advanced prompt.

My judgment is straightforward: Skill isn't here to replace MCP; it's more like a job manual handed to the agent. MCP solves "letting the agent connect to the outside world," while Skill solves "how to reliably get the job done once connected." The two aren't substitutes; one picks up where the other leaves off.

Simply put, MCP gives the agent hands and feet, and Skill tells the agent not to mess around.
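To make the "job manual" idea concrete: in Anthropic's Agent Skills convention, a skill is essentially a folder containing a SKILL.md file, i.e. plain instructions with a small metadata header, not a new protocol. A minimal sketch, with the skill name and steps below entirely made up for illustration:

```markdown
---
name: blog-publisher
description: How to turn a finished draft into a published post in this repo.
---

# Blog publishing manual

1. Save the draft under `content/posts/` as a Markdown file.
2. Add frontmatter: title, date, tags, categories.
3. Run the site build and check the output before committing.
4. Do not touch files outside `content/` for this task.
```

The agent reaches the repository through MCP (hands and feet); the SKILL.md tells it, in plain language, what "done properly" means and what it must not do.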