
Don't force weak models onto hard tasks.

Recently, I’ve been migrating some peripheral use cases to MiniMax and local models. The more I use them, the more I feel we shouldn’t measure everything against the yardstick of “the most powerful model.”

My judgment is straightforward: don’t force weak models onto hard tasks. Models like MiniMax are genuinely limited in capability; for complex coding, long-chain reasoning, or decomposing ambiguous requirements, they fall short. But ask them to do data cleaning, document writing, or hunting down proposal material, and they handle it perfectly well. The same logic applies to local models around the 12B size: translation, format rewriting, and batch cleaning are exactly where they fit best.

To put it plainly, it’s not that the models lack value; it’s just that we shouldn’t place them in the wrong roles.

After reviewing AI articles from the past two years, I think these are the 8 topics I should write about next.

I recently went back through my blog’s AI-related articles from the past two years and found that the content is no longer just simple impressions like “is this model any good.” It has gradually formed a fairly clear through-line: how AI actually entered my development workflow, and what efficiency gains, costs, and new constraints it brought.

The End of Low-Cost API Gateways: Large Model Experiences and the Impossible Triangle in March

Throughout March I was constantly testing various large-model API hubs. They are indeed cheap: for very little money a month you can try out foreign models like ChatGPT, Claude, and Gemini, which at first glance looks like an extremely cost-effective find. The more I actually used them, though, the more I felt this path has always been constrained by an impossible triangle: quality, stability, and affordability are very hard to achieve all at once. By last weekend the situation had become quite clear. Over the two days from 2026-03-28 to 2026-03-29, I noticed a marked tightening of risk controls on the ChatGPT channels, and Claude was no different. Many low-cost relays that used to work suddenly became unstable or failed outright. For me, this basically marked the end, at least for now, of the low-cost API relay model.

Computing Power Hegemony and Valuation “Bubble”: We are entering a costly new era.

Recently, I’ve been observing discussions within the industry, and it seems there’s been a fundamental shift in the definition of “growth.”

Previously, when we discussed the internet, the story was “four ounces moving a thousand pounds”: write a few lines of code, rent a few cloud servers, and with excellent interaction and operations you could unlock hundreds of millions of users. As of 2026, however, this “asset-light” illusion is being thoroughly shattered by large models.

A long stretch of heavy AI programming

Recently, my project has leaned heavily on AI programming: probably the deepest AI has been integrated into my work in the past three years. These notes are not systematic; I simply recorded things as they came to mind.

Background

Linux environment, backend service development; no UI or frontend work involved.

Models

I’ve tried the domestic “Big Three”: MiniMax, GLM, and Kimi, and Kimi has performed best. Claude handles large requests well by breaking them down, while Codex is the best fit for production environments; it is exceptionally cautious.

Deep Dive: Memory Corruption and Cache Pollution in C++ with Static Lambdas

This article analyzes a bizarre phenomenon in C++ development: unordered_map::find hits, but the returned object has mismatched fields. The root cause is a static lambda defined inside a function that captures local variables by reference. Because the static closure is initialized only once, the captured reference dangles after the first call, triggering undefined behavior (UB) that pollutes the cached data on subsequent calls. The recommended fixes are to pass parameters explicitly instead of capturing implicitly, manage lifetimes deliberately, and lean on sanitizer tooling.
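
To make the failure mode concrete, here is a minimal sketch of this bug class (a hypothetical cache-key helper; names and details are mine, not the article’s actual code). The static lambda’s closure is created exactly once, on the first call, capturing that call’s local by reference; every later call reads through a dangling reference.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

std::string make_key(int id) {
    std::string prefix = "user-" + std::to_string(id); // local, destroyed at return
    // BUG: `static` means the closure is built once, on the FIRST call,
    // and [&] makes it capture that first call's `prefix` by reference.
    // Every subsequent call reads through a dangling reference: UB.
    static auto build = [&]() { return prefix + ":v1"; };
    return build();
}

std::string make_key_fixed(int id) {
    // FIX: pass state explicitly instead of capturing it implicitly.
    // A capture-less lambda is safe to make static.
    static auto build = [](const std::string& p) { return p + ":v1"; };
    return build("user-" + std::to_string(id));
}

int main() {
    std::unordered_map<std::string, int> cache;
    cache[make_key(1)] = 100; // first call happens to work
    cache[make_key(2)] = 200; // dangling reference: the key may be garbage (UB)
    std::cout << cache.size() << " entries\n"; // cache silently polluted
}
```

Running the buggy path under AddressSanitizer (compile with -fsanitize=address, with stack use-after-return detection enabled) catches the stale read, which is exactly the tooling the article recommends.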

wrk vs. JMeter: an in-depth comparison

In stress testing internet systems, we frequently encounter two tools with vastly different styles: wrk, which is extremely lightweight and chases raw throughput, and JMeter, which is feature-rich and simulates real business flows.

Prompt: outline the core ideas and write an explanatory (popular-science) article: HTTP stress-testing tools, wrk vs JMeter, what are the differences? What I know: wrk tends to use a single thread multiplexing many connections for testing, while JMeter defaults to a short-lived-connection mode, though it can be configured to use persistent (keep-alive) connections.
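
For a concrete sense of wrk’s model, here is a typical invocation (the URL and the numbers are placeholders, not from the article): a small fixed pool of threads, each multiplexing many keep-alive connections for a fixed duration.

```sh
wrk -t4 -c200 -d30s --latency http://localhost:8080/api/ping
```

Here -t is the thread count, -c the total number of concurrent connections, and -d the test duration; --latency adds a latency-percentile breakdown to the final throughput report.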

Detailed Explanation of How Passkeys Work and Their Future

Background: Tencent’s CNB platform only supports WeChat login, with no conventional email-account option, and several people in our group were complaining about it daily; it was getting annoying. Tencent’s product manager came up with a compromise: supporting Passkey login.

Every day we repeat a dangerous action: entering passwords. Despite all the complexity rules (uppercase letters, special symbols, numbers), data breaches, phishing attacks, and the frustration of “forgot password” flows continue to trouble everyone.

The tech giants (Apple, Google, Microsoft), together with the FIDO Alliance, have provided the ultimate answer: the passkey. It isn’t just a “replacement” for passwords; it “eliminates” them entirely.

The login process has shifted from verifying passwords to verifying the current device’s trustworthiness.
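
To make that shift concrete, here is a minimal C++ sketch of the underlying idea, challenge-response with an asymmetric keypair, using OpenSSL’s Ed25519 API (my own illustration, not the actual WebAuthn/FIDO2 protocol): at registration the device generates a keypair and gives the server only the public key; at login the server sends a random challenge, the device signs it with a private key that never leaves the device, and the server verifies the signature.

```cpp
// Build with: g++ passkey_sketch.cpp -lcrypto  (OpenSSL 1.1.1+)
// Error handling omitted for brevity.
#include <openssl/evp.h>
#include <cstdio>
#include <vector>

int main() {
    // Registration: the "device" generates an Ed25519 keypair.
    // Only the public half would ever be sent to the server.
    EVP_PKEY* key = nullptr;
    EVP_PKEY_CTX* kctx = EVP_PKEY_CTX_new_id(EVP_PKEY_ED25519, nullptr);
    EVP_PKEY_keygen_init(kctx);
    EVP_PKEY_keygen(kctx, &key);
    EVP_PKEY_CTX_free(kctx);

    // Login, step 1: the server sends a random, single-use challenge.
    const unsigned char challenge[] = "random-single-use-server-challenge";

    // Login, step 2: the device signs the challenge with the private key.
    EVP_MD_CTX* sctx = EVP_MD_CTX_new();
    EVP_DigestSignInit(sctx, nullptr, nullptr, nullptr, key);
    size_t siglen = 0;
    EVP_DigestSign(sctx, nullptr, &siglen, challenge, sizeof challenge);
    std::vector<unsigned char> sig(siglen);
    EVP_DigestSign(sctx, sig.data(), &siglen, challenge, sizeof challenge);
    EVP_MD_CTX_free(sctx);

    // Login, step 3: the server verifies the signature with the stored
    // public key. No shared secret ever crosses the wire.
    EVP_MD_CTX* vctx = EVP_MD_CTX_new();
    EVP_DigestVerifyInit(vctx, nullptr, nullptr, nullptr, key);
    int ok = EVP_DigestVerify(vctx, sig.data(), siglen, challenge, sizeof challenge);
    EVP_MD_CTX_free(vctx);
    EVP_PKEY_free(key);

    std::printf("verification %s\n", ok == 1 ? "succeeded" : "failed");
    return 0;
}
```

The real protocol layers origin binding, user verification, and attestation on top of this core, which is why a passkey cannot be phished the way a password can.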