<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Qwen on Uncle Xiang&#39;s Notebook</title>
        <link>https://ttf248.life/en/tags/qwen/</link>
        <description>Recent content in Qwen on Uncle Xiang&#39;s Notebook</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Thu, 09 Apr 2026 15:45:31 +0800</lastBuildDate><atom:link href="https://ttf248.life/en/tags/qwen/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>In the era of AI, just getting people into an app is no longer enough.</title>
        <link>https://ttf248.life/en/p/ai-needs-value-not-just-app-traffic/</link>
        <pubDate>Fri, 03 Apr 2026 19:44:10 +0800</pubDate>
        
        <guid>https://ttf248.life/en/p/ai-needs-value-not-just-app-traffic/</guid>
        <description>&lt;p&gt;Seeing the domestic AI companies spend money during this Lunar New Year, my first reaction wasn&amp;rsquo;t excitement, but familiarity.
Tencent Yuanbao gave out 1 billion yuan in cash red envelopes on February 1st; Baidu Wenxin distributed red envelopes totaling 500 million yuan from January 26th through mid-March; Alibaba&amp;rsquo;s Qwen launched a 3 billion yuan &amp;ldquo;treat plan&amp;rdquo; on February 6th; and Doubao leveraged Spring Festival Gala AI interactions to push its presence. My judgment is straightforward: this is inertia left over from the previous internet era. First, pull people into the app; second, build up usage frequency; everything else can wait.
But the business of AI isn&amp;rsquo;t quite like a traffic-driven business.&lt;/p&gt;
&lt;h2 id=&#34;this-approach-is-too-much-like-the-old-internet&#34;&gt;This approach is too much like the old internet
&lt;/h2&gt;&lt;p&gt;The logic of growth in the old internet was simple.
Users arrive, they stay for a few more minutes, the app ranks higher, and traffic follows. With traffic, ads can be sold, and fundraising stories are easier to tell. During high-frequency social periods like Chinese New Year, it&amp;rsquo;s naturally a user acquisition window, so it&amp;rsquo;s normal for big tech companies to think of red envelopes, subsidies, and freebies at this time.
This year&amp;rsquo;s wave of AI competition follows the same underlying logic. The Paper reported that several large tech companies&amp;rsquo; AI marketing budgets during the Chinese New Year period easily exceeded 5 billion yuan. QuestMobile&amp;rsquo;s data is just as clear: on February 6th, the first day of Qwen&amp;rsquo;s &amp;ldquo;3 Billion Freebie&amp;rdquo; campaign, DAU increased 7.3 times to 58.48 million; on February 1st, the first day of Yuanbao&amp;rsquo;s red envelope activity, DAU likewise rose 2.1 times.
So, do you think throwing money around is useful? Of course it is.
At least it can get a large number of people who wouldn&amp;rsquo;t normally open AI to click in and try it out, breaking down the psychological barrier of &amp;ldquo;I didn&amp;rsquo;t know I could use this thing.&amp;rdquo; That&amp;rsquo;s not the problem. The problem is: after bringing people in, how will AI make money?&lt;/p&gt;
&lt;h2 id=&#34;the-ai-ledger-isnt-just-a-traffic-count&#34;&gt;The AI Ledger Isn&amp;rsquo;t Just a Traffic Count
&lt;/h2&gt;&lt;p&gt;This is what I feel many people haven&amp;rsquo;t fully grasped yet.
Short videos, news feeds, and general utility software have very low marginal costs. If one more person spends a few minutes scrolling, the server load will increase, of course, but overall it still operates on the traditional software cost model. AI is different. Every extra question a user asks, every additional search run, and every piece of code, document, or image generated comes with real inference costs behind it.
Electricity, GPUs, cooling, data centers, model research, distillation, training, and inference clusters—when these things stack up, both the fixed and ongoing costs are much higher than software services from previous eras. On March 23rd, Liu Liehong, Director of the National Data Bureau, disclosed that China&amp;rsquo;s average daily Token consumption has exceeded 140 trillion. This number is important not just because it&amp;rsquo;s large, but because it lays bare the commercial reality of AI: every time a user &amp;ldquo;casually asks something,&amp;rdquo; compute power is being consumed.
What&amp;rsquo;s even more interesting is that official sources have framed Tokens as the &amp;ldquo;value anchor&amp;rdquo; and &amp;ldquo;settlement unit&amp;rdquo; of the intelligent era. I think this point is crucial. It directly points out that AI is not a business model purely reliant on monetizing attention; it is inherently closer to a business settled by results, by calls, or by deliverables.
This is also why companies are spending money while simultaneously scrambling to control costs, introduce tiers, set limits, and hand out free quotas. Alibaba Cloud still offers free trials for AI products, with pages explicitly advertising &amp;ldquo;Million Free Call Quotas&amp;rdquo; and &amp;ldquo;Free Experience of Large Model Tokens exceeding 70 Billion.&amp;rdquo; On the surface they are building habits; in reality they are trading subsidies for future paid conversions.
The problem is that if users come in just to claim red envelopes, chat a bit, or create a few stickers, the commercial value of this traffic cannot sustain the underlying compute bill.&lt;/p&gt;
&lt;h2 id=&#34;ai-is-more-like-a-value-economy&#34;&gt;AI is More Like a Value Economy
&lt;/h2&gt;&lt;p&gt;I&amp;rsquo;m increasingly feeling that the AI era resembles a value economy rather than an attention economy.
The attention economy sells dwell time, while the value economy sells results. The difference between the two is significant.
Why are users willing to pay for AI? Not because it can chat, not because its app looks nice, and not because it handed out red envelopes during the Spring Festival. What truly makes people open their wallets is when it does things &lt;em&gt;for&lt;/em&gt; them, and the savings compared to hiring someone are obvious.
For example, writing a piece of code.
Before, you had to pore over API documentation, assemble parameters by hand, and stumble through the pitfalls yourself. Now, AI can absorb a large share of that manual labor, handling the first draft, the debugging, and the documentation cleanup in one pass. As long as the result is acceptable, many people will pay for it, because this isn&amp;rsquo;t &amp;ldquo;playing with me&amp;rdquo;; this is &amp;ldquo;doing work for me.&amp;rdquo;
Take Qwen&amp;rsquo;s strategy this Spring Festival. It looks like just another subsidy on the surface, but it gets one thing right: it is not content with having you chat more; it is pushing AI toward &amp;ldquo;getting tasks done.&amp;rdquo; Placing orders, ordering takeout, shopping, navigation: once these are genuinely wired up, the business logic is far more solid than mere chatting, because it starts to approach delivery.
Simply put, where AI truly holds value is not in trapping people within a chat box, but in transforming needs into results.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Saving you an afternoon of writing code is valuable.&lt;/li&gt;
&lt;li&gt;Taking a round of spreadsheet cleaning off your hands is valuable.&lt;/li&gt;
&lt;li&gt;Sparing you the cost of a freelancer, or a day of troubleshooting, is also valuable.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These kinds of value are tangible, quantifiable, and exactly where users are most willing to pay.&lt;/p&gt;
&lt;h2 id=&#34;red-envelopes-can-only-bring-people-to-the-door&#34;&gt;Red envelopes can only bring people to the door.
&lt;/h2&gt;&lt;p&gt;Therefore, my view on this round of money-throwing by domestic AI companies during the Spring Festival is neither pessimistic nor overly excited.
It is useful, but it is just an appetizer.
Red envelopes, free meals, and CCTV Gala exposure solve the problem of &amp;ldquo;getting people to try it first.&amp;rdquo; In the business of AI, however, what ultimately matters is not who can lure users in once, but who can consistently see the job through, faster, cheaper, and more reliably than human labor.
If we continue following the old internet playbook—only focusing on download volume, time spent, or festive DAU—we will likely find that AI&amp;rsquo;s cost structure fundamentally does not support such a brute-force approach.
Conversely, if a model can genuinely help people write code, organize documents, conduct searches, run workflows, and make consumption decisions reliably, users won&amp;rsquo;t just stay; they will be willing to pay. Because the money spent is no longer about &amp;ldquo;how long I stayed in the App,&amp;rdquo; but rather &amp;ldquo;how much time you saved me, how much labor you saved, and how many things you accomplished for me.&amp;rdquo;
This, to my understanding, is AI commercialization.
It&amp;rsquo;s not about pulling people into an App.
It&amp;rsquo;s about making users feel that the money spent was worth it.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://m.thepaper.cn/newsDetail_forward_32550563&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Don&amp;rsquo;t underestimate embedded AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://m.thepaper.cn/newsDetail_forward_32473851&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Spending 1.5 billion to secure &amp;ldquo;AI group chat&amp;rdquo; positions: Tencent Yuanbao, Baidu Wenxin, and how did Alibaba&amp;rsquo;s Qwen &amp;ldquo;leave the group&amp;rdquo;?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://m.thepaper.cn/newsDetail_forward_32512118&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Yuanbao throws 1 billion! Ma Huateng fires the first shot in the Spring Festival red envelope war, but is AI universalization difficult?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.36kr.com/p/3671286529291012&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Qwen&amp;rsquo;s 3 billion &amp;ldquo;heavy firepower&amp;rdquo; coverage: How can Doubao Yuanbao respond?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://finance.sina.com.cn/jjxw/2026-02-10/doc-inhmihnh4560547.shtml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;QuestMobile: AI red envelope effect drives massive user growth, Qwen&amp;rsquo;s first day of Spring Festival activity sees 7.3x DAU increase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.szzg.gov.cn/2026/xwzx/szkx/202603/t20260325_5300173.htm&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Explosive growth in daily token calls reflects the new landscape of China&amp;rsquo;s AI industry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.stcn.com/article/detail/3708924.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;“Tokens” lead the value anchor point of the intelligent era: Fund managers ride the wave to mine industry chain opportunities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://free.aliyun.com/product/ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Alibaba Cloud AI Product Free Trial&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;writing-notes&#34;&gt;Writing Notes
&lt;/h2&gt;&lt;h3 id=&#34;original-prompt&#34;&gt;Original Prompt
&lt;/h3&gt;&lt;blockquote&gt;
&lt;p&gt;Prompt: During the Spring Festival, domestic vendors are spending money to cultivate user habits, encouraging users to use AI for anything, just to get them using it. In my view, this is an inertial mindset. The internet of the previous era ran on the attention economy. When a user came, and they stayed within the app, I gained traffic, which allowed me to raise funds, and I could even sell the users&amp;rsquo; attention through advertising. In the AI era, two points are different: cost—electricity + GPU + cooling + model development. These fixed costs far exceed those of previous software services. The AI era is more about a value economy. A direct manifestation of this is that if you need to write a piece of code, AI can do it for me, which is more cost-effective than hiring someone. Users are willing to pay for AI.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;summary-of-writing-ideas&#34;&gt;Summary of Writing Ideas
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Retained the core judgment that &amp;ldquo;giving out money during Chinese New Year is an old internet habit,&amp;rdquo; rather than writing a news summary about the &amp;ldquo;Spring Festival AI War.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Used the subsidy actions of Tencent, Baidu, and Alibaba during the 2026 Spring Festival to provide concrete factual anchors for the concept of &amp;ldquo;cultivating spending habits through handouts.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Shifted the focus of the argument from traffic and downloads to Tokens, inference costs, computing power bills, and settlement units.&lt;/li&gt;
&lt;li&gt;Used the example of &amp;ldquo;writing code is more cost-effective than hiring people&amp;rdquo; to ground the abstract commercialization problem back to whether users are willing to pay.&lt;/li&gt;
&lt;li&gt;Structurally, first explained why this approach feels familiar, then argued why AI&amp;rsquo;s cost structure does not support a business model based solely on traffic, and finally concluded with the necessity of &amp;ldquo;paying for results.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Don&#39;t force weak models onto hard tasks.</title>
        <link>https://ttf248.life/en/p/weaker-models-shouldnt-do-frontier-work/</link>
        <pubDate>Thu, 02 Apr 2026 22:05:00 +0800</pubDate>
        
        <guid>https://ttf248.life/en/p/weaker-models-shouldnt-do-frontier-work/</guid>
        <description>&lt;p&gt;Recently, I&amp;rsquo;ve been migrating some edge cases to &lt;code&gt;MiniMax&lt;/code&gt; and local models. The more I use them, the more I feel that we shouldn&amp;rsquo;t always measure things by the standard of &amp;ldquo;the most powerful model.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;My judgment is straightforward: don&amp;rsquo;t force weak models into hard tasks. Models like &lt;code&gt;MiniMax&lt;/code&gt; are indeed limited in capability: for complex coding, long-chain reasoning, or ambiguous requirement decomposition, they fall short. Ask them to do data cleaning, document writing, or proposal research, however, and they handle it perfectly well. The same logic applies to local models around the &lt;code&gt;12B&lt;/code&gt; size; translation, format rewriting, and batch cleaning are exactly where they fit best.&lt;/p&gt;
&lt;p&gt;To put it plainly, it&amp;rsquo;s not that the models lack value; it&amp;rsquo;s just that we shouldn&amp;rsquo;t place them in the wrong roles.&lt;/p&gt;
&lt;h2 id=&#34;the-real-problem-isnt-how-strong-the-model-is-but-whether-it-works-correctly&#34;&gt;The real problem isn&amp;rsquo;t how strong the model is, but whether it works correctly.
&lt;/h2&gt;&lt;p&gt;Many people who talk about large models automatically think of the most difficult tasks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Writing complex engineering code independently&lt;/li&gt;
&lt;li&gt;Deconstructing an entire system in one go&lt;/li&gt;
&lt;li&gt;Multi-turn reasoning over long contexts&lt;/li&gt;
&lt;li&gt;Planning and executing while searching&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are certainly important. But in real-world work, what actually piles up on your desk most often isn&amp;rsquo;t these kinds of tasks. It&amp;rsquo;s more like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cleaning up a pile of dirty fields&lt;/li&gt;
&lt;li&gt;Organizing scattered information into readable documents&lt;/li&gt;
&lt;li&gt;Converting long texts into summaries, FAQs, or outlines&lt;/li&gt;
&lt;li&gt;Standardizing mixed Chinese and English content formats&lt;/li&gt;
&lt;li&gt;Gathering data from multiple web pages and compiling it into a draft proposal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For these types of tasks, what is most needed is not &amp;ldquo;the model thinking like a genius,&amp;rdquo; but three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Instruction following must be reasonably accurate.&lt;/li&gt;
&lt;li&gt;Output structure should be as stable as possible.&lt;/li&gt;
&lt;li&gt;The cost must be low enough that you are willing to use it repeatedly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is why I keep saying that weak models are not useless; they just cannot fight the same battles as flagship models.&lt;/p&gt;
&lt;h2 id=&#34;minimax-whats-actually-suitable-for-it&#34;&gt;MiniMax: What&amp;rsquo;s Actually Suitable for It
&lt;/h2&gt;&lt;p&gt;First, let&amp;rsquo;s talk about &lt;code&gt;MiniMax&lt;/code&gt;.
The official positioning of &lt;code&gt;MiniMax-M2.5&lt;/code&gt; is actually quite high. In press releases and open platform documentation, they push it towards scenarios like programming, tool calling, search, and office productivity, even emphasizing speed and cost advantages. I don&amp;rsquo;t completely disbelieve these claims, but I prefer to break them down.
For me, what &lt;code&gt;MiniMax&lt;/code&gt; is genuinely good at isn&amp;rsquo;t &amp;ldquo;the most complex development tasks,&amp;rdquo; but rather the following:&lt;/p&gt;
&lt;h3 id=&#34;data-cleaning&#34;&gt;Data Cleaning
&lt;/h3&gt;&lt;p&gt;A lot of data cleaning is essentially manual labor involving semi-structured text.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Name unification&lt;/li&gt;
&lt;li&gt;Field mapping&lt;/li&gt;
&lt;li&gt;Anomaly labeling&lt;/li&gt;
&lt;li&gt;Classification tagging&lt;/li&gt;
&lt;li&gt;Table field completion&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What these tasks fear most is not the model being &amp;ldquo;dumb,&amp;rdquo; but inconsistent formatting and divergent outputs. As long as the model can reliably emit &lt;code&gt;JSON&lt;/code&gt;, tables, or a fixed template, that is sufficient. Powerful models can certainly do this too, but paying for the most expensive tier just to clean fields is rarely cost-effective.&lt;/p&gt;
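&lt;p&gt;To make the &amp;ldquo;stable output&amp;rdquo; requirement concrete, here is a minimal sketch of the guardrail involved: validate the model&amp;rsquo;s reply before it touches your table, and retry or escalate when validation fails. The schema keys and the fenced-reply shape are illustrative assumptions, not any vendor&amp;rsquo;s API.&lt;/p&gt;

```python
import json

REQUIRED_KEYS = ("name", "category", "value")  # hypothetical target schema
FENCE = chr(96) * 3  # the three-backtick markdown fence marker

def parse_cleaned_row(reply):
    """Parse a model reply that should contain one JSON object.

    Smaller models often wrap JSON in a markdown fence; strip it, then
    check that every required key is present. Returning None lets the
    caller retry, or escalate the row to a stronger model.
    """
    lines = [ln for ln in reply.strip().splitlines()
             if not ln.strip().startswith(FENCE)]
    try:
        row = json.loads("\n".join(lines))
    except json.JSONDecodeError:
        return None
    if not all(key in row for key in REQUIRED_KEYS):
        return None
    return row

# A typical fenced reply from a smaller model:
reply = FENCE + 'json\n{"name": "Qwen2.5", "category": "LLM", "value": 14}\n' + FENCE
print(parse_cleaned_row(reply))
```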
&lt;h3 id=&#34;documentation-writing&#34;&gt;Documentation Writing
&lt;/h3&gt;&lt;p&gt;Writing documentation is annoying, not difficult.
When an interface changes, a process changes, or a field is modified, the documentation has to change accordingly. This doesn&amp;rsquo;t require strong creativity from the model; it requires the model &lt;em&gt;not&lt;/em&gt; to overreach and turn clearly defined things into something ambiguous.
&lt;code&gt;MiniMax&lt;/code&gt; is often more reliable for these kinds of tasks than one might expect. Especially when you have already prepared the context, it acts more like a capable documentation assistant rather than an actual engineer.&lt;/p&gt;
&lt;h3 id=&#34;solution-material-search&#34;&gt;Solution Material Search
&lt;/h3&gt;&lt;p&gt;The official platform is also promoting search and tool calling, so this direction is fine.
Many times, what we need is not for the model to &amp;ldquo;come up with an answer out of thin air,&amp;rdquo; but rather for it to first find relevant web pages, documents, announcements, or materials, and then organize them neatly. In this scenario, cheaper models like &lt;code&gt;MiniMax&lt;/code&gt; are very valuable because searching, summarizing, and integrating are inherently high-frequency, mundane tasks.
So my actual view is: &lt;code&gt;MiniMax&lt;/code&gt; isn&amp;rsquo;t incapable; rather, it is better suited for the dirty, tiring, and repetitive tasks within a production pipeline. If you let it act as an assistant or general laborer, it is often competent; but if you ask it to handle the entire engineering process, the probability of disappointment increases.&lt;/p&gt;
&lt;h2 id=&#34;local-12b-models-best-suited-for-bringing-back-these-tasks&#34;&gt;Local 12B Models, Best Suited for Bringing Back These Tasks
&lt;/h2&gt;&lt;p&gt;Looking further down, the logic for local deployment is actually the same.
When many people talk about local models, they inevitably ask one question: Can it replace the flagship cloud models?
I think this question is flawed from the start.
For local models around &lt;code&gt;12B&lt;/code&gt;, what has real practical value isn&amp;rsquo;t &amp;ldquo;proving that it can handle the most powerful tasks,&amp;rdquo; but rather bringing back those stable, repetitive, sensitive, low-profit, yet high-frequency tasks.&lt;/p&gt;
&lt;h3 id=&#34;translation&#34;&gt;Translation
&lt;/h3&gt;&lt;p&gt;This is one of the most natural scenarios for local models.
As explicitly mentioned in the official blog of &lt;code&gt;Qwen2.5&lt;/code&gt;, it has enhanced capabilities for long-text generation, structured data understanding, and &lt;code&gt;JSON&lt;/code&gt; output, and supports over 29 languages. This combination is inherently suitable for tasks like translation, bilingual rewriting, format standardization, and terminology normalization.
Technical documentation, field descriptions, product introductions, and API comments—these items often have stable structures and fixed terminology. While local models might not produce the most elegant translations, they are usually sufficient.&lt;/p&gt;
&lt;h3 id=&#34;data-cleaning-1&#34;&gt;Data Cleaning
&lt;/h3&gt;&lt;p&gt;This is also where local models are particularly realistic.
There are plenty of spreadsheets, documents, and business materials you might not want to upload to the cloud. Especially with internal data, customer records, meeting minutes, and draft proposals: when privacy and permissions are involved, running things locally gives much more peace of mind.
At this point, the significance of a local model around &lt;code&gt;12B&lt;/code&gt; isn&amp;rsquo;t &amp;ldquo;how smart it is,&amp;rdquo; but rather that &amp;ldquo;it&amp;rsquo;s on my machine, and it can reliably handle these dirty tasks.&amp;rdquo;&lt;/p&gt;
&lt;h3 id=&#34;fixed-format-rewriting&#34;&gt;Fixed Format Rewriting
&lt;/h3&gt;&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Meeting minutes organized into a fixed template&lt;/li&gt;
&lt;li&gt;Product titles cleaned into a unified naming convention&lt;/li&gt;
&lt;li&gt;Bug descriptions rewritten into ticket format&lt;/li&gt;
&lt;li&gt;Mixed Chinese and English text cleaned into single-language versions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These types of tasks share consistent characteristics: clear rules, large batches, high repetition, low value per instance, but significant cumulative effort.
This is exactly what local models are best suited for.&lt;/p&gt;
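&lt;p&gt;A minimal sketch of what such batch rewriting looks like in practice: the rules live in a fixed prompt template, so a small local model only has to fill slots rather than invent structure. The ticket fields here are illustrative, not taken from any real tracker.&lt;/p&gt;

```python
# Hypothetical ticket format; the fixed rules sit in the prompt so the
# model's only job is rewriting, not deciding on structure.
TICKET_TEMPLATE = (
    "Rewrite the bug report below into exactly this format:\n"
    "Title: one line, imperative mood\n"
    "Steps: numbered list\n"
    "Expected: one line\n"
    "Actual: one line\n\n"
    "Bug report:\n{raw}"
)

def build_ticket_prompt(raw_report):
    # Strip blank lines and stray spaces before the text reaches the
    # model; cheap deterministic cleanup beats burning tokens on it.
    cleaned = "\n".join(
        line.strip() for line in raw_report.splitlines() if line.strip()
    )
    return TICKET_TEMPLATE.format(raw=cleaned)

print(build_ticket_prompt("  login fails \n\n after password reset  "))
```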
&lt;h2 id=&#34;can-the-3060-12gb-actually-run-a-model-around-12b&#34;&gt;Can the 3060 12GB actually run a model around 12B?
&lt;/h2&gt;&lt;p&gt;I prefer to write about this realistically: &amp;ldquo;It can run it, but don&amp;rsquo;t get your hopes up too high.&amp;rdquo;
Google provided a very useful VRAM table in the official documentation for &lt;code&gt;Gemma 3&lt;/code&gt;. The &lt;code&gt;Gemma 3 12B&lt;/code&gt; roughly requires:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;About &lt;code&gt;20 GB&lt;/code&gt; of VRAM to load the full precision version.&lt;/li&gt;
&lt;li&gt;About &lt;code&gt;12.2 GB&lt;/code&gt; to load the medium quantization version.&lt;/li&gt;
&lt;li&gt;About &lt;code&gt;8.7 GB&lt;/code&gt; to load a lower-VRAM quantized version.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The documentation also specifically notes that these figures cover model loading only, excluding prompt and runtime overhead. That caveat is key. It means that running a model around 12B on a card like the &lt;code&gt;3060 12GB&lt;/code&gt; is not impossible, but the prerequisites are usually:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You are running a quantized version.&lt;/li&gt;
&lt;li&gt;The context length should not be too long.&lt;/li&gt;
&lt;li&gt;The task shouldn&amp;rsquo;t be too complex.&lt;/li&gt;
&lt;li&gt;You accept average, or even slow, speed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you accept these premises, running a local 12B model is entirely feasible; for translation, summarization, table cleaning, and fixed-format conversion, that is no exaggeration. Furthermore, the official repository for &lt;code&gt;Qwen2.5-14B-Instruct-GGUF&lt;/code&gt; ships multiple quantization formats, which makes the intent clear: models in this class are built for the local inference ecosystem. So my conclusion has never been that &amp;ldquo;the 3060 12GB can easily handle a 12B model,&amp;rdquo; but rather: it can run these models, and they are best reserved for work with low expectations, high repetition, and high privacy requirements.&lt;/p&gt;
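&lt;p&gt;The VRAM figures above follow roughly from simple arithmetic: parameter count times bytes per weight, with prompt and cache overhead on top. A back-of-envelope sketch (my own rule of thumb, not a formula from the Gemma docs):&lt;/p&gt;

```python
def approx_weights_gb(params_billion, bits_per_weight):
    """Rough size of the weights alone: params times bytes per weight.

    Real checkpoint files add metadata, and inference adds prompt and
    KV-cache overhead on top, so treat these numbers as lower bounds.
    """
    return params_billion * bits_per_weight / 8

# Why a 12 GB card is borderline for a 12B model: at 8 bits per weight
# the weights alone already fill roughly 12 GB, so in practice you run
# a 4- or 5-bit quantization and keep the context short.
for bits in (16, 8, 4):
    print(bits, "bit:", approx_weights_gb(12, bits), "GB")
```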
&lt;h2 id=&#34;cheap-models-and-local-models-its-not-just-about-saving-api-costs&#34;&gt;Cheap Models and Local Models: It&amp;rsquo;s Not Just About Saving API Costs
&lt;/h2&gt;&lt;p&gt;When people talk about this, the first reaction is always saving money.
Of course, saving money is important. But I think the greater value is that you start daring to outsource all those little tasks you used to avoid doing.
Before, you might not have written a dedicated script just to clean up a few hundred data points. You also wouldn&amp;rsquo;t manually adjust dozens of pages of mixed Chinese and English documents to achieve uniform formatting. And you certainly wouldn&amp;rsquo;t read through every single webpage to gather materials for an ad-hoc proposal.
Things are different now.
As long as the cost is low enough and the barrier is low enough, these tasks that were previously considered &amp;ldquo;not worth the effort&amp;rdquo; suddenly become worthwhile. You no longer hesitate over whether or not to do it; instead, you just throw it to a cheap model or a local model to run through first.
This is what I see as the most realistic change.
Powerful models are responsible for tackling core problems, weaker models handle miscellaneous tasks, and local models provide fallback and batch processing.
With this division of labor, the entire workflow becomes smooth.&lt;/p&gt;
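&lt;p&gt;That division of labor can be sketched as a plain routing table: default to the cheap tier, reserve the flagship for the hard classes, and keep sensitive work local. The task tags and tier names below are made up for illustration.&lt;/p&gt;

```python
# Illustrative routing table; tags and tier names are invented, but the
# shape of the decision is the point: cheap by default, escalate or
# localize only when the task class demands it.
ROUTES = {
    "complex_coding": "flagship-cloud",
    "long_reasoning": "flagship-cloud",
    "data_cleaning": "cheap-cloud",    # a MiniMax-class model
    "doc_drafting": "cheap-cloud",
    "translation": "local-12b",        # e.g. a quantized Qwen2.5
    "format_rewrite": "local-12b",
    "sensitive_data": "local-12b",     # never leaves the machine
}

def pick_model(task_tag):
    # Unknown task types start on the cheap tier; a failed attempt can
    # still be escalated to the flagship afterwards.
    return ROUTES.get(task_tag, "cheap-cloud")

print(pick_model("translation"))
print(pick_model("complex_coding"))
```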
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;So, the final word remains: don&amp;rsquo;t always try to make one model conquer everything.
Models like &lt;code&gt;MiniMax&lt;/code&gt; are weak in capability, but they aren&amp;rsquo;t useless. If you use them to tackle complex engineering tasks, vague requirements, or multi-turn reasoning, you will naturally be disappointed; however, if you use them for data cleaning, document drafting, or searching for proposal materials, they often work quite smoothly.
The same applies to local models around &lt;code&gt;12B&lt;/code&gt;. Their purpose isn&amp;rsquo;t to prove that &amp;ldquo;I no longer need cloud flagships,&amp;rdquo; but rather to reliably move stable, repetitive, sensitive, and high-volume tasks back onto their own machines.
Simply put: don&amp;rsquo;t let a weak model do what it is not good at.
Place them in the right role, and they will have real value.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.minimax.io/news/minimax-m25&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniMax M2.5: Built for Real-World Productivity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://platform.minimaxi.com/docs/guides/text-generation&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniMax Open Platform: Text Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://qwenlm.github.io/blog/qwen2.5/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Qwen2.5: A Party of Foundation Models!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GGUF&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Qwen2.5-14B-Instruct-GGUF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://ai.google.dev/gemma/docs/core&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Gemma 3 model overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;writing-notes&#34;&gt;Writing Notes
&lt;/h2&gt;&lt;h3 id=&#34;original-prompt&#34;&gt;Original Prompt
&lt;/h3&gt;&lt;blockquote&gt;
&lt;p&gt;Minimax&amp;rsquo;s large model is weak in capability, but it&amp;rsquo;s fine for tasks like data cleaning, document writing, and searching for proposal materials; with the same logic, deploying a large model locally for translation or data cleaning work is also good. The model parameter count is around 12b, and even a local GPU like the RTX 3060 with 12GB can handle it.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;writing-outline-summary&#34;&gt;Writing Outline Summary
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Retained the core judgment of &amp;ldquo;don&amp;rsquo;t force weak models onto hard tasks,&amp;rdquo; and did not write it as a model leaderboard comparison.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;MiniMax&lt;/code&gt; section is mainly based on the official positioning for programming, searching, and office work, then applies this judgment back to real-world tasks like data cleaning, document handling, and information retrieval.&lt;/li&gt;
&lt;li&gt;For local models, I selected two officially documented options, &lt;code&gt;Qwen2.5&lt;/code&gt; and &lt;code&gt;Gemma 3&lt;/code&gt;: one backs the multilingual and structured-output claims, the other the &lt;code&gt;12B&lt;/code&gt; size and VRAM figures.&lt;/li&gt;
&lt;li&gt;The description for the &lt;code&gt;3060 12GB&lt;/code&gt; was intentionally phrased as &amp;ldquo;capable, but don&amp;rsquo;t get too carried away,&amp;rdquo; to avoid presenting quantized inference as an absolute conclusion.&lt;/li&gt;
&lt;li&gt;In the conclusion, I re-categorized strong models, weak models, and local models based on their respective roles, making the main thread more focused.&lt;/li&gt;
&lt;/ul&gt;</description>
        </item>
        
    </channel>
</rss>
