<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Apache on Uncle Xiang&#39;s Notebook</title>
        <link>https://ttf248.life/en/tags/apache/</link>
        <description>Recent content in Apache on Uncle Xiang&#39;s Notebook</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Thu, 09 Apr 2026 15:45:31 +0800</lastBuildDate><atom:link href="https://ttf248.life/en/tags/apache/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Google has released Gemma 4 this time (Part 1)</title>
        <link>https://ttf248.life/en/p/gemma-4-series-models-and-license/</link>
        <pubDate>Wed, 08 Apr 2026 23:48:20 +0800</pubDate>
        
        <guid>https://ttf248.life/en/p/gemma-4-series-models-and-license/</guid>
        <description>&lt;p&gt;On the day of the initial release, what I originally wanted to do was simple: find an upgraded version corresponding to &lt;code&gt;Gemma 3&lt;/code&gt; and download it to run.
However, after looking around, I was a bit stunned. The familiar naming convention of &lt;code&gt;4B / 12B / 27B&lt;/code&gt; is gone; instead, we have &lt;code&gt;E4B&lt;/code&gt;, &lt;code&gt;26B A4B&lt;/code&gt;, and &lt;code&gt;31B&lt;/code&gt;. How should I put it? This time, what Google truly changed wasn&amp;rsquo;t just the model sizes, but even &amp;ldquo;how you should understand this batch of models.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve broken down these articles into three parts. This current article only clarifies the release information, model names, and protocols; the next one will cover &lt;a class=&#34;link&#34; href=&#34;https://ttf248.life/en/p/gemma-4-series-local-test-on-rtx-3060/&#34; &gt;Google Released Gemma 4 (Part II): Running Locally on a 3060 12GB, 26B A4B is More Realistic&lt;/a&gt;; and the last one will conclude with &lt;a class=&#34;link&#34; href=&#34;https://ttf248.life/en/p/gemma-4-series-vram-cliff-and-mac-unified-memory/&#34; &gt;Google Released Gemma 4 (Part III): Why VRAM Insufficiency Causes a Cliff, and Why Mac Can Be a Fallback But Isn&amp;rsquo;t Fast&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;lets-first-clarify-what-was-actually-released-this-time&#34;&gt;Let&amp;rsquo;s first clarify what was actually released this time
&lt;/h2&gt;&lt;p&gt;Last year, &lt;code&gt;Gemma 3&lt;/code&gt; was released on March 12, 2025, and this &lt;code&gt;Gemma 4&lt;/code&gt; was released on April 2, 2026, so the two are indeed about a year apart.
However, we cannot approach this by asking, &amp;ldquo;What is the next generation after 27B?&amp;rdquo; The four main sizes provided by the official source are no longer categorized simply by total parameters.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th&gt;Structure&lt;/th&gt;
          &lt;th&gt;Key Numbers&lt;/th&gt;
          &lt;th&gt;Typical Scenarios&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;E2B&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Dense&lt;/td&gt;
          &lt;td&gt;2.3B effective, 5.1B including embeddings, 128K context&lt;/td&gt;
          &lt;td&gt;On-device, ultra-lightweight local&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;E4B&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Dense&lt;/td&gt;
          &lt;td&gt;4.5B effective, 8B including embeddings, 128K context&lt;/td&gt;
          &lt;td&gt;The original 4B small-model main line&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;26B A4B&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;MoE&lt;/td&gt;
          &lt;td&gt;25.2B total, approx. 3.8B active, 256K context&lt;/td&gt;
          &lt;td&gt;Consumer GPUs, local deployment, balancing quality and speed&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;31B&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Dense&lt;/td&gt;
          &lt;td&gt;30.7B dense, 256K context&lt;/td&gt;
          &lt;td&gt;Aiming for the upper limit, leaderboards, and more stable quality&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you only look at the surface, you might feel that the naming is more confusing. But it&amp;rsquo;s not random; Google is deliberately splitting three tracks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Small model for on-device use, given to &lt;code&gt;E2B / E4B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Local player track, given to &lt;code&gt;26B A4B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Quality and upper-limit track, given to &lt;code&gt;31B&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is also why many people&amp;rsquo;s first impression might be, &amp;ldquo;The previously familiar upgrade path has been broken.&amp;rdquo; It&amp;rsquo;s not that they didn&amp;rsquo;t release an upgraded version; it&amp;rsquo;s that Google no longer wants to sell the lineup on a single dimension: total parameters.&lt;/p&gt;
&lt;h2 id=&#34;e-and-a-are-not-decorative-letters-this-time&#34;&gt;&amp;lsquo;E&amp;rsquo; and &amp;lsquo;A&amp;rsquo; are not decorative letters this time
&lt;/h2&gt;&lt;p&gt;In this batch of names, the most confusing ones are &lt;code&gt;E4B&lt;/code&gt; and &lt;code&gt;A4B&lt;/code&gt;.
The &amp;lsquo;E&amp;rsquo; in &lt;code&gt;E2B&lt;/code&gt; and &lt;code&gt;E4B&lt;/code&gt; stands for &lt;code&gt;effective parameters&lt;/code&gt;, per the official documentation. Because these two models use &lt;code&gt;Per-Layer Embeddings&lt;/code&gt;, the total parameter count and the effective parameter count are no longer the same number. Simply put, Google is reminding you that this is not the old-style plain 4B dense model.
The &amp;lsquo;A&amp;rsquo; in &lt;code&gt;26B A4B&lt;/code&gt; stands for &lt;code&gt;active parameters&lt;/code&gt;. The total size is &lt;code&gt;25.2B&lt;/code&gt;, but only about &lt;code&gt;3.8B&lt;/code&gt; are actually activated per token. This is the whole point of the MoE approach: the model as a whole is large, but the part that participates in computation at runtime is much smaller.
So, even though both names seem to have a &amp;lsquo;4B&amp;rsquo;, their meanings are completely different:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;E4B&lt;/code&gt; is for the small model line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;26B A4B&lt;/code&gt; is a large MoE whose activation scale during local inference is around 4B.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The naming convention is admittedly awkward at first, but it maps more closely to the actual deployment experience than the old scheme did.&lt;/p&gt;
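The split between total and active parameters can be made concrete with some back-of-envelope arithmetic: weight memory scales with the total parameter count, while per-token compute scales with the active (or effective) count. The parameter figures below are the officially quoted sizes; the 2 bytes/param assumption (unquantized 16-bit weights) is mine, for illustration only.

```python
# Back-of-envelope only: memory follows TOTAL params, compute follows
# ACTIVE params. 2 bytes/param assumes unquantized fp16/bf16 weights
# (an assumption for illustration, not a measured footprint).

def weight_gib(total_params_billion, bytes_per_param=2):
    """Raw weight storage in GiB at the given precision."""
    return total_params_billion * 1e9 * bytes_per_param / 1024**3

models = {
    # name: (total params in B, params active per token in B)
    "E4B":     (8.0,  4.5),   # dense; 4.5B effective via Per-Layer Embeddings
    "26B A4B": (25.2, 3.8),   # MoE; only ~3.8B routed per token
    "31B":     (30.7, 30.7),  # dense; every parameter fires on every token
}

for name, (total_b, active_b) in models.items():
    print(f"{name:8s} ~{weight_gib(total_b):5.1f} GiB of fp16 weights, "
          f"~{active_b:.1f}B params per token")
```

At fp16 the MoE's full 25.2B weights still need roughly 47 GiB to hold, even though only ~3.8B parameters do work per token; that gap between what must be resident and what actually computes is exactly what the E and A letters are encoding.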
&lt;h2 id=&#34;if-you-previously-used-gemma-3-how-to-find-the-corresponding-relationship-this-time&#34;&gt;If you previously used Gemma 3, here is how the models map over
&lt;/h2&gt;&lt;p&gt;I think the easiest way to misjudge this generation is to treat it as a linear upgrade from &lt;code&gt;Gemma 3&lt;/code&gt;.
If you look at it based on usage habits, you can roughly understand it like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Those who used to focus on &lt;code&gt;4B&lt;/code&gt; for light tasks should now first look at &lt;code&gt;E4B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Those who used to focus on &lt;code&gt;27B&lt;/code&gt; to see the model&amp;rsquo;s upper limit should now look at &lt;code&gt;31B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you previously wanted a balance point on consumer-grade GPUs that is &amp;ldquo;powerful enough but not completely unrunnable,&amp;rdquo; now focus on &lt;code&gt;26B A4B&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you don&amp;rsquo;t clarify this layer first, local deployment will easily go wrong later: you may complain, &amp;ldquo;Why isn&amp;rsquo;t there the familiar upgraded version?&amp;rdquo; while choosing a model that isn&amp;rsquo;t actually suitable for you.&lt;/p&gt;
&lt;h2 id=&#34;the-most-valuable-update-this-time-isnt-the-parameters&#34;&gt;The most valuable update this time isn&amp;rsquo;t the parameters
&lt;/h2&gt;&lt;p&gt;What really made this release feel like a &amp;ldquo;finally figured it out&amp;rdquo; moment wasn&amp;rsquo;t the leaderboard, but the license.
The old &lt;code&gt;Gemma&lt;/code&gt; terms weren&amp;rsquo;t unusable, but they always felt a bit awkward, especially if you care about things like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Redistribution&lt;/li&gt;
&lt;li&gt;Distillation or secondary packaging&lt;/li&gt;
&lt;li&gt;Integrating the model into your own product pipeline&lt;/li&gt;
&lt;li&gt;Commercial deployment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You always had to go back and check how the notices, downstream restrictions, and accompanying agreements in those terms should be handled.
By switching directly to &lt;code&gt;Apache 2.0&lt;/code&gt; this time, &lt;code&gt;Gemma 4&lt;/code&gt; made things much cleaner. The core message is very clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Commercially usable&lt;/li&gt;
&lt;li&gt;Modifiable&lt;/li&gt;
&lt;li&gt;Redistributable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The main obligations are limited to the familiar open-source basics: retaining the license, notices, and modification documentation.
Simply put, Google didn&amp;rsquo;t just open-source a model this time; it smoothed out the entire question of &amp;ldquo;whether people feel safe actually using it.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;initial-community-feedback-basically-two-lines&#34;&gt;Initial community feedback basically runs along two lines
&lt;/h2&gt;&lt;p&gt;If you only look at the first week&amp;rsquo;s buzz, there are roughly two main sentiments.&lt;/p&gt;
&lt;p&gt;The first line is that &lt;code&gt;31B&lt;/code&gt; is genuinely capable. The official benchmarks are already impressive: on the &lt;code&gt;Arena AI&lt;/code&gt; text leaderboard, 31B ranked among the top open-source models at release, and it also showed a significant improvement over &lt;code&gt;Gemma 3 27B&lt;/code&gt; on &lt;code&gt;LiveCodeBench v6&lt;/code&gt;. Many people&amp;rsquo;s first reaction is that this level of performance at this size exceeded expectations.&lt;/p&gt;
&lt;p&gt;The second line is that &lt;code&gt;26B A4B&lt;/code&gt; seems like a lifeline for local users. It might not be the flashiest flagship model at first glance, but it is very practical. Especially if you aren&amp;rsquo;t running things in a data center, but rather on consumer-grade GPUs, workstations, or even older machines, the local experience tends to fall onto this line.&lt;/p&gt;
&lt;p&gt;Of course, there&amp;rsquo;s a very realistic prerequisite for the initial wave of feedback: the ecosystem is still catching up with the versions. Templates, quantization methods, inference frameworks, and front-end tools—many haven&amp;rsquo;t fully kept pace yet. Therefore, when looking at comments right now, it&amp;rsquo;s best to view them in two layers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The core model:&lt;/strong&gt; There has indeed been a big improvement here.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local experience:&lt;/strong&gt; This will continue to be influenced by the maturity of the toolchain.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;my-conclusion-on-the-first-article&#34;&gt;My Conclusion on the First Article
&lt;/h2&gt;&lt;p&gt;If you just want to know what Google actually released this time, one sentence is enough.
&lt;code&gt;Gemma 4&lt;/code&gt; is no longer following the old idea of &amp;ldquo;a line of dense models from small to large,&amp;rdquo; but rather separating three paths: device-side deployment, local deployment, and quality ceiling. The names like &lt;code&gt;E4B&lt;/code&gt;, &lt;code&gt;26B A4B&lt;/code&gt;, and &lt;code&gt;31B&lt;/code&gt; sound strange, but behind them is a very practical division of labor for deployment.
But if you ask me what the biggest change is, I stand by the same judgment:
it&amp;rsquo;s not the parameters, nor the leaderboards; it&amp;rsquo;s that Google finally put &lt;code&gt;Gemma 4&lt;/code&gt; under an open-source license that everyone feels more comfortable actually using.
This step is more important than the numbers on the surface.
In the next article, I won&amp;rsquo;t continue with the launch narrative; I&amp;rsquo;ll go straight back to the local machine: on that same unupgraded &lt;code&gt;RTX 3060 12GB&lt;/code&gt;, why my first stop wasn&amp;rsquo;t &lt;code&gt;31B&lt;/code&gt; but &lt;code&gt;26B A4B&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Gemma 4: Byte for byte, the most capable open models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://ai.google.dev/gemma/docs/core/model_card_4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Gemma 4 model card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://ai.google.dev/gemma/terms&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Gemma Terms of Use&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://ai.google.dev/gemma/apache_2&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Apache License 2.0 for Gemma 4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://foodtruckbench.com/blog/gemma-4-31b&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Gemma 4 31B on FoodTruck Bench&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.reddit.com/r/LocalLLaMA/comments/1san4kd/will_gemma_4_124b_moe_open_as_well/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LocalLLaMA discussion on Gemma 4 license changes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developers.googleblog.com/introducing-gemma3/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Gemma 3: The Developer Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;writing-notes&#34;&gt;Writing Notes
&lt;/h2&gt;&lt;h3 id=&#34;original-prompt&#34;&gt;Original Prompt
&lt;/h3&gt;&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;$blog-writer Google has released the Gemma4 model after a year. As usual, I&#39;m trying to deploy it locally on that old desktop with an unupgraded NVIDIA 3060 12GB graphics card. This time I caught the initial release, but I couldn&#39;t find an upgraded version of the previously used Gemma3. However, there is a similar version called GemmaE4b. Please first search and introduce all the models released this time, what the abbreviation letters mean in them, and then search for online reviews about Gemma4. The key point is that Google updated the model&#39;s protocol this time, so the restrictions for users are fewer. The biggest surprise: my usual test question—write a piece of C++ code to output a five-pointed star in the console. Last year&#39;s smaller parameter open-source models couldn&#39;t handle this problem, but Google managed it this time. In the first version, it gave an answer that completely exceeded my expectations; it knew about my trap. Outputting a five-pointed star to the console is very tricky, so it directly hardcoded a string for the five-pointed star, and the console outputted it directly. This is the original text: Because drawing a five-pointed star with precise geometric structure using mathematical logic in a pure text console (Console) is very complex (involving coordinate system transformation and pixel filling), the most classic and visually best method is to use ASCII Art. After I forced it to perform calculations, it also managed it through mathematical calculation and successfully drew the five-pointed star. Previously, I often used Gemma4 for local translation tasks; many multilingual versions of historical articles on current blogs are like this. The model used for local testing: gemma-4-26b-a4b. The 31b version is indeed too slow. But looking at the reviews, the 31b performs very well, and its ranking scores are excellent. 
Also, while browsing forums, I realized that if the VRAM is insufficient and the model parameters increase, the token generation speed will drop drastically. Can you explain why? Macs don&#39;t have this problem because they use unified memory; please explain the technical reason. Furthermore, if speed is required, then an NVIDIA card with large VRAM is still necessary. The Mac solution can serve as a fallback, but it cannot match the speed. This content is quite extensive; please evaluate whether it should be split into a series of articles.
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;writing-outline-summary&#34;&gt;Writing Outline Summary
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;The first article will only focus on clarifying &amp;ldquo;what was actually released this time&amp;rdquo; and &amp;ldquo;why the protocol is important,&amp;rdquo; avoiding topics that compete with local experience discussions.&lt;/li&gt;
&lt;li&gt;The model lineup breakdown comes first, followed by the meaning of the letters, making the logical flow more direct than the previous version.&lt;/li&gt;
&lt;li&gt;For the license section, we retained the judgment: &amp;ldquo;What was truly released this time is not the parameters, but the usage restrictions.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Community feedback will only be used for synthesis/conclusion, without preemptively including too many local experience details.&lt;/li&gt;
&lt;/ul&gt;</description>
        </item>
        
    </channel>
</rss>
