Google Releases Gemma 4 Under Apache 2.0 — Its Most Capable Open Model Now Runs on Phones, Laptops, and Enterprise Servers

Google DeepMind released Gemma 4 on April 2, 2026 — the first model in the Gemma family to ship under the Apache 2.0 license. Four sizes span edge devices to enterprise servers: E2B and E4B for smartphones and embedded hardware, a 26B Mixture-of-Experts model, and a 31B dense model. All four are natively multimodal, processing text, images, and video. The edge models add audio input, enabling on-device speech recognition without a cloud connection. Built from the same research and technology as Gemini 3, Gemma 4 supports 140-plus languages and context windows up to 256K tokens. The Gemma family has now been downloaded more than 400 million times since its February 2024 launch, with over 100,000 community-built variants forming what Google calls the "Gemmaverse."

Why the License Matters More Than the Benchmarks

Previous Gemma versions were "open-weight" but not open-source. Custom license terms restricted commercial use in ways that gave enterprise compliance teams pause. Developers who needed permissive licensing went to Qwen or Mistral instead. One widely cited example: an insurance startup could not get Gemma 3 through its legal review process, switched to Qwen for its claims-processing pipeline, and is now reconsidering with Gemma 4's Apache 2.0 terms.

Apache 2.0 removes those restrictions. Any company, regardless of size, can use, modify, and redistribute Gemma 4 for any purpose, commercial or otherwise, without negotiating a license; the only obligations are the license's standard attribution and notice requirements. Hugging Face CEO Clement Delangue called the release "a huge milestone" for the open-source AI community. The contrast with Meta's Llama 4 is stark: it was released three days later under a community license that requires companies with more than 700 million monthly active users to negotiate separate terms and blocks EU access to vision capabilities. For developers choosing between the two most capable open model families available in April 2026, licensing is now a differentiator on par with benchmark performance.

What It Can Do

The four model sizes cover distinct use cases. E2B and E4B run on smartphones, Raspberry Pi boards, and other edge devices with 128K-token context windows. The 26B MoE model activates only about 4 billion parameters per token during inference, delivering near-31B performance at a fraction of the compute cost. The 31B dense model fits on a single 80GB GPU with unquantized 16-bit weights (roughly 62GB), making it accessible to teams with standard enterprise hardware.
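The single-GPU and MoE figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, the bytes-per-parameter table is a common rule of thumb rather than anything from Google's documentation, and it counts weights alone, ignoring activations and the KV cache.

```python
# Rough weight-memory estimate. Assumption: memory = parameters x bytes per parameter.
# Real deployments also need room for activations and the KV cache.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate size of the weights in GB (1 GB = 10^9 bytes)."""
    return params_billions * BYTES_PER_PARAM[dtype]


def fits_on_gpu(params_billions: float, dtype: str, gpu_gb: float = 80.0) -> bool:
    """True if the weights alone fit in the given GPU memory."""
    return weight_memory_gb(params_billions, dtype) <= gpu_gb


def active_fraction(active_billions: float, total_billions: float) -> float:
    """Share of an MoE model's parameters used per token."""
    return active_billions / total_billions


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gb(31, dtype)
        print(f"31B dense @ {dtype}: {size:.1f} GB, fits on 80GB: {fits_on_gpu(31, dtype)}")
    # About 4B of 26B parameters are active per token, per the article.
    print(f"26B MoE active share: {active_fraction(4, 26):.0%}")
```

Under these assumptions the 31B model needs about 62GB at 16-bit precision, which fits on an 80GB card, and about 124GB at 32-bit, which does not. The MoE model still has to hold all 26 billion parameters in memory; the saving is in compute per token, not in weight storage.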

Performance gains over Gemma 3 are substantial. The Codeforces Elo rating, a competitive programming benchmark, jumped from 110 to 2,150, a leap from beginner to expert-level code generation. All models are available on Google AI Studio, Hugging Face, Kaggle, and Ollama from day one, with NVIDIA optimization across its GPU lineup and immediate support in llama.cpp.

Audio processing on the edge models opens practical applications that were previously cloud-dependent. A phone running E2B can transcribe speech, interpret voice commands, and process images locally, without sending data to an external server. For applications in healthcare, finance, or any domain where data should stay on the device, that architecture matters.

Gemma 4 vs Llama 4

Both model families arrived in the same week, and both represent major architectural upgrades. The comparison breaks down along specific trade-offs rather than a simple ranking.

Gemma 4 offers smaller, edge-optimized models that Llama 4 does not match. The E2B and E4B sizes have no direct Llama equivalent — Meta's smallest Llama 4 model, Scout, requires an H100 GPU. For on-device applications, Gemma 4 is the only option between the two.

Llama 4 offers larger context. Scout's 10-million-token window dwarfs Gemma 4's 256K maximum. For use cases that require processing entire codebases or months of conversation history, Llama 4 has a structural advantage. Llama 4 Maverick's 128 experts and 400 billion total parameters also represent a scale of MoE architecture that Gemma 4's 26B model does not attempt.

The license difference is the clearest dividing line. Apache 2.0 versus a community license with size and geographic restrictions gives Gemma 4 an advantage for any organization that values unconditional commercial freedom. Developers now have a genuine choice between two high-quality open model families — a competitive dynamic that did not exist six months ago.

Mubboo's Take

When a model that processes text, images, and audio runs locally on a phone under Apache 2.0, the barrier to building AI-powered consumer applications drops to near zero. For comparison platforms operating across multiple countries with different data laws, local-first AI that processes data on the user's device without sending it to the cloud is not a nice-to-have — it is a competitive advantage. The ability to run speech recognition, image analysis, and product comparison on-device means consumer tools that work offline, respect privacy regulations by default, and cost a fraction of cloud-based alternatives. Gemma 4 makes that architecture practical at a cost that was unimaginable two years ago.

Sources: Google DeepMind blog (April 2, 2026), Google Cloud blog, Hugging Face, TechBriefly, Dataconomy, MayhemCode.
