A 26M Model That Does Tool Calling: Needle Shrinks Gemini's Capability to Edge Scale

Cactus Compute distills Gemini's tool-calling into a 26-million-parameter model, weakening the case for cloud APIs in constrained deployments.

Cactus Compute published Needle on GitHub this week: a 26-million-parameter model that distills Gemini's tool-calling behavior into a package small enough to run on edge hardware. The project surfaced on Hacker News with 585 upvotes, placing it among the more-watched open-source drops of the month. The core claim is that structured function-calling, long considered a capability requiring large-model scale, can be compressed into a model roughly 1,000x smaller than the frontier systems that pioneered it.

That compression matters most where it threatens Google's own position. Gemini's tool-calling API is a recurring revenue line for Google Cloud, and the argument for paying per-token has always rested on capability gaps that smaller models couldn't close. Needle chips at that argument directly. If a 26M model handles tool dispatch reliably, the case for routing function-calling workloads through a hosted API weakens, especially for developers building on-device assistants, IoT controllers, or latency-sensitive pipelines where a round-trip to a cloud endpoint is a design failure, not a tradeoff. The 585 upvotes are a practitioner signal, not a quality certificate, but they suggest the capability-to-size ratio is landing as credible, not theoretical.
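To make "tool dispatch" concrete, here is a minimal sketch of the on-device half of the loop: the model emits a structured call, and local code routes it to a function with no network hop. The tool names, the JSON shape, and the sample output string are all assumptions for illustration; none of them are taken from the Needle repo.

```python
import json
from typing import Callable, Dict

# Hypothetical device-side tools; names and signatures are illustrative only.
def set_thermostat(celsius: float) -> str:
    return f"thermostat set to {celsius:.1f} C"

def toggle_light(room: str, on: bool) -> str:
    return f"{room} light {'on' if on else 'off'}"

# Registry mapping tool names to local callables.
TOOLS: Dict[str, Callable[..., str]] = {
    "set_thermostat": set_thermostat,
    "toggle_light": toggle_light,
}

def dispatch(raw_call: str) -> str:
    """Parse a model-emitted JSON tool call and run the matching local function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))

# Stand-in for what a small on-device model might emit; not actual Needle output.
model_output = '{"name": "toggle_light", "arguments": {"room": "kitchen", "on": true}}'
result = dispatch(model_output)
print(result)
```

Everything after the model's output is ordinary local code, which is why the size of the model that produces that one JSON string dominates the latency and cost picture.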

The broader pattern is distillation closing the gap faster than most roadmaps assumed. Microsoft's Phi series, Mistral's small models, and now projects like Needle are each targeting a different frontier capability and asking how thin it can be sliced. Tool use was supposed to be one of the last holdouts. If Needle's benchmarks hold under production conditions, the next question is whether retrieval-augmented generation and multi-step planning compress the same way. Teams building edge agents should watch the Needle repo for evals against real tool schemas before drawing conclusions, but the direction is clear.
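For teams that want to run that kind of check themselves, the sketch below shows one way an eval against tool schemas could be scored: what fraction of emitted calls are well-formed against the schema, and what fraction exactly match the expected call. The schemas, test cases, and metric names are hypothetical, and the type check is deliberately simpler than a full JSON Schema validator.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical tool schemas: required argument names mapped to Python types.
SCHEMAS: Dict[str, Dict[str, type]] = {
    "get_weather": {"city": str},
    "set_timer": {"seconds": int},
}

def is_valid_call(raw: str) -> bool:
    """True if raw is JSON naming a known tool with exactly the typed arguments it requires."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in SCHEMAS or not isinstance(args, dict):
        return False
    schema = SCHEMAS[name]
    if set(args) != set(schema):
        return False
    # Exact type match, so a JSON boolean is not accepted where an int is required.
    return all(type(args[key]) is expected for key, expected in schema.items())

def score(cases: List[Tuple[str, Dict[str, Any]]]) -> Dict[str, float]:
    """Compute schema-validity rate and exact-match rate over (model_output, expected_call) pairs."""
    valid = sum(is_valid_call(raw) for raw, _ in cases)
    exact = sum(is_valid_call(raw) and json.loads(raw) == want for raw, want in cases)
    return {"valid_rate": valid / len(cases), "exact_match": exact / len(cases)}

# Made-up model outputs paired with the call a grader would expect.
cases = [
    ('{"name": "get_weather", "arguments": {"city": "Oslo"}}',
     {"name": "get_weather", "arguments": {"city": "Oslo"}}),
    ('{"name": "set_timer", "arguments": {"seconds": "90"}}',   # wrong type
     {"name": "set_timer", "arguments": {"seconds": 90}}),
    ('{"name": "set_timer", "arguments": {"seconds": 60}}',     # valid but wrong value
     {"name": "set_timer", "arguments": {"seconds": 90}}),
    ('get_weather(Oslo)',                                       # not JSON at all
     {"name": "get_weather", "arguments": {"city": "Oslo"}}),
]
report = score(cases)
print(report)
```

Separating validity from exact match is a design choice worth keeping: a small model that always emits parseable calls but picks wrong arguments fails differently from one that breaks the format, and the two call for different fixes.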

Source: Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model