§ Brief · Jun 5, 2026 · Issue 69 · Worth Reading

The Part of Your LLM You Throw Away Is Quietly Corrupting Your Embeddings

EmbedFilter uses the unembedding matrix to remove high-frequency token bias from LLM embeddings, improving MTEB zero-shot performance while cutting index size.

Most teams that treat a large language model as a drop-in embedding engine assume its weak zero-shot embedding quality comes down to a training-objective mismatch: next-token prediction simply does not teach the model to produce good sentence-level representations. That assumption is half right. The deeper cause is structural, and it lives inside the model's own unembedding matrix.

When an LLM produces a text embedding, that embedding does not sit in a neutral semantic space. Project it back through the unembedding matrix onto the vocabulary and a pattern appears immediately: the embedding aligns strongly with tokens such as "the," "a," "of," and common punctuation, which carry no semantic content but appear in almost every document. This alignment is not an incidental side effect of the representation. The unembedding matrix contains a subspace that actively writes the high-frequency signal into every embedding the model produces. The result is that genuine semantic content gets suppressed by a low-dimensional attractor built from frequency, not meaning.
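The projection described above can be sketched in a few lines. This is a toy illustration rather than the paper's code: the four-token vocabulary, the 3-dimensional unembedding rows, the embedding values, and the helper name `vocab_alignment` are all hypothetical, chosen only to show how scoring an embedding against each unembedding row surfaces the tokens it aligns with.

```python
def vocab_alignment(embedding, unembedding, vocab):
    """Rank vocabulary tokens by the dot product between each token's
    unembedding row and the text embedding (i.e. the token's logit)."""
    scores = [sum(w * x for w, x in zip(row, embedding)) for row in unembedding]
    return sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one unembedding row per vocabulary token.
vocab = ["the", "of", "protein", "galaxy"]
unembedding = [
    [0.9, 0.1, 0.0],  # "the"
    [0.8, 0.2, 0.1],  # "of"
    [0.1, 0.9, 0.0],  # "protein"
    [0.0, 0.1, 0.9],  # "galaxy"
]

# Hypothetical text embedding dominated by the direction shared by "the" and "of".
embedding = [1.0, 0.4, 0.1]

ranked = vocab_alignment(embedding, unembedding, vocab)
for token, score in ranked:
    print(f"{token:>8}  {score:.2f}")
```

In a real model the same inspection would use the checkpoint's actual unembedding weights and tokenizer vocabulary, with a tensor library doing the matrix product.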

EmbedFilter addresses this with a single linear transformation applied after the embedding is extracted. The procedure identifies the subspace of the unembedding matrix corresponding to high-frequency tokens, then projects it out of the embedding vector. No fine-tuning, no additional parameters, no changes to the model weights. Think of it as a notch filter for a known interference frequency: the signal source is identified, its spectral footprint is removed, and what remains is the semantic content that was always there but buried. Because the high-frequency subspace is low-dimensional, filtering it out also compresses the embedding into fewer dimensions without discarding the refined signal. That compression is not a trade-off. It is a byproduct of removing the noise.
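A minimal sketch of the project-out step follows. It assumes the high-frequency subspace is simply the span of the unembedding rows for a hand-picked set of frequent tokens; how EmbedFilter actually selects that subspace is not described here. The data and the helper names (`orthonormal_basis`, `project_out`, `compress`) are hypothetical. The `compress` helper shows one possible way to realize the dimensionality reduction, by re-expressing the embedding in coordinates of the orthogonal complement.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(embedding, basis):
    """Remove the component of `embedding` lying in the span of an orthonormal `basis`."""
    out = list(embedding)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out


def compress(embedding, basis):
    """Express the filtered embedding in coordinates of the orthogonal complement,
    giving len(embedding) - len(basis) dimensions (an assumed realization)."""
    dim = len(embedding)
    identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    complement = orthonormal_basis(basis + identity)[len(basis):]
    return [dot(embedding, c) for c in complement]


# Hypothetical toy data: unembedding rows of two frequent tokens, one embedding.
high_freq_rows = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
embedding = [1.0, 0.4, 0.1]

basis = orthonormal_basis(high_freq_rows)
filtered = project_out(embedding, basis)   # same dimension, frequency component removed
compressed = compress(embedding, basis)    # fewer dimensions, same norm as `filtered`
```

Because the basis depends only on the model weights, it can be computed once per checkpoint and reused for every embedding. In production the per-embedding work would reduce to a single matrix multiply in a tensor library.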

Across multiple LLM backbones, EmbedFilter lifts zero-shot performance on MTEB benchmarks while simultaneously reducing embedding dimensionality, which lowers index storage and speeds up retrieval. The gains hold even at significantly reduced dimensions, meaning teams do not have to choose between quality and infrastructure cost. For teams building retrieval pipelines or semantic search on top of general-purpose LLMs, the takeaway is direct: before reaching for a fine-tuned embedding model, apply EmbedFilter to the base LLM and measure the gap it closes.

We're thinking: The unembedding matrix is normally treated as a generation artifact, useful only for converting hidden states into token probabilities and discarded everywhere else in the inference stack. What this work shows is that it encodes a map of the model's own frequency biases, and that map is precise enough to use as a filter. We find the implication for embedding infrastructure more pointed than it might appear: because the unembedding matrix ships with every standard LLM checkpoint, EmbedFilter is essentially free to apply at serving time. The more contrarian read is that this exposes a systematic flaw in how the field has evaluated LLM-based embeddings. Benchmarks showed poor zero-shot performance, and the diagnosis stopped at "wrong training objective." The actual culprit, a low-dimensional frequency attractor baked into the weight matrix, was sitting in plain sight the entire time.

Key takeaways:

  • EmbedFilter identifies the subspace of the unembedding matrix that encodes high-frequency token signals and projects it out of text embeddings via a single linear transformation, requiring no fine-tuning or architectural changes.
  • Across multiple LLM backbones, the method improves zero-shot MTEB performance while reducing embedding dimensionality; the main caveat is that gains are reported in zero-shot settings and behavior under domain-specific fine-tuning pipelines remains to be characterized.
  • Teams running LLM-backed retrieval or semantic search should test EmbedFilter as a post-processing step on extracted embeddings before committing to fine-tuned embedding models, since the compute cost is negligible and the index storage reduction comes for free.

Source: Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings