Also Worth Noting - 2026-05-03
gradient-free training, sparse attention, alignment midtraining, prompt reliability audits, and geometry-preserving fine-tuning
02 [Training] Training Non-Differentiable Networks via Optimal Transport
Surrogate gradients for quantized layers and spiking neurons introduce bias by design. PolyStep sidesteps this by using only forward passes: it evaluates the loss at structured polytope vertices in a compressed subspace, then displaces parameters toward low-cost vertices via a barycentric projection weighted by a softmax over the resulting cost matrix. The update corresponds to a one-sided optimal transport step and requires no gradient signal. Teams training quantized or spiking architectures where straight-through estimators produce unstable results have a concrete alternative that avoids surrogate-gradient bias.
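To make the forward-only update concrete, here is a minimal sketch of one step of this kind. It is not the paper's implementation: the cross-polytope vertex layout, the random unit directions used as the "compressed subspace", the `radius` and `temperature` parameters, and the toy `quantized_loss` are all assumptions chosen for illustration, and the sketch uses a cost vector for a single parameter point rather than a full cost matrix.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_directions(dim, k, rng):
    """Assumed subspace choice: k random unit directions in R^dim."""
    dirs = []
    for _ in range(k):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in u))
        dirs.append([x / norm for x in u])
    return dirs


def polytope_step(theta, loss_fn, directions, radius, temperature):
    """One forward-only update: barycenter of vertices under softmax(-cost)."""
    # Assumed vertex layout: a cross-polytope centred at theta,
    # i.e. theta + radius * u and theta - radius * u for each direction u.
    vertices = [
        [t + sign * radius * ui for t, ui in zip(theta, u)]
        for u in directions
        for sign in (1.0, -1.0)
    ]
    costs = [loss_fn(v) for v in vertices]  # forward passes only, no gradients
    weights = softmax([-c / temperature for c in costs])
    # Barycentric projection: weighted average of the vertices.
    new_theta = [
        sum(w * v[j] for w, v in zip(weights, vertices))
        for j in range(len(theta))
    ]
    return new_theta, weights


def quantized_loss(params):
    """Toy non-differentiable loss: parameters are rounded to a 0.25 grid."""
    targets = (1.0, -0.5)
    return sum(abs(round(p * 4) / 4 - t) for p, t in zip(params, targets))


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    for _ in range(50):
        dirs = random_directions(len(theta), 2, rng)
        theta, _ = polytope_step(theta, quantized_loss, dirs, 0.5, 0.1)
    print(theta, quantized_loss(theta))
```

A lower `temperature` concentrates the weights on the cheapest vertex, while a higher one averages more broadly; a single step of this sketch is not guaranteed to reduce the loss.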
03 [Inference] Stochastic Sparse Attention for Memory-Bound Inference
Long-context autoregressive decoding is bottlenecked by KV cache bandwidth, not compute, and SANTA targets that bottleneck directly. Instead of reading all n_k value rows, it samples S indices from the post-softmax attention distribution and aggregates only those value rows via gather-and-add, replacing the value-stage multiply-accumulates entirely. The estimator is unbiased, and stratified sampling variants reduce variance while staying GPU-friendly. For teams serving long-context models, this is a bandwidth reduction that keeps the attention output correct in expectation instead of trading accuracy for speed through a biased approximation.
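The sampling idea can be sketched in a few lines of plain Python. This is an illustrative toy, not SANTA's kernel: the function names are hypothetical, sampling is done with replacement, and the stratified variants are not shown. The estimator is unbiased because a value row drawn with probability p_i has expectation equal to the sum of p_i * v_i, which is the exact attention output.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of attention logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def exact_attention(probs, values):
    """Dense value stage: reads every value row (n_k multiply-accumulates per dim)."""
    dim = len(values[0])
    return [sum(p * v[j] for p, v in zip(probs, values)) for j in range(dim)]


def sampled_attention(probs, values, num_samples, rng):
    """Sampled value stage: gather S rows drawn from probs and average them."""
    indices = rng.choices(range(len(probs)), weights=probs, k=num_samples)
    dim = len(values[0])
    out = [0.0] * dim
    for i in indices:  # gather-and-add, no multiplications by probabilities
        row = values[i]
        for j in range(dim):
            out[j] += row[j]
    return [x / num_samples for x in out]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.uniform(-2, 2) for _ in range(64)]
    values = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(64)]
    probs = softmax(logits)
    print("exact  :", exact_attention(probs, values))
    print("sampled:", sampled_attention(probs, values, 16, rng))
```

Only the sampled rows of `values` are touched, which is where the bandwidth saving comes from; the variance of a single estimate shrinks as `num_samples` grows.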
04 [Training] Model Spec Midtraining: Improving How Alignment Training Generalizes
Alignment fine-tuning on demonstrations underspecifies the generalization target because demonstrations only show what to do, not why. Model spec midtraining inserts a phase between pretraining and RLHF where the model trains on synthetic documents that discuss the spec itself, teaching the content of intended behavior before any demonstration data shapes the weights. The result is alignment that transfers to out-of-distribution scenarios where demonstration data gives no signal at all. Teams building alignment pipelines should treat spec content as a first-class training input, not just an evaluation rubric.
05 [Eval] What Single-Prompt Accuracy Misses: A Multi-Variant Reliability Audit of Language Models
The choice of a single prompt per benchmark can flip which model wins. The audit covers 10 instruct models, five classification and reasoning benchmarks, and five prompt variants each. Accuracy scores diverge sharply from token-probability calibration, verbal-confidence calibration, and prompt-perturbation spread, so leaderboard rankings built on one prompt measure something different from deployed reliability. Switching a single prompt formulation for one model in the study materially changed the ranking conclusion. Any evaluation pipeline that reports one accuracy number per benchmark is reporting a single-prompt snapshot, not a reliability profile.
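As a rough illustration of what a multi-variant report could track beyond one accuracy number, the sketch below computes a per-model prompt-perturbation spread and checks whether the winning model changes across prompt variants. The function names and the toy scores are invented for this example and are not taken from the study; calibration metrics are left out.

```python
def prompt_spread(accuracies):
    """Spread of one model's accuracy across prompt variants (max minus min)."""
    return max(accuracies) - min(accuracies)


def winners_by_variant(scores):
    """For each prompt variant, return the model with the highest accuracy.

    scores maps model name -> list of accuracies, one per prompt variant.
    """
    n_variants = len(next(iter(scores.values())))
    return [
        max(scores, key=lambda model: scores[model][i])
        for i in range(n_variants)
    ]


def ranking_is_prompt_sensitive(scores):
    """True if the winning model depends on which prompt variant is used."""
    return len(set(winners_by_variant(scores))) > 1


if __name__ == "__main__":
    # Invented toy numbers, for illustration only.
    toy_scores = {
        "model_a": [0.80, 0.70, 0.75],
        "model_b": [0.76, 0.77, 0.74],
    }
    for name, accs in toy_scores.items():
        print(name, "spread:", round(prompt_spread(accs), 3))
    print("winners per variant:", winners_by_variant(toy_scores))
    print("prompt-sensitive ranking:", ranking_is_prompt_sensitive(toy_scores))
```

Reporting the spread and the per-variant winners next to mean accuracy makes it visible when a leaderboard position rests on one prompt formulation.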
06 [Agent] RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs
Standard fine-tuning degrades refusal behavior not by overwriting safety rules but by collapsing the structured activation-space geometry that encodes them. RefusalGuard identifies this geometric drift as the mechanism behind alignment degradation and adds an explicit geometry-preservation objective during downstream fine-tuning, recovering refusal behavior without sacrificing task performance. The finding reframes the alignment-tax problem: it is not a conflict between safety and capability objectives, but a consequence of unconstrained representational drift. Teams fine-tuning safety-aligned models for production tasks should treat activation geometry as a quantity worth monitoring and preserving.
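The summary does not state the exact form of the geometry-preservation objective, so the following is only one plausible way to monitor activation geometry, under stated assumptions: geometry is represented by the matrix of pairwise cosine similarities between activations on a fixed probe set, drift is the mean squared difference between the reference and fine-tuned matrices, and the penalty is added to the task loss with a hypothetical weight `lam`. None of these choices is confirmed to match RefusalGuard.

```python
import math


def unit_rows(acts):
    """Normalize each activation vector to unit length."""
    rows = []
    for row in acts:
        norm = math.sqrt(sum(x * x for x in row))
        rows.append([x / norm for x in row])
    return rows


def cosine_gram(acts):
    """Pairwise cosine similarities between activation vectors."""
    rows = unit_rows(acts)
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def geometry_penalty(ref_acts, tuned_acts):
    """Assumed drift measure: mean squared difference of the two Gram matrices."""
    g_ref = cosine_gram(ref_acts)
    g_new = cosine_gram(tuned_acts)
    n = len(g_ref)
    total = sum(
        (g_ref[i][j] - g_new[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


def total_loss(task_loss, ref_acts, tuned_acts, lam=1.0):
    """Task loss plus a weighted geometry penalty (lam is a hypothetical knob)."""
    return task_loss + lam * geometry_penalty(ref_acts, tuned_acts)


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    collapsed = [[1.0, 0.0], [1.0, 0.01], [1.0, -0.01]]
    print("no drift :", geometry_penalty(reference, reference))
    print("collapsed:", geometry_penalty(reference, collapsed))
```

Because the measure uses cosine similarities, rescaling activations leaves it unchanged, while collapsing distinct directions onto one another increases it. Even without the training term, the same quantity could be logged during fine-tuning as a drift monitor.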