Also Worth Noting - 2026-06-19
From stateful spatial agents to egocentric pretraining, five papers rethink what the right input signal actually is
02 [Agent] S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
Spatial queries that stump current VLMs can be answered by accumulating observations over time rather than guessing from a single frame. S-Agent treats spatial reasoning as spatio-temporal evidence accumulation, maintaining a persistent scene-centric 3D state across multi-view video rather than running isolated per-frame predictions. Each tool call adds to that state, so the agent's spatial understanding compounds across observations instead of resetting. Teams building embodied or robotics pipelines should treat this as a concrete architectural alternative to stateless VLM inference.
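To make the stateful idea concrete, here is a minimal sketch of a persistent scene state that tool calls write into across frames. It is an illustration only, not S-Agent's implementation: the `SceneState` class, its method names, and the choice to average repeated sightings are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field

# Hypothetical type: an (x, y, z) position in a shared world frame.
Vec3 = tuple[float, float, float]


@dataclass
class SceneState:
    """Illustrative scene-centric state that persists across frames."""

    observations: dict[str, list[Vec3]] = field(default_factory=dict)

    def add_observation(self, label: str, position: Vec3) -> None:
        # Each tool call appends evidence instead of overwriting it.
        self.observations.setdefault(label, []).append(position)

    def position(self, label: str) -> Vec3:
        # Assumed fusion rule: average every sighting of the object.
        pts = self.observations[label]
        n = len(pts)
        return (
            sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            sum(p[2] for p in pts) / n,
        )

    def distance(self, a: str, b: str) -> float:
        # Answer a spatial query from the accumulated state.
        return math.dist(self.position(a), self.position(b))


state = SceneState()
state.add_observation("chair", (0.0, 0.0, 0.0))  # sighting in frame 1
state.add_observation("chair", (2.0, 0.0, 0.0))  # sighting in frame 2
state.add_observation("table", (1.0, 0.0, 4.0))  # sighting in frame 3
chair_to_table = state.distance("chair", "table")
```

A stateless pipeline would have to answer the chair-to-table query from whichever single frame it is given. Here the answer draws on every observation recorded so far.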
03 [Inference] Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation
Copying pretrained softmax attention weights into a linear attention student leaves the new recurrent decay, write, and output-gating dynamics completely unspecified, which is why post-conversion perplexity spikes have made hybrid distillation unreliable. Taylor-Calibrate uses Taylor expansion to derive principled initial values for those gates directly from the teacher's softmax weights, giving the student a stable starting point before any fine-tuning. The result is a conversion path that removes the brittleness blocking hybrid linear attention from production long-context deployments. Teams trying to cut KV-cache costs without pretraining from scratch should test this initialization before anything else.
04 [Application] LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents
Reconstructing task state from raw prompt history on every turn is why multi-turn customer-service agents accumulate policy violations even when their contexts are not growing longer. LedgerAgent externalizes a typed ledger of facts, constraints, identifiers, and tool returns as a structured object separate from the prompt, so the agent reads clean state rather than re-parsing conversation history. Policy checks run against the ledger directly, making violations detectable before a tool call fires. The fix is low-overhead and applies to any tool-calling agent that operates under domain rules across multiple turns.
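The pattern can be sketched in a few lines. This is a hedged illustration, not LedgerAgent's code: the `Ledger` fields mirror the four categories named above, while `guarded_call`, the refund policy, and the sample order data are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Ledger:
    """Hypothetical typed task state kept outside the prompt."""

    facts: dict[str, Any] = field(default_factory=dict)
    constraints: list[str] = field(default_factory=list)
    identifiers: dict[str, str] = field(default_factory=dict)
    tool_returns: list[dict[str, Any]] = field(default_factory=list)


# A policy is a predicate over the ledger and a proposed tool call.
Policy = Callable[[Ledger, str, dict[str, Any]], bool]


def refund_within_order_total(ledger: Ledger, tool: str, args: dict[str, Any]) -> bool:
    # Assumed domain rule: a refund may not exceed the recorded order total.
    if tool != "issue_refund":
        return True
    return args["amount"] <= ledger.facts.get("order_total", 0)


def guarded_call(ledger, tool, args, policies, tools):
    # Check every policy against the ledger before the tool fires.
    broken = [name for name, check in policies.items() if not check(ledger, tool, args)]
    if broken:
        return {"status": "blocked", "violations": broken}
    result = tools[tool](**args)
    # Record the tool return in the ledger, not only in the transcript.
    ledger.tool_returns.append({"tool": tool, "args": args, "result": result})
    return {"status": "ok", "result": result}


ledger = Ledger(facts={"order_total": 50.0}, identifiers={"order_id": "A-1001"})
policies = {"refund_within_order_total": refund_within_order_total}
tools = {"issue_refund": lambda amount: f"refunded {amount:.2f}"}

blocked = guarded_call(ledger, "issue_refund", {"amount": 80.0}, policies, tools)
allowed = guarded_call(ledger, "issue_refund", {"amount": 30.0}, policies, tools)
```

The over-limit refund is rejected before the tool runs, and only the permitted call is appended to `ledger.tool_returns`. Neither check requires re-reading the conversation history.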
05 [Eval] LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI
The widely cited ~52% hallucination rate in legal AI is an average that hides the directional pattern that actually matters for compliance: models over-assert on case citations and under-assert on statutory text. LegalHalluLens introduces typed hallucination profiles across four claim categories (numeric, temporal, obligation/entitlement, factual) and a Risk Direction Index that collapses omission-versus-invention bias into a single deployment signal. That directional distinction changes which workflows are safe to automate and which carry asymmetric legal risk. Compliance teams should run this audit before any production rollout, not after.
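The summary above does not give the paper's formula for the Risk Direction Index, so the sketch below uses a stand-in. It assumes a signed ratio, (invented - omitted) / (invented + omitted), computed per claim category, and the function names and sample error counts are hypothetical. It shows how one directional number per category can expose a bias that an aggregate rate hides.

```python
from collections import Counter

# The four claim categories named in the summary.
CLAIM_TYPES = ("numeric", "temporal", "obligation/entitlement", "factual")


def direction_index(invented: int, omitted: int) -> float:
    """Assumed stand-in index in [-1, 1]: +1 is all invention, -1 is all omission."""
    total = invented + omitted
    return 0.0 if total == 0 else (invented - omitted) / total


def typed_profile(errors):
    """Map each claim type to its index; errors are (claim_type, kind) pairs."""
    counts = Counter(errors)
    return {
        t: direction_index(counts[(t, "invented")], counts[(t, "omitted")])
        for t in CLAIM_TYPES
    }


# Made-up audit results, for illustration only.
errors = (
    [("numeric", "invented")] * 3
    + [("numeric", "omitted")]
    + [("temporal", "invented")]
    + [("temporal", "omitted")] * 3
)
profile = typed_profile(errors)
```

In this made-up sample both categories have four errors, so a flat error count treats them as equal. The signed index separates them: numeric claims lean toward invention and temporal claims toward omission.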
06 [Training] HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining
Large-scale egocentric human video produces better downstream manipulation transfer than teleoperated robot trajectories, which reframes where the embodied pretraining bottleneck actually sits. HumanScale shows that behavioral and environmental diversity in human video outweighs the action-supervision precision of robot data when pretraining at scale, with pretrained models beating teleoperation-based baselines on held-out manipulation tasks. The expensive robot data collection pipeline may not be the right axis to scale. Teams building embodied foundation models should treat egocentric human video as a primary pretraining source rather than a supplement.