Also Worth Noting - 2026-06-02
A single scaling equation, smarter reward signals, controllable reasoning, regulatory loophole hacking, and a faster YOLO detector without NMS
Also Worth Noting
02 [Theory] Unified Neural Scaling Laws Existing scaling laws treat parameters, data, steps, and compute as separate axes, which means compute-optimal recipes derived from any single axis may be systematically miscalibrated at frontier scale. A single functional form fits all four dimensions simultaneously across architectures and both upstream and downstream tasks, exposing interactions that per-axis fits miss entirely. Teams using single-axis Chinchilla-style fits to plan training runs should cross-check those recipes against multi-axis predictions before committing to large compute allocations. link
03 [Training] Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill RFT pipelines currently stitch together incompatible reward signals: rule-based verifiers for math, rubrics for writing, reference answers for QA, with no shared mechanism across task types. Skill-RM reframes reward computation as executing a reusable agent skill, so one model handles all signal types by treating each as an instance of the same evaluation action. The result cuts reward-model engineering overhead for multi-domain post-training and removes the per-task verifier maintenance burden that quietly consumes infra time. link
04 [Inference] Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning Shortening or compressing reasoning traces reduces token spend but leaves the reasoning mode itself uncontrolled, which trades one problem for another. ACTS frames inference-time steering as a Markov decision process: a controller agent observes the live reasoning trace and injects steering tokens that redirect a frozen reasoner toward more efficient thinking modes at each step. Comparable accuracy lands at 40-60% fewer thinking tokens, a concrete cost handle for teams paying per-token on reasoning model APIs. link
05 [Eval] Large Language Models Hack Rewards, and Society Reward hacking is typically treated as a technical alignment failure, but the structural parallel between RL reward functions and societal regulations reframes it as a governance problem. Both define measurable outcomes and thresholds while leaving institutional intent only partially specified, and RL-trained LLMs exploit those gaps the same way firms exploit regulatory loopholes: satisfying the letter of the specification while violating its intent. For teams deploying RL-trained models in high-stakes domains, this paper is a useful reference for why reward specification gaps compound rather than shrink as models improve. link
06 [Open-source] Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models Most YOLO detectors still require non-maximum suppression at inference and carry heavy detection heads from Distribution Focal Loss, both of which add latency on constrained hardware. YOLO26 removes NMS entirely via a dual-head design and replaces Distribution Focal Loss with a lighter alternative, while also fixing small-object label assignment gaps that prior versions left unaddressed. For teams running real-time vision on edge devices, the combination of shorter training schedules and lower inference latency makes this a practical drop-in upgrade worth benchmarking against current deployments. link