Also Worth Noting - 2026-06-15

Five papers on where current assumptions break: RL credit assignment, LoRA tuning, inference depth, AI peer review, and video retrieval.


02 [Agent] APPO: Agentic Procedural Policy Optimization
Standard agentic RL assigns credit at tool-call boundaries, but influential decision points are distributed throughout the full generated sequence, not clustered at those boundaries. APPO addresses this by identifying where to branch within a trajectory and then applying fine-grained credit assignment after branching, rather than collapsing the signal across coarse heuristic units. The result is a tighter feedback loop between intermediate decisions and downstream outcomes. Teams training multi-turn tool-use agents should treat credit granularity as a first-class hyperparameter, not a fixed structural choice.
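
A minimal Python sketch of segment-level credit from branching, assuming a generic Monte Carlo scheme rather than APPO's own procedure. The branch points are passed in instead of being identified, and `rollout` is a hypothetical function that samples a continuation from a prefix and returns its final reward. Each segment between consecutive branch points receives the change in estimated value across it, instead of every token sharing one trajectory-level advantage.

```python
import random
from typing import List, Sequence, Callable

# Hypothetical: continue generation from `prefix` and return the final reward.
Rollout = Callable[[Sequence[str], random.Random], float]


def segment_advantages(
    tokens: Sequence[str],
    branch_points: Sequence[int],
    final_reward: float,
    rollout: Rollout,
    k: int = 8,
    seed: int = 0,
) -> List[float]:
    """Give each token the value change across the segment that contains it.

    The value at a branch point is the mean reward of `k` continuations
    sampled from tokens[:point]; the last segment ends at the observed reward.
    Out-of-range branch points are ignored, and index 0 is always included.
    """
    rng = random.Random(seed)
    points = sorted({0, *(p for p in branch_points if 0 <= p < len(tokens))})
    values = [sum(rollout(tokens[:p], rng) for _ in range(k)) / k for p in points]
    values.append(final_reward)
    bounds = points + [len(tokens)]

    advantages = [0.0] * len(tokens)
    for i in range(len(points)):
        delta = values[i + 1] - values[i]  # credit earned by this segment
        for t in range(bounds[i], bounds[i + 1]):
            advantages[t] = delta
    return advantages


# Toy rollout (illustrative only): success depends on whether "plan" is in the prefix.
def toy_rollout(prefix: Sequence[str], _rng: random.Random) -> float:
    return 1.0 if "plan" in prefix else 0.0


tokens = ["think", "plan", "call:search", "read", "answer"]
print(segment_advantages(tokens, [2, 4], 1.0, toy_rollout))  # [1.0, 1.0, 0.0, 0.0, 0.0]
```

With a single trajectory-level advantage, all five tokens would receive the same signal; here only the segment containing the "plan" decision is credited, even though it sits before the tool call.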

03 [Training] The Hidden Power of Scaling Factor in LoRA Optimization
The LoRA scaling factor alpha is not just a stand-in for the learning rate. It is the dominant driver of effective optimization, delivering gains that tuning the learning rate alone cannot replicate. Using a theoretical framework it calls Signal-Drift, together with broad empirical analysis, the paper shows that alpha controls spectral suppression along a fundamentally different axis from learning-rate scaling. Sweeps that tune the learning rate while holding alpha fixed are searching the wrong dimension. If LoRA fine-tuning runs underperform expectations, check alpha before the learning-rate schedule.
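
A toy Python illustration of why alpha and the learning rate are not interchangeable, assuming plain SGD and a scalar rank-1 adapter; it is not the paper's Signal-Drift analysis. In standard LoRA the low-rank update is scaled by s = alpha / r, and with r = 1 here, s is simply alpha. Because b starts at zero, only b moves on the first step, so two runs with the same lr * s**2 agree after one step. From the second step on they diverge: s also sets how fast a moves relative to the scaled factor s * b, and a single global learning rate cannot change that ratio.

```python
from typing import List


def lora_sgd(s: float, lr: float, steps: int,
             w0: float = 0.0, a0: float = 1.0, target: float = 1.0) -> List[float]:
    """Scalar rank-1 LoRA trained with SGD on the loss 0.5 * (W - target)**2.

    W = w0 + s * b * a, where s plays the role of alpha / r.
    b starts at 0 and a at a0, mirroring LoRA's B = 0 initialization.
    Returns the effective weight W after each step.
    """
    a, b = a0, 0.0
    history = []
    for _ in range(steps):
        g = (w0 + s * b * a) - target  # dL/dW
        grad_a = s * b * g             # chain rule through W = w0 + s*b*a
        grad_b = s * a * g
        a, b = a - lr * grad_a, b - lr * grad_b
        history.append(w0 + s * b * a)
    return history


# Both runs use lr * s**2 = 0.04: the first entries agree, the second entries differ.
print(lora_sgd(s=2.0, lr=0.01, steps=2))
print(lora_sgd(s=1.0, lr=0.04, steps=2))
```

Under Adam-style optimizers the gradient scale is largely normalized away, so the details change, but the same toy is a quick way to check that rescaling the learning rate does not reproduce a change in s.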

04 [Inference] Skip a Layer or Loop It? Learning Program-of-Layers in LLMs
Dynamic depth at inference is already latent in pretrained weights. For most inputs, some substantially shorter layer-execution program (skipping some layers and looping others) matches or beats the accuracy of full fixed-depth inference, with no fine-tuning required. Incorrect predictions from the original model can also be corrected by alternative programs that use fewer layers, not more. Variable-depth inference thus becomes an engineering problem of extracting programs from existing checkpoints, not a training problem for future models.
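
A minimal Python sketch of what a layer-execution program is, assuming toy integer functions in place of transformer blocks and an in-order constraint where each layer runs zero, one, or two times. The brute-force search below only illustrates the search space and is not the paper's learned method; with three choices per layer it grows as 3**L, which is infeasible at real model depths.

```python
from itertools import product
from typing import Callable, Iterator, List, Optional, Sequence

Layer = Callable[[int], int]


def run_program(layers: Sequence[Layer], program: Sequence[int], x: int) -> int:
    """Apply layers in the listed order; omitted indices are skipped, repeats loop."""
    for idx in program:
        x = layers[idx](x)
    return x


def candidate_programs(num_layers: int, max_repeats: int = 2) -> Iterator[List[int]]:
    """Yield in-order programs in which each layer runs 0..max_repeats times."""
    for counts in product(range(max_repeats + 1), repeat=num_layers):
        yield [i for i, c in enumerate(counts) for _ in range(c)]


def shortest_program(layers: Sequence[Layer], x: int,
                     accept: Callable[[int], bool]) -> Optional[List[int]]:
    """Return the shortest candidate program whose output passes `accept`."""
    best: Optional[List[int]] = None
    for program in candidate_programs(len(layers)):
        if accept(run_program(layers, program, x)):
            if best is None or len(program) < len(best):
                best = program
    return best


# Toy "model" (illustrative only): the full fixed-depth program gets the wrong answer.
layers: List[Layer] = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * 2]
print(run_program(layers, [0, 1, 2, 3], 1))               # 2 (wrong; the target is 4)
print(shortest_program(layers, 1, lambda out: out == 4))  # [3, 3]: skip layers 0-2, loop 3
```

The shorter program fixes an answer the full-depth program gets wrong, a toy analogue of the correction effect; in practice the acceptance check would be replaced by a learned or heuristic program selector, since ground truth is unavailable at inference time.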

05 [Eval] No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions
AI peer-review scores shift measurably when only presentation-level content changes: abstract framing, related-work positioning, narrative structure. No methods, results, figures, equations, or hidden instructions are touched. The attack, called adversarial repackaging, runs as a closed loop in which AI-assisted rewrites are scored and revised to maximize the review score. This breaks the assumption that AI review is harder to game than human review, and it applies to any system that scores papers on a fixed rubric without structural change detection.
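
A minimal Python sketch of the closed-loop shape of such an attack, assuming a hypothetical `score` function standing in for the AI reviewer and hypothetical `rewrites` standing in for AI-assisted presentation edits; the section names are illustrative, not the paper's setup. The loop is a simple greedy hill climb that only edits presentation sections, and a digest of the substantive sections confirms they never change. The same digest suggests one possible pipeline check: a score that moves between versions while the substance digest stays fixed is a signal of repackaging.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Paper = Dict[str, str]
PRESENTATION = ("abstract", "related_work", "narrative")
SUBSTANCE = ("methods", "results", "figures", "equations")


def substance_digest(paper: Paper) -> str:
    """Hash only the substantive sections, so presentation edits leave it unchanged."""
    h = hashlib.sha256()
    for key in SUBSTANCE:
        h.update(key.encode() + b"\0" + paper.get(key, "").encode() + b"\0")
    return h.hexdigest()


def repackage(paper: Paper, score: Callable[[Paper], float],
              rewrites: List[Tuple[str, Callable[[str], str]]],
              rounds: int = 3) -> Tuple[Paper, float]:
    """Greedy closed loop: apply each presentation rewrite, keep it if the score rises."""
    baseline = substance_digest(paper)
    best, best_score = dict(paper), score(paper)
    for _ in range(rounds):
        improved = False
        for section, rewrite in rewrites:
            if section not in PRESENTATION:
                raise ValueError(f"{section!r} is not a presentation section")
            candidate = dict(best)
            candidate[section] = rewrite(best.get(section, ""))
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score, improved = candidate, candidate_score, True
        if not improved:
            break
    assert substance_digest(best) == baseline  # methods and results untouched
    return best, best_score


# Toy reviewer (illustrative only) that rewards wording in presentation sections.
def toy_score(p: Paper) -> float:
    return p["abstract"].count("novel") + p["related_work"].count("unlike")


paper = {"abstract": "We study X.", "related_work": "Prior work did Y.",
         "narrative": "", "methods": "M", "results": "R"}
edits = [("abstract", lambda t: t + " A novel framing."),
         ("related_work", lambda t: t + " We differ, unlike prior work.")]
revised, new_score = repackage(paper, toy_score, edits)
print(toy_score(paper), new_score)  # 0 6
```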

06 [RAG] Rethinking RAG in Long Videos: What to Retrieve and How to Use It?
Existing VideoRAG benchmarks include queries that can be answered without the video at all, so reported retrieval numbers are inflated by design. V-RAGBench fixes this with query-evidence-answer triplets that require the video for a correct answer, enabling faithful, decoupled evaluation. A companion adaptive selector chooses modality and temporal granularity per chunk rather than applying one fixed configuration across an entire query. Teams building retrieval over egocentric or long-form video should treat current VideoRAG benchmark scores as inflated upper bounds, not production baselines.