Also Worth Noting — 2026-03-31

KVSculpt compresses language model memory to enable faster, cheaper processing of longer texts.


02 [Efficiency] KVSculpt: Distilling KV Cache for Efficient LLM Inference
KVSculpt introduces a method for compressing the KV cache, the memory in which a large language model stores the attention keys and values of previously processed tokens so it does not have to recompute them. It compresses the cache through a "distillation" process that substantially reduces its size with little loss in model performance. This lets large language models process much longer texts more efficiently, making them faster and cheaper to run in complex applications.
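The blurb gives no figures for KVSculpt, but the standard size formula for a transformer KV cache shows why compressing it matters for long texts. The sketch below assumes a hypothetical 7B-class configuration (32 layers, 32 key/value heads of dimension 128, 16-bit values) and an illustrative 8× reduction in cached positions. Neither number comes from the KVSculpt paper, and `kv_cache_bytes` is a helper defined here only for illustration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Memory of a standard transformer KV cache, in bytes.

    The leading factor 2 counts both keys and values: every layer stores one
    key vector and one value vector per head for each cached token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


GIB = 2 ** 30
context = 128 * 1024  # 131,072 tokens

# Hypothetical 7B-class configuration (not from the KVSculpt paper), fp16 values.
full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=context)

# Illustrative 8x reduction in the number of cached positions (hypothetical ratio).
compressed = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=context // 8)

print(f"full: {full / GIB:.1f} GiB, compressed: {compressed / GIB:.1f} GiB")
# full: 64.0 GiB, compressed: 8.0 GiB
```

The cache grows linearly with context length, so at long contexts it can exceed the memory needed for the model weights themselves (roughly 14 GB for 7 billion 16-bit parameters).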

03 [Robotics] SkyNet: MuZero for Uncertain Multi-Player Games
SkyNet introduces an AI planning method for agents in uncertain, multi-player games where crucial information is hidden. These games are harder than the perfect-information settings where its predecessor, MuZero, typically excels, because agents must infer what they cannot observe and predict the other players' moves. The approach could strengthen AI in real-world settings where agents act on incomplete knowledge, such as robotics, competitive strategy games, or autonomous negotiation.

04 [Video Gen] Quantizing Memory for Longer AI Video Generation
A comprehensive study evaluated 33 methods for compressing the memory of AI models that generate long videos by reusing their own output. In this process, called "self-forcing," each new stretch of video is conditioned on what the model has already produced, so a specialized memory component (the "KV cache") keeps growing as the video gets longer, which makes compressing it crucial. Managing this memory efficiently could let AI create significantly longer and higher-quality videos for entertainment, education, and beyond.
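As a rough illustration of what quantizing the KV cache means, the sketch below applies a generic symmetric int8 scheme with one scale per cached token. This scheme is an assumption chosen for simplicity, not necessarily one of the 33 methods the study compared, and the cache shape is hypothetical.

```python
import numpy as np


def quantize_int8(x: np.ndarray, axis: int = -1):
    """Symmetric int8 quantization with one float32 scale per slice along `axis`."""
    max_abs = np.max(np.abs(x), axis=axis, keepdims=True)
    # Map the largest magnitude in each slice to 127; use scale 1 for all-zero slices.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float32 values from int8 codes and their scales."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
# Hypothetical slice of a video model's key cache: 4096 cached tokens x 128 channels, fp16.
keys = rng.standard_normal((4096, 128)).astype(np.float16)
x = keys.astype(np.float32)

q, scale = quantize_int8(x, axis=-1)  # one scale per cached token (row)
restored = dequantize_int8(q, scale)

orig_bytes = keys.nbytes               # 16-bit storage
quant_bytes = q.nbytes + scale.nbytes  # 8-bit codes plus 32-bit scales
print(orig_bytes, quant_bytes)         # 1048576 540672
```

Storing 8-bit codes plus one 32-bit scale per row roughly halves the memory of a 16-bit cache, at the cost of a rounding error of at most half a scale step per element; lower bit widths and finer scale grouping shift this trade-off further.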

05 [Evaluation] Binary Latent Protein Optimization
Q-BIOLAT maps protein designs into compact binary codes and optimizes their fitness in that binary space. This lets it treat protein optimization directly as a discrete combinatorial problem, something existing learned models handle poorly because their representations are continuous. Efficiently finding high-fitness protein designs could accelerate the development of new drugs or industrial enzymes.
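The blurb does not say how Q-BIOLAT scores or searches its binary codes. To show what treating optimization as a discrete combinatorial problem looks like, here is a minimal sketch that assumes a hypothetical quadratic fitness surrogate over n-bit codes and maximizes it with single-bit-flip hill climbing from random restarts. The surrogate, its random coefficients, and the search strategy are all illustrative assumptions, not Q-BIOLAT's method.

```python
import numpy as np


def fitness(z: np.ndarray, h: np.ndarray, J: np.ndarray) -> float:
    """Hypothetical quadratic surrogate f(z) = h.z + z^T J z over z in {0,1}^n."""
    return float(h @ z + z @ J @ z)


def hill_climb(h: np.ndarray, J: np.ndarray, n_restarts: int = 20, seed: int = 0):
    """Maximize the surrogate by greedy single-bit flips from random starting codes."""
    rng = np.random.default_rng(seed)
    n = len(h)
    best_z, best_f = None, -np.inf
    for _ in range(n_restarts):
        z = rng.integers(0, 2, size=n)
        f = fitness(z, h, J)
        improved = True
        while improved:  # stop when no single flip improves f (a local optimum)
            improved = False
            for i in range(n):
                z[i] ^= 1  # try flipping bit i
                f_new = fitness(z, h, J)
                if f_new > f:
                    f, improved = f_new, True  # keep the flip
                else:
                    z[i] ^= 1  # revert the flip
        if f > best_f:
            best_z, best_f = z.copy(), f
    return best_z, best_f


# Hypothetical 12-bit codes with random surrogate coefficients.
n = 12
coef_rng = np.random.default_rng(1)
h = coef_rng.standard_normal(n)
J = coef_rng.standard_normal((n, n))
J = (J + J.T) / 2  # symmetric pairwise terms

best_z, best_f = hill_climb(h, J)
print(best_z, round(best_f, 3))
```

Because the codes are binary, every candidate move is a bit flip, so standard combinatorial tools such as local search, simulated annealing, or integer programming apply directly, whereas gradient-based optimizers built for continuous latent spaces do not.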

06 [Efficiency] RSR-core: Faster Low-Bit Matrix-Vector Multiplication
RSR-core is a new engine for speeding up matrix-vector multiplication when the matrix is stored in highly compressed (low-bit) form. It uses specialized techniques for 1-bit (binary) and 1.58-bit (ternary) weights and runs up to 2.3 times faster than the current best methods. By speeding up this fundamental operation, RSR-core enables more efficient and cheaper inference for large language models, other neural networks, and vector databases.
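The blurb does not describe RSR-core's techniques. As a hedged illustration of why 1-bit and 1.58-bit weights (three possible values, log2 3 ≈ 1.58 bits each) admit faster kernels, here is a sketch of a generic lookup-table approach in the spirit of the "Method of Four Russians". A ternary matrix is split into two 0/1 matrices, and for each block of k columns all 2^k subset sums of the input are tabulated once and then shared by every row. This is not RSR-core's algorithm; the function names and the block size k are illustrative.

```python
import numpy as np


def binary_matvec_lut(W: np.ndarray, x: np.ndarray, k: int = 4) -> np.ndarray:
    """Compute y = W @ x for a 0/1 matrix W with per-block subset-sum lookup tables.

    Entries of x are only ever added, never multiplied by weights.
    """
    m, n = W.shape
    if n % k:
        raise ValueError("this sketch assumes the column count is a multiple of k")
    bit_weights = 1 << np.arange(k)  # turns a k-bit row segment into an integer
    y = np.zeros(m, dtype=np.float64)
    for start in range(0, n, k):
        xb = x[start:start + k]
        # table[p] = sum of xb[j] over the set bits j of pattern p (all 2^k subset sums)
        table = np.zeros(1 << k, dtype=np.float64)
        for p in range(1, 1 << k):
            low = (p & -p).bit_length() - 1  # index of p's lowest set bit
            table[p] = table[p & (p - 1)] + xb[low]  # reuse the sum without that bit
        # Row patterns depend only on W, so a real kernel would precompute them once.
        patterns = W[:, start:start + k].astype(np.int64) @ bit_weights
        y += table[patterns]  # one table lookup per row per block
    return y


def ternary_matvec(T: np.ndarray, x: np.ndarray, k: int = 4) -> np.ndarray:
    """T has entries in {-1, 0, +1}; split it into two 0/1 matrices and subtract."""
    pos = (T == 1).astype(np.int64)
    neg = (T == -1).astype(np.int64)
    return binary_matvec_lut(pos, x, k) - binary_matvec_lut(neg, x, k)


rng = np.random.default_rng(0)
T = rng.integers(-1, 2, size=(64, 32))  # hypothetical ternary weight matrix
x = rng.standard_normal(32)
y = ternary_matvec(T, x)
print(np.allclose(y, T @ x))  # True
```

The Python loops here are for exposition only and will not be fast; a practical kernel would be written in native code. The block size k trades table-building cost (2^k entries per block) against the number of lookups per row (n/k).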