Edoardo M. Ponti
Piotr Nawrot
Latest
Fast and Expressive Multi-Token Prediction with Probabilistic Circuits
Inference-Time Hyper-Scaling with KV Cache Compression
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Efficient Transformers with Dynamic Token Pooling