Edoardo M. Ponti

Piotr Nawrot

Latest

Fast and Expressive Multi-Token Prediction with Probabilistic Circuits
Inference-Time Hyper-Scaling with KV Cache Compression
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Efficient Transformers with Dynamic Token Pooling

LICENSE: CC-BY-SA

Edoardo M. Ponti, 2026 · Adapted from Alison Presmanes Hill's , the blogdown package, and the Academic theme for Hugo.
