Edoardo M. Ponti
Publications
AdaSplash-2: Faster Differentiable Sparse Attention
Sparse attention has been proposed as a way to alleviate the quadratic cost of transformers, a central bottleneck in long-context …
Nuno Gonçalves, Hugo Pitorro, Vlad Niculae, Edoardo M. Ponti, Lei Li, André F. T. Martins, Marcos V. Treviso
Self-Improving World Modelling with Latent Actions
Internal modelling of the world — predicting transitions between previous states X and next states Y under actions Z — is essential to …
Yifu Qiu, Zheng Zhao, Waylon Li, Yftah Ziser, Anna Korhonen, Shay B. Cohen, Edoardo M. Ponti
Bolmo: Byteifying the Next Generation of Language Models
Recent advances in generative AI have been largely driven by large language models (LLMs), deep neural networks that operate over …
Benjamin Minixhofer, Tyler Murray, Tomasz Limisiewicz, Anna Korhonen, Luke Zettlemoyer, Noah A. Smith, Edoardo M. Ponti, Luca Soldaini, Valentin Hofmann
Fast and Expressive Multi-Token Prediction with Probabilistic Circuits
Multi-token prediction (MTP) is a prominent strategy to significantly speed up generation in large language models (LLMs), including …
Andreas Grivas, Lorenzo Loconte, Emile van Krieken, Piotr Nawrot, Yu Zhao, Euan Wielewski, Pasquale Minervini, Edoardo M. Ponti, Antonio Vergari
Inference-Time Hyper-Scaling with KV Cache Compression
Inference-time scaling trades efficiency for increased reasoning accuracy by generating longer or more parallel sequences. However, in …
Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot, Edoardo M. Ponti
Bootstrapping Action-Grounded Visual Dynamics in Unified Vision-Language Models
Can unified vision-language models (VLMs) perform forward dynamics prediction (FDP), i.e., predicting the future state (in image form) …
Yifu Qiu, Yftah Ziser, Anna Korhonen, Shay B. Cohen, Edoardo M. Ponti
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Sparse attention offers a promising strategy to extend long-context capabilities in Transformer LLMs, yet its efficiency-accuracy …
Piotr Nawrot, Robert Li, Renjie Huang, Sebastian Ruder, Kelly Marchisio, Edoardo M. Ponti
Zero-Shot Tokenizer Transfer
Language models (LMs) are bound to their tokenizer, which maps raw text to a sequence of vocabulary items (tokens). This restricts …
Benjamin Minixhofer, Edoardo M. Ponti, Ivan Vulić
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Transformers have emerged as the backbone of large language models (LLMs). However, generation remains inefficient due to the need to …
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski, David Tarjan, Edoardo M. Ponti
Efficient Transformers with Dynamic Token Pooling
Transformers achieve unrivalled performance in modelling language, but remain inefficient in terms of memory and time complexity. A …
Piotr Nawrot, Jan Chorowski, Adrian Łańcucki, Edoardo M. Ponti