Edoardo M. Ponti
Publications
huggingface
nvidia/Qwen3-8B-DMS-8x
8x KV cache compression without quality degradation. Ideal for inference-time scaling.