Adaptive Tokenization and Memory in Foundation Models for Efficient and Long-Horizon AI (AToM ⚛︎)

The recent revolution in generative AI is powered by the ever-growing computing scale of Foundation Models (FMs). This, however, has a series of harmful ramifications, such as an unsustainable energy demand and carbon footprint. AToM sets out to reverse this trend by remedying a fundamental source of inefficiency in current FMs: their compute is determined by the granularity of the sequences of representations used to process or memorise information. Yet this granularity is fixed, fully determined by the memory update mechanism and the segmentation of the input data.

Instead, AToM FMs will adapt the granularity of their sequences of representations, revisiting the “atomic” units of memory and computation by learning to compress them in an end-to-end fashion. To facilitate the swift adoption of this new technology, AToM will repurpose state-of-the-art FMs into adaptive architectures and release them to the public.

This will not only elevate FMs to unprecedented levels of efficiency, but also foster the emergence of new FM capabilities. In fact, compressing the FM representations will effectively broaden their horizon, i.e., the amount of input and output they can perceive and generate, respectively. This will unlock 1) permanent memories, which are a prerequisite for lifelong learning, 2) hyper-scaling for reasoning-intensive problems (such as maths, science, and coding), and 3) long-horizon world modelling for multimodal planning and grounding.
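To picture the core idea of learning to compress a sequence of representations end-to-end, here is a minimal PyTorch sketch. It is not the AToM architecture: it only illustrates a module that scores each position of a hidden-state sequence, opens a new segment at high-scoring positions, and mean-pools each segment into one coarser vector. All names, the threshold, and the pooling scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LearnedCompressor(nn.Module):
    """Illustrative sketch: compress a (batch, seq_len, d_model) sequence of
    hidden states into fewer segments, with segment boundaries predicted by a
    learned scorer. Not the AToM method; names and threshold are assumptions."""

    def __init__(self, d_model: int):
        super().__init__()
        self.boundary_scorer = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor):
        # Per-position probability of opening a new segment.
        probs = torch.sigmoid(self.boundary_scorer(hidden)).squeeze(-1)  # (B, T)
        is_boundary = probs > 0.5                                        # hard decision
        seg_id = torch.cumsum(is_boundary.long(), dim=-1)                # segment index per position

        B, T, D = hidden.shape
        S = int(seg_id.max()) + 1
        pooled = hidden.new_zeros(B, S, D)
        counts = hidden.new_zeros(B, S, 1)

        # Mean-pool the hidden states that fall into each segment.
        pooled.scatter_add_(1, seg_id.unsqueeze(-1).expand(-1, -1, D), hidden)
        counts.scatter_add_(1, seg_id.unsqueeze(-1), torch.ones_like(probs).unsqueeze(-1))
        pooled = pooled / counts.clamp(min=1.0)

        # probs is returned so a training objective (e.g. a straight-through
        # trick or a compression-rate loss) could make the scorer trainable
        # end-to-end; this sketch does not include such an objective.
        return pooled, probs
```

In a full system the scorer would be trained jointly with the FM so that the compression rate and the downstream loss are optimised together; the released checkpoints below are the reference for how this is actually done.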

Models

Hugging Face checkpoints

allenai/Bolmo-7B

State-of-the-art, fully open-source large language model with latent tokenization. Available in 1B and 7B sizes.

nvidia/Qwen3-8B-DMS-8x

8x KV cache compression without quality degradation. Ideal for inference-time scaling.
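As a hedged illustration, both checkpoints can presumably be loaded through the standard Hugging Face transformers API, as in the sketch below. The repository IDs are the ones listed above; whether a given checkpoint needs `trust_remote_code=True`, a custom tokenizer, or extra dependencies is an assumption to verify on its model card.

```python
# Minimal loading sketch using the standard transformers API.
# Assumes the checkpoints are causal LMs in the Hugging Face format;
# check each model card for exact requirements.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Qwen3-8B-DMS-8x"  # or "allenai/Bolmo-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the dtype stored in the checkpoint
    device_map="auto",    # requires the accelerate package
)

inputs = tokenizer("Adaptive tokenization lets foundation models", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```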

Publications

Related papers

Language models (LMs) are bound to their tokenizer, which maps raw text to a sequence of vocabulary items (tokens). This restricts …

Transformers have emerged as the backbone of large language models (LLMs). However, generation remains inefficient due to the need to …