The recent revolution in generative AI is powered by the ever-growing computing scale of Foundation Models (FMs). This scale, however, has harmful ramifications, such as unsustainable energy demand and carbon emissions. AToM sets out to reverse this trend by remedying a fundamental source of inefficiency in current FMs: their compute is determined by the granularity of the sequences of representations they use to compute or memorise information. Yet this granularity is fixed, determined entirely by the memory update mechanism and the segmentation of the input data.
Instead, AToM FMs will adapt the granularity of their sequences of representations, revisiting the “atomic” units of memory and computation, by learning to compress them end to end (see the sketch below). To facilitate the swift adoption of this new technology, AToM will repurpose state-of-the-art FMs into adaptive architectures and release them to the public.
This will not only elevate FMs to unprecedented levels of efficiency, but also foster the emergence of new FM capabilities. In fact, compressing FM representations will effectively broaden their horizon, i.e., the amount of input and output they can perceive and generate, respectively. This will unlock 1) permanent memories, which are a prerequisite for lifelong learning, 2) hyper-scaling for reasoning-intensive problems (such as maths, science, and coding), and 3) long-horizon world modelling for multimodal planning and grounding.
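To make the idea concrete, the sketch below shows one plausible way to learn such compression: a small cross-attention module that maps a sequence of hidden states to a shorter sequence of latent representations. This is an illustration of the general technique under assumed names (`LatentCompressor`, `ratio` are hypothetical), not the AToM architecture itself.

```python
# Minimal sketch of learned sequence compression: a set of latent queries,
# derived from strided slices of the input, attends over all input tokens
# and yields a shorter sequence of representations.
import torch
import torch.nn as nn


class LatentCompressor(nn.Module):
    """Compress a length-T sequence into roughly T // ratio latent representations."""

    def __init__(self, d_model: int, num_heads: int = 8, ratio: int = 4):
        super().__init__()
        self.ratio = ratio
        # Cross-attention: latent queries attend over the full input sequence.
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.query_proj = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        b, t, d = hidden.shape
        t_lat = max(1, t // self.ratio)
        # Initialise queries from strided slices so the compressed sequence
        # stays aligned with the original ordering.
        queries = self.query_proj(hidden[:, :: self.ratio][:, :t_lat])
        latents, _ = self.attn(queries, hidden, hidden)
        return self.norm(latents + queries)  # (batch, t_lat, d_model)


if __name__ == "__main__":
    compressor = LatentCompressor(d_model=512, ratio=4)
    tokens = torch.randn(2, 128, 512)   # 128 input representations
    latents = compressor(tokens)
    print(latents.shape)                # torch.Size([2, 32, 512])
```

Training such a module jointly with the backbone, rather than fixing the segmentation in advance, is what would allow the granularity to adapt to the data end to end.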
Hugging Face checkpoints
A state-of-the-art, fully open-source large language model with latent tokenization, available in 1B and 7B sizes.
8x KV cache compression without quality degradation. Ideal for inference-time scaling.
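As a rough illustration of the memory arithmetic behind an 8x reduction, the snippet below mean-pools cached keys and values along the sequence axis by a factor of 8. This is only a sketch; the released checkpoints may use a different, learned compression scheme, and the function name `compress_kv_cache` is hypothetical.

```python
# Illustrative only: pool cached keys/values along the sequence axis by `factor`,
# shrinking the number of cached entries (and thus KV memory) by that factor.
import torch
import torch.nn.functional as F


def compress_kv_cache(keys: torch.Tensor, values: torch.Tensor, factor: int = 8):
    """Pool (batch, heads, seq_len, head_dim) caches by `factor` along seq_len."""
    b, h, t, d = keys.shape
    t_pad = (t + factor - 1) // factor * factor
    pad = t_pad - t
    if pad:
        # Zero-pad so the sequence length is divisible by the compression factor.
        keys = F.pad(keys, (0, 0, 0, pad))
        values = F.pad(values, (0, 0, 0, pad))
    keys = keys.view(b, h, t_pad // factor, factor, d).mean(dim=3)
    values = values.view(b, h, t_pad // factor, factor, d).mean(dim=3)
    return keys, values


if __name__ == "__main__":
    k = torch.randn(1, 16, 1024, 64)
    v = torch.randn(1, 16, 1024, 64)
    ck, cv = compress_kv_cache(k, v, factor=8)
    print(ck.shape)  # torch.Size([1, 16, 128, 64]) -> 8x fewer cached entries
```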
Related papers