Ivan Zhou


Release of Levanter 1.0

Today at Stanford, we released Levanter, a JAX-based framework for training foundation models. You can read more about Levanter in the official release blog post here.

There are several things that we really like about Levanter:

  1. Haliax, the named-tensor module in Levanter, makes deep learning code easier to read, compose, and debug than code written against positional axes. I found that I no longer need detailed comments to interpret the reshape and broadcast operations in matrix computations (a short sketch follows this list).

  2. Levanter offers FSDP (fully sharded data parallelism) and tensor parallelism to train LMs at scale. We achieved up to 54% Model FLOP Utilization (MFU) and 77.1% Hardware FLOP Utilization (HFU) on TPUs, matching the state-of-the-art performance of Google's MaxText, MosaicML, and Megatron (a back-of-the-envelope MFU example follows this list).

  3. Training jobs in Levanter are bitwise reproducible: the same configuration produces the exact same loss curve, run after run. Say goodbye to non-deterministic debugging in deep learning (a toy example of seeded determinism follows this list).

  4. Levanter has very neat features like live visualization for text data, distributed data preprocessing with Ray, and tight integration with W&B.
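To illustrate the named-axis style from item 1, here is a minimal sketch in the spirit of the Haliax documentation. The axis names and sizes are made up for illustration, and the exact signatures (`hax.Axis`, `hax.random.normal`, `hax.dot`) are assumptions based on Haliax's docs around the 1.0 release and may differ in later versions.

```python
import jax.random as jrandom
import haliax as hax

# Declare axes once, by name and size (sizes here are illustrative).
Batch = hax.Axis("batch", 32)
Embed = hax.Axis("embed", 64)
Hidden = hax.Axis("hidden", 128)

key = jrandom.PRNGKey(0)
k_x, k_w = jrandom.split(key)

x = hax.random.normal(k_x, (Batch, Embed))   # named array with axes (batch, embed)
w = hax.random.normal(k_w, (Embed, Hidden))  # named array with axes (embed, hidden)

# Contract over Embed by name: no transposes, no axis-index bookkeeping,
# and no comment needed to explain which dimension is being reduced.
y = hax.dot(Embed, x, w)                     # named array with axes (batch, hidden)
```

Because every operation names the axis it acts on, mismatched dimensions tend to fail loudly instead of silently broadcasting the wrong way.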
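For context on the utilization numbers in item 2: MFU is commonly defined as the model's theoretical FLOPs (roughly 6 FLOPs per parameter per token for a forward-plus-backward pass) divided by the hardware's peak FLOP rate, while HFU also counts recomputation from activation checkpointing, which is why the HFU figure is higher. Below is a back-of-the-envelope sketch with hypothetical numbers, not Levanter's actual measurements.

```python
def model_flop_utilization(n_params: float, tokens_per_sec_per_device: float,
                           peak_flops_per_device: float) -> float:
    """Common MFU estimate: ~6 FLOPs per parameter per token (forward + backward),
    divided by the device's peak FLOP throughput."""
    model_flops_per_token = 6 * n_params
    achieved_flops = model_flops_per_token * tokens_per_sec_per_device
    return achieved_flops / peak_flops_per_device

# Hypothetical example: a 1.4B-parameter model pushing 16k tokens/s per chip
# on a chip with a 275 TFLOP/s bf16 peak -> roughly 0.49 MFU.
print(model_flop_utilization(1.4e9, 16_000, 275e12))
```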
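The bitwise reproducibility in item 3 comes from JAX's functional, explicitly seeded PRNG (together with deterministic data ordering). The toy step below is not Levanter code; it only shows why the same seed gives the same bits on every run.

```python
import jax
import jax.numpy as jnp

def toy_training_step(seed: int) -> jnp.ndarray:
    # All randomness flows from an explicit key derived from the seed,
    # so repeating the run with the same seed reproduces every value exactly.
    key = jax.random.PRNGKey(seed)
    init_key, dropout_key = jax.random.split(key)
    params = jax.random.normal(init_key, (8, 8))
    keep = jax.random.bernoulli(dropout_key, 0.9, (8, 8))
    return jnp.sum(params * keep)

# Same seed -> bit-identical result, run after run.
assert toy_training_step(0) == toy_training_step(0)
```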

It is my great pleasure to work closely with David Hall, Percy Liang, and other amazing colleagues at Stanford CRFM on Levanter. I have used Levanter to train multiple large-scale language models on TPUs, and it is both delightful to use and powerful for the job.

Levanter is now open source on GitHub here under the Apache 2.0 license. I hope it will be useful to the community for training LLMs and that it will continue to evolve in future versions!