As the forces of open-source generative AI try to counter the closed-source AI giants like OpenAI and Anthropic, one of their key weapons will be the efficiency gains from running smaller models that take less time to train, less energy, fewer computing resources, and, as a result, less money.
In that vein, last week brought two new open-source large language models that compete with the best closed-source models from OpenAI and Anthropic. AI startup AI21 Labs and database technology vendor Databricks separately demonstrated how smaller neural networks can match much bigger models, at least on benchmark tests.
AI21's Jamba is a remarkable combination of two different approaches to language models: a Transformer, the key technology on which most language models are based, including OpenAI's GPT-4, and a second neural network called a "state space model," or SSM.
Scholars at Carnegie Mellon University and Princeton University improved the SSM to make a more efficient solution called "Mamba." AI21 scholars Opher Lieber, Barak Lenz and team then combined Mamba with the Transformer to produce "Joint Attention and Mamba," or Jamba. As described in AI21's blog post, "Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks."
In a number of tables, Lieber, Lenz and team show how Jamba performs on reasoning and other tasks. "Noticeably, Jamba performs comparably to the leading publicly available models of similar or larger size, including Llama-2 70B and Mixtral."
Jamba slims down the memory usage of a large language model. At 12 billion "parameters," or neural weights, it is in one sense comparable to Meta's open-source Llama 2 7-billion-parameter model. However, while Llama 2 7B uses 128GB of DRAM to store the "keys and values" that make the Transformer's attention function work, Jamba requires only 4GB. AI21 reports both figures for a context of 256K tokens.
As the team put it, "Trading off attention layers for Mamba layers reduces the total size of the KV cache" (the key-value memory database). The result of slimming down the memory is that "we end up with a powerful model that fits in a single 80GB GPU" (one of Nvidia's older A100 GPUs).
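The arithmetic behind those memory figures can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not AI21's own calculation. It assumes a 256K-token context and 16-bit values, and the layer and head counts below are assumptions taken from the models' published configurations rather than from the article: 32 attention layers with 32 key-value heads for Llama 2 7B, and only 4 attention layers with 8 key-value heads for Jamba, with a head dimension of 128 in both cases.

```python
def kv_cache_bytes(attn_layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate the bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one key vector and one value vector per head, per layer, per token.
    return 2 * attn_layers * kv_heads * head_dim * context_tokens * bytes_per_value


GIB = 1024 ** 3
CONTEXT = 256 * 1024  # 256K tokens

# Assumed Llama 2 7B shape: every one of its 32 layers is an attention layer.
llama2_7b = kv_cache_bytes(attn_layers=32, kv_heads=32, head_dim=128, context_tokens=CONTEXT)

# Assumed Jamba shape: most layers are Mamba layers, which keep no KV cache,
# so only the few attention layers contribute.
jamba = kv_cache_bytes(attn_layers=4, kv_heads=8, head_dim=128, context_tokens=CONTEXT)

llama2_gib = llama2_7b / GIB
jamba_gib = jamba / GIB
print(f"Llama 2 7B: {llama2_gib:.0f} GiB, Jamba: {jamba_gib:.0f} GiB")
```

Under these assumptions the estimate lands on the same 128GB and 4GB quoted above. The 32-fold gap comes from two sources: eight times fewer attention layers, and four times fewer key-value heads in each one.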
Despite the slimmer size, Jamba hits a new high mark: it can take in more text at once, measured in tokens (words or fragments of words), than any other open-source model. "Our model supports a context length of 256K tokens - the longest supported context length for production-grade publicly available models."
Jamba's code is available on Hugging Face under the Apache open-source license.
The second striking innovation is Databricks' DBRX. Databricks' internal AI team, MosaicML, which the company acquired in 2023, built DBRX from what's called a "mixture of experts," a large language model approach that shuts off some of the neural weights to reduce computing and memory needs. "MoE," as it's often known, is among the tools that Google used for its recent Gemini large language model.
As Databricks explains in its blog post, "MoEs essentially let you train bigger models and serve them at faster throughput." Because DBRX can shut down some parameters, it uses only 36 billion out of its 132 billion neural weights to make predictions.
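To make the idea of shutting off parameters concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a generic illustration of the mixture-of-experts technique, not DBRX's actual implementation. The dimensions, the number of experts, the value of `top_k`, and the toy experts (each simply scales its input) are all hypothetical choices made for the example.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, top_k):
    """Run only the top_k highest-scoring experts and mix their outputs."""
    # Router: one score per expert, the dot product of the input with that expert's routing vector.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)  # Experts that were not chosen are never evaluated.
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


random.seed(0)
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2  # hypothetical sizes, for illustration only
router_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Toy experts: expert i multiplies its input by (i + 1).
experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(NUM_EXPERTS)]

x = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(x, router_weights, experts, TOP_K)

# Share of DBRX's weights used per prediction, from the figures quoted above.
active_fraction = 36 / 132
print(f"experts used: {sorted(chosen)}, DBRX active share: {active_fraction:.0%}")
```

In a real model the active share is not simply `top_k / NUM_EXPERTS`, because components such as attention layers and embeddings are shared and run for every token. That is one reason the 36-billion-of-132-billion figure cannot be read off the expert count alone.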
MoE lets DBRX do more with less. Among its remarkable achievements, "DBRX beats GPT-3.5 on most benchmarks," the MosaicML team wrote, including tests of language understanding and computer coding ability, even though GPT-3.5 has 175 billion parameters (nearly five times the 36 billion that DBRX uses for each prediction).
What's more, when prompted as a chatbot, "DBRX generation speed is significantly faster than LLaMA2-70B," even though Llama 2 70B has roughly twice as many parameters as the 36 billion that DBRX activates.
Databricks aims to drive the adoption of open-source models in enterprises. "Open-source LLMs will continue gaining momentum," the team declared. "In particular, we think they provide an exciting opportunity for organizations to customize open-source LLMs that can become their IP, which they use to be competitive in their industry. Towards that, we designed DBRX to be easily customizable so that enterprises can improve the quality of their AI applications. Starting today on the Databricks Platform, enterprises can interact with DBRX, leverage its long context abilities in RAG systems, and build custom DBRX models on their own private data."
DBRX's code is offered on GitHub and Hugging Face through Databricks' open-source license.
As significant as both achievements are, the one overarching shortcoming of these models is that they are not "multimodal": they deal only with text, not with images and video the way that GPT-4, Gemini, and other models can.