California-based Nucleus AI, a four-member startup with talent from Amazon and Samsung Research, today emerged from stealth with the launch of its first product: a 22-billion-parameter large language model (LLM).
Available under both an open-source MIT license and a commercial license, the general-purpose model sits between the 13B and 34B segments and can be fine-tuned for different generation tasks and products. Nucleus says it outperforms models of comparable size and will eventually help the company build towards its goal of using AI to transform agriculture.
“We’re starting with our 22-billion model, which is a transformer model. Then, in about two weeks’ time, we’ll be releasing our state-of-the-art RetNet models, which would give significant benefits in terms of costs and inference speeds,” Gnandeep Moturi, the CEO of the company, told VentureBeat.
The new Nucleus AI model
Nucleus started training the 22B model about three and a half months ago after receiving compute resources from an early investor.
The company drew on existing research and the open-source community to pre-train the LLM with a context length of 2,048 tokens, eventually training it on a trillion tokens of data: large-scale deduplicated and cleaned text scraped from the web, Wikipedia, Stack Exchange, arXiv and code.
This established a well-rounded knowledge base for the model, ranging from general information to academic research and coding insights.
As the next step, Nucleus plans to release additional versions of the 22B model, trained on 350 billion and 700 billion tokens, as well as two RetNet models, with 3 billion and 11 billion parameters, that have been pre-trained on the larger context length of 4,096 tokens.
These smaller models are meant to combine the strengths of RNN and transformer neural network architectures and deliver large gains in speed and cost. In internal experiments, Moturi said, they were found to be 15 times faster and required only a quarter of the GPU memory that comparable transformer models generally demand.
“So far, there’s only been research to prove that this could work. No one has actually built a model and released it to the public,” the CEO added.
Bigger ambitions
While the models will be available for enterprise applications, Nucleus has bigger ambitions with its AI research.
Instead of building straight-up chatbots like other LLM companies such as OpenAI, Anthropic and Cohere, Moturi said Nucleus plans to leverage AI to build an intelligent operating system for agriculture, aimed at optimizing supply and demand and mitigating uncertainties for farmers.
“We have a marketplace-type of idea where demand and supply will be hyper-optimized for farmers in such a way that Uber does for taxi drivers,” he said.
This could address multiple challenges for farmers, from the effects of climate change and a lack of knowledge to optimizing supply and maintaining distribution.
“Right now, we’re not competing against anybody else’s algorithms. When we got access to compute, we were trying to build internal products to step into the farming landscape. But then we figured we need language models as the core of the marketplace itself and started building that with the contribution from the open-source community,” he added.
More details about the farming-centric OS and the RetNet models will be announced later this month.
Source: VentureBeat