Smaller, localized versions of AI language models could help address emerging concerns around data privacy and the cost of the technology.
That’s one insight from Darren Oberst, CEO of Ai Bloks and the AI framework platform LLMWare, who spoke this week as part of Forward Festival in Madison. He was hosted by the MadAI group, a community of AI professionals in the Madison area.
While interest in large language models such as ChatGPT has only risen as the technology matures, Oberst noted that tech industry leaders such as Sam Altman and Elon Musk have discussed the risk of AI developing too quickly. Meanwhile, companies have also been raising concerns that their sensitive data could be exposed through the use of AI, he said.
“All is not rosy in generative AI land; there are some real storm clouds on the horizon,” Oberst said. “And we actually believe that small language models have a very important role to play as part of the solution to address many of those concerns.”
Small language models, or SLMs, are created and trained using the same mathematical functions as the high-profile large language models, he explained. The key difference is that they have far fewer “parameters,” the learned values that determine how the model behaves.
While the “mega models” can include hundreds of billions of parameters, SLMs typically have between 1 billion and 10 billion. Unlike the large models, which often run through a cloud-based interface because of their heavy processing requirements, the smaller models can run on a “medium to high-end laptop … privately, locally, securely, entirely on your machine,” Oberst said.
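For a concrete sense of what running a small model “entirely on your machine” can look like, here is a minimal sketch using the open-source Hugging Face transformers library. The specific model named below is an illustrative assumption, not one Oberst cited, and the only network access needed is a one-time download of the model weights.

```python
# Minimal sketch (illustrative, not Oberst's setup): loading and querying a
# ~1B-parameter open model locally. After the one-time download of the model
# weights, generation runs entirely on the local machine with no cloud calls.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed example of a small open model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # runs on CPU by default

prompt = "Summarize the main privacy risks of sending sensitive data to cloud AI services."
inputs = tokenizer(prompt, return_tensors="pt")

# The prompt and the generated text never leave the machine.
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```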
Oberst said that’s useful for sensitive information such as health data, investigations and government information, because the data never has to leave the machine and so doesn’t run the risk of exposure. While SLMs still need to be trained on the cloud, once they’re up and running they no longer need to be connected to the broader information ecosystem, he said.
And though they’re smaller, Oberst said SLMs can perform most of the same tasks that applications like ChatGPT are used for, such as answering fact-based questions and performing basic analysis.
“My experience is that a small model can probably do 80% to 90% of what the ‘mega model’ can do … but you’re going to be able to do it at probably 1/100th the cost,” he said.
That’s particularly useful for highly specific use cases, he said, as a small business or academic team could take a model that’s about 80% accurate and boost it to 95% accuracy by fine-tuning it for that specialized task.
“The real promise of small models is not just, ‘Oh look, it can kind of do sort of what a big model can do,’” he said. “The idea is that because it is so much smaller, lower cost to adapt, and deployable privately, you can start adapting these models, fine-tuning them … instead of thinking ‘I’ve got one big model,’ I’ve got 10 smaller models, each of which does a specific task or purpose for me.”
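As an illustration of that “10 smaller models, each with a specific task” idea, the sketch below shows one common way such adaptation is done today: parameter-efficient (LoRA) fine-tuning of a small open model, which updates only a tiny fraction of the weights. The libraries, model, and settings are assumptions for illustration, not tools Oberst specifically described.

```python
# Illustrative sketch (not Oberst's specific workflow): preparing a small open
# model for task-specific fine-tuning with LoRA, which trains small adapter
# matrices instead of all ~1B weights, keeping adaptation cheap enough to
# repeat once per specialized task.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed small open base model

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA attaches small trainable matrices to the attention projections; the
# base weights stay frozen, so each specialized variant is just a lightweight
# adapter that can be trained and deployed privately.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the total weights
```

In this pattern, the “10 smaller models” would simply be 10 lightweight adapters trained on the same small base, one per task.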