On Wednesday, Stability AI released a new family of open source AI language models called StableLM. Stability hopes to repeat the catalyzing effects of its Stable Diffusion open source image synthesis model, launched in 2022. With refinement, StableLM could be used to build an open source alternative to ChatGPT.
StableLM is currently available in alpha form on GitHub in 3 billion and 7 billion parameter model sizes, with 15 billion and 65 billion parameter models to follow, according to Stability. The company is releasing the models under the Creative Commons BY-SA-4.0 license, which requires that adaptations credit the original creator and be shared under the same license.
Stability AI Ltd. is a London-based firm that has positioned itself as an open source rival to OpenAI, which, despite its “open” name, rarely releases open source models and keeps its neural network weights—the mass of numbers that defines the core functionality of an AI model—proprietary.
“Language models will form the backbone of our digital economy, and we want everyone to have a voice in their design,” writes Stability in an introductory blog post. “Models like StableLM demonstrate our commitment to AI technology that is transparent, accessible, and supportive.”
Like GPT-4—the large language model (LLM) that powers the most powerful version of ChatGPT—StableLM generates text by predicting the next token (word fragment) in a sequence. That sequence starts with information provided by a human in the form of a “prompt.” As a result, StableLM can compose human-like text and write programs.
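To make the idea of next-token prediction concrete, here is a minimal sketch in Python. It is not how StableLM or GPT-4 work internally: it assumes a toy "bigram" model that simply counts which word follows which in a tiny made-up corpus, and it treats whole words as tokens. The function names `train_bigram` and `generate` are hypothetical and chosen only for illustration. What it shares with a real LLM is the loop: start from a prompt, predict one token, append it, and repeat.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_new_tokens=5):
    """Extend the prompt by repeatedly appending the most frequent next token."""
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        candidates = follows.get(sequence[-1])
        if not candidates:
            break  # the last token was never followed by anything in training
        next_token, _count = candidates.most_common(1)[0]
        sequence.append(next_token)
    return sequence


# Toy corpus (an assumption for illustration; real models train on far more text).
corpus = "the model predicts the next token and the next token extends the sequence".split()
model = train_bigram(corpus)

# The "prompt" is the human-provided start of the sequence.
print(" ".join(generate(model, ["the"], max_new_tokens=4)))
```

A real language model replaces the lookup table with a neural network that scores every possible next token, and it usually samples from those scores rather than always taking the top one.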
Like other recent “small” LLMs such as Meta’s LLaMA, Stanford Alpaca, Cerebras-GPT, and Dolly 2.0, StableLM purports to achieve performance similar to OpenAI’s benchmark GPT-3 model while using far fewer parameters: 7 billion for StableLM versus 175 billion for GPT-3.
Parameters are the numerical values that a language model adjusts as it learns from training data. Having fewer parameters makes a language model smaller and more efficient, which can make it easier to run on local devices like smartphones and laptops. However, achieving high performance with fewer parameters requires careful engineering, which is a significant challenge in the field of AI.
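A rough back-of-the-envelope calculation shows why parameter count matters for local devices. The Python sketch below assumes each parameter is stored as a 16-bit number (2 bytes) and counts only the weights themselves, ignoring the extra memory needed while the model runs. Both are simplifying assumptions, and the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Approximate storage for model weights alone, in gigabytes (10^9 bytes).

    Assumes 2 bytes per parameter (16-bit precision) unless told otherwise.
    """
    return num_parameters * bytes_per_parameter / 1e9


# Parameter counts as reported in the article.
models = [
    ("StableLM 3B", 3_000_000_000),
    ("StableLM 7B", 7_000_000_000),
    ("GPT-3", 175_000_000_000),
]

for name, params in models:
    print(f"{name}: roughly {weight_memory_gb(params):.0f} GB of weights")
```

Under these assumptions, the 7 billion parameter model needs about 14 GB for its weights, while a 175 billion parameter model needs about 350 GB, which is far beyond what a laptop or phone can hold.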
“Our StableLM models can generate text and code and will power a range of downstream applications,” says Stability. “They demonstrate how small and efficient models can deliver high performance with appropriate training.”
According to Stability AI, StableLM has been trained on “a new experimental data set” based on an open source data set called The Pile, but three times larger. Stability claims that the “richness” of this data set, the details of which it promises to release later, accounts for the model’s “surprisingly high performance” in conversational and coding tasks despite its smaller parameter count.
In our informal experiments with a fine-tuned version of StableLM’s 7B model built for dialog based on the Alpaca method, we found that it seemed to perform better (in terms of outputs you would expect given the prompt) than Meta’s raw 7B parameter LLaMA model, but not at the level of GPT-3. Larger-parameter versions of StableLM may prove more flexible and capable.
In August of last year, Stability funded and publicized the open source launch of Stable Diffusion, developed by researchers at the CompVis group at Ludwig Maximilian University of Munich.
As an early open source latent diffusion model that could generate images from prompts, Stable Diffusion kickstarted an era of rapid development in image-synthesis technology. It also created a strong backlash among artists and corporate entities, some of which have sued Stability AI. Stability’s move into language models could inspire similar results.
Users can test the 7 billion-parameter StableLM base model on Hugging Face and the fine-tuned model on Replicate. In addition, Hugging Face hosts a dialog-tuned version of StableLM with a conversation format similar to ChatGPT’s.
Stability says it will release a full technical report on StableLM “in the near future.”