Meta’s little LLaMA model comes with big benefits for AI researchers
Large language models have taken the tech world by storm. They power AI tools, such as ChatGPT and other conversational models, that can solve problems, answer questions, make predictions, and more. However, using these tools comes with significant risk. They’ve been known to make plausible-sounding but untrue statements, generate toxic content, and mimic the biases rooted in their training data.
To help researchers address those problems, Meta on Friday announced the release of a new large language model called LLaMA (Large Language Model Meta AI). The company is making it available under a noncommercial license focused on research use cases and plans to grant access on a case-by-case basis. It will be available to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world.
What’s interesting about LLaMA is that it’s relatively little.
As the name suggests, large language models are pretty big. It takes huge amounts of language data (whether that’s spoken language, computer code, genetic data, or other “languages”) to create an AI model sophisticated enough to solve problems in that language, find answers, or generate its own compositions.
“Training smaller foundation models like LLaMA is desirable in the large language model space because it requires far less computing power and resources to test new approaches, validate others’ work, and explore new use cases,” Meta noted.
To make a relatively “small” LLM capable, Meta trained it on a very large number of “tokens,” which are pieces of words rather than whole words. The training text came from the 20 languages with the most speakers, focusing on those that use the Latin and Cyrillic alphabets.
LLaMA is actually a collection of models, ranging from 7 billion to 65 billion parameters. LLaMA-65B and LLaMA-33B were trained on 1.4 trillion tokens, while the smallest model, LLaMA-7B, was trained on one trillion tokens. The models were trained using only publicly available datasets.
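A quick back-of-the-envelope calculation shows what these figures imply: the smaller models saw far more training data per parameter than the largest one. The sketch below uses only the sizes and token counts stated above and treats the nominal model names (7B, 33B, 65B) as exact parameter counts, which is an assumption because such labels are rounded. “Tokens per parameter” is an illustrative ratio, not a metric Meta reports, and `tokens_per_parameter` is a hypothetical helper.

```python
# Nominal parameter counts and training-token counts as stated in the article.
# Assumption: model names like "33B" are treated as exact parameter counts.
MODELS = {
    "LLaMA-7B": (7e9, 1.0e12),
    "LLaMA-33B": (33e9, 1.4e12),
    "LLaMA-65B": (65e9, 1.4e12),
}


def tokens_per_parameter(params, tokens):
    """Illustrative ratio: how many training tokens each parameter 'saw'."""
    return tokens / params


for name, (params, tokens) in MODELS.items():
    ratio = tokens_per_parameter(params, tokens)
    print(f"{name}: {ratio:.1f} training tokens per parameter")
# LLaMA-7B: 142.9 training tokens per parameter
# LLaMA-33B: 42.4 training tokens per parameter
# LLaMA-65B: 21.5 training tokens per parameter
```

By this rough measure, LLaMA-7B was trained on roughly seven times as many tokens per parameter as LLaMA-65B.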
Despite their size, the LLaMA models are powerful. Meta said LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, while LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.
LLaMA is also valuable to the research community as a set of foundation models. Foundation models are trained on large amounts of unlabeled data, which makes them general-purpose starting points that can be adapted to a wide range of use cases.
Meta will make LLaMA available in several sizes (7B, 13B, 33B, and 65B parameters) and is also sharing a LLaMA model card that details how it built the model. The company is also providing a set of evaluations on benchmarks that measure model bias and toxicity, so that researchers can understand LLaMA’s limitations and advance research in these areas.