| Name | Release date | Developer | Number of parameters | Corpus size | Training cost (petaFLOP-day) | License | Notes |
|---|---|---|---|---|---|---|---|
| Jurassic-2 | March 2023 | AI21 Labs | Exact size unknown | Unknown | | Proprietary | Multilingual |
| GPT-4 | March 2023 | OpenAI | Exact number unknown | Unknown | Unknown | Proprietary | Available for ChatGPT Plus users and used in several products. |
| GLaM (Generalist Language Model) | December 2021 | Google | 1.2 trillion | 1.6 trillion tokens | 5600 | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference than GPT-3. |
| PanGu-Σ | March 2023 | Huawei | 1.085 trillion | 329 billion tokens | | Proprietary | |
| PaLM (Pathways Language Model) | April 2022 | Google | 540 billion | 768 billion tokens | 29250 | Proprietary | Aimed to reach the practical limits of model scale. |
| Minerva | June 2022 | Google | 540 billion | 38.5 billion tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server | | Proprietary | LLM trained for solving "mathematical and scientific questions using step-by-step reasoning". Based on the PaLM model, further trained on mathematical and scientific data. |
| Megatron-Turing NLG | October 2021 | Microsoft and Nvidia | 530 billion | 338.6 billion tokens | | Restricted web access | Standard architecture but trained on a supercomputing cluster. |
| PaLM 2 (Pathways Language Model 2) | May 2023 | Google | 340 billion | 3.6 trillion tokens | 85000 | Proprietary | Used in the Bard chatbot. |
| Gopher | December 2021 | DeepMind | 280 billion | 300 billion tokens | 5833 | Proprietary | |
| Ernie 3.0 Titan | December 2021 | Baidu | 260 billion | 4 TB | | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
| Falcon 180B | September 2023 | Technology Innovation Institute | 180 billion | 3.5 trillion tokens | | Falcon 180B TII license | |
| GPT-3 | 2020 | OpenAI | 175 billion | 300 billion tokens | 3640 | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022. |
| OPT (Open Pretrained Transformer) | May 2022 | Meta | 175 billion | 180 billion tokens | 310 | Non-commercial research | GPT-3 architecture with some adaptations from Megatron. |
| BLOOM | July 2022 | Large collaboration led by Hugging Face | 175 billion | 350 billion tokens (1.6 TB) | | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages). |
| LaMDA (Language Models for Dialog Applications) | January 2022 | Google | 137 billion | 1.56T words, 168 billion tokens | 4110 | Proprietary | Specialized for response generation in conversations. |
| Galactica | November 2022 | Meta | 120 billion | 106 billion tokens | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities. |
| YaLM 100B | June 2022 | Yandex | 100 billion | 1.7 TB | | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM. |
| Chinchilla | March 2022 | DeepMind | 70 billion | 1.4 trillion tokens | 6805 | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. |
| Llama 2 | July 2023 | Meta | 70 billion | 2 trillion tokens | | Llama 2 license | Successor of LLaMA. |
| LLaMA (Large Language Model Meta AI) | February 2023 | Meta | 65 billion | 1.4 trillion tokens | 6300 | Non-commercial research | Trained on a large 20-language corpus to aim for better performance with fewer parameters. Researchers from Stanford University trained a fine-tuned model based on the LLaMA weights, called Alpaca. |
| Claude | December 2021 | Anthropic | 52 billion | 400 billion tokens | | Beta | Fine-tuned for desirable behavior in conversations. |
| BloombergGPT | March 2023 | Bloomberg L.P. | 50 billion | 363 billion tokens from Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets | | Proprietary | LLM trained on financial data from proprietary sources that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks". |
| Falcon | March 2023 | Technology Innovation Institute | 40 billion | 1 trillion tokens, from RefinedWeb (filtered web text corpus) plus some "curated corpora" | 2800 | Apache 2.0 | Training cost around 2700 petaFLOP-days, 75% that of GPT-3. |
| GPT-NeoX | February 2022 | EleutherAI | 20 billion | 825 GiB | 740 | Apache 2.0 | Based on the Megatron architecture. |
| AlexaTM (Teacher Models) | November 2022 | Amazon | 20 billion | 1.3 trillion tokens | | Proprietary | Bidirectional sequence-to-sequence architecture. |
| OpenAssistant | March 2023 | LAION | 17 billion | 1.5 trillion tokens | | Apache 2.0 | Trained on crowdsourced open data. |
| Cerebras-GPT | March 2023 | Cerebras | 13 billion | | 270 | Apache 2.0 | Trained with the Chinchilla formula. |
| Mistral 7B | September 2023 | Mistral | 7.3 billion | Unknown | | Apache 2.0 | |
| GPT-J | June 2021 | EleutherAI | 6 billion | 825 GiB | 200 | Apache 2.0 | GPT-3-style language model. |
| GPT-Neo | March 2021 | EleutherAI | 2.7 billion | 825 GiB | | MIT | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks but was significantly worse than the largest GPT-3. |
| GPT-2 | 2019 | OpenAI | 1.5 billion | 40 GB (~10 billion tokens) | | MIT | General-purpose model based on the transformer architecture. |
| BERT | 2018 | Google | 340 million | 3.3 billion words | 9 | Apache 2.0 | An early and influential language model, but encoder-only and thus not built to be prompted or generative. |
| XLNet | 2019 | Google | ~340 million | 33 billion words | | | An alternative to BERT; designed as encoder-only. |
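The training-cost figures above appear to follow the common compute approximation C ≈ 6·N·D (total FLOP for a dense transformer, with N parameters trained on D tokens), converted to petaFLOP-days. The sketch below is only an illustration of that assumed relationship, using figures copied from the table; it is not how the original sources computed their numbers.

```python
# Rough sanity check of the training-cost column, assuming (not confirmed by the
# table's sources) the common estimate C ≈ 6 * N * D, where N = parameters and
# D = training tokens, with the result converted to petaFLOP-days.

PETAFLOP_DAY = 1e15 * 86_400  # FLOP in one petaFLOP-day (1e15 FLOP/s for 86,400 s)

def estimated_cost_pflop_days(params: float, tokens: float) -> float:
    """Estimate training compute in petaFLOP-days via C ≈ 6·N·D."""
    return 6 * params * tokens / PETAFLOP_DAY

# (parameters, training tokens) taken from the table above
examples = {
    "GPT-3":      (175e9, 300e9),   # table lists 3640
    "Chinchilla": (70e9,  1.4e12),  # table lists 6805
    "LLaMA":      (65e9,  1.4e12),  # table lists 6300
}

for name, (n, d) in examples.items():
    print(f"{name}: ~{estimated_cost_pflop_days(n, d):.0f} petaFLOP-days")
# GPT-3: ~3646 petaFLOP-days
# Chinchilla: ~6806 petaFLOP-days
# LLaMA: ~6319 petaFLOP-days
```

The estimates land close to the tabulated values for dense models; sparse mixture-of-experts models such as GLaM use far fewer FLOPs per token than their total parameter count suggests, so this approximation does not apply to them.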