Table of Contents

Google Searches

LLM Searches

Fine-Tuning

Embeddings

LLM Models

| Name | Release date | Developer | Number of parameters | Corpus size | Training cost (petaFLOP-days) | License | Notes |
|---|---|---|---|---|---|---|---|
| Jurassic-2 | March 2023 | AI21 Labs | Exact size unknown | Unknown | | Proprietary | Multilingual |
| GPT-4 | March 2023 | OpenAI | Exact number unknown | Unknown | Unknown | Proprietary | Available for ChatGPT Plus users and used in several products. |
| GLaM (Generalist Language Model) | December 2021 | Google | 1.2 trillion | 1.6 trillion tokens | 5600 | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3. |
| PanGu-Σ | March 2023 | Huawei | 1.085 trillion | 329 billion tokens | | Proprietary | |
| PaLM (Pathways Language Model) | April 2022 | Google | 540 billion | 768 billion tokens | 29,250 | Proprietary | Aimed to reach the practical limits of model scale. |
| Minerva | June 2022 | Google | 540 billion | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server | | Proprietary | Trained to solve "mathematical and scientific questions using step-by-step reasoning"; based on the PaLM model, further trained on mathematical and scientific data. |
| Megatron-Turing NLG | October 2021 | Microsoft and Nvidia | 530 billion | 338.6 billion tokens | | Restricted web access | Standard architecture but trained on a supercomputing cluster. |
| PaLM 2 (Pathways Language Model 2) | May 2023 | Google | 340 billion | 3.6 trillion tokens | 85,000 | Proprietary | Used in the Bard chatbot. |
| Gopher | December 2021 | DeepMind | 280 billion | 300 billion tokens | 5833 | Proprietary | |
| Ernie 3.0 Titan | December 2021 | Baidu | 260 billion | 4 TB | | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
| Falcon 180B | September 2023 | Technology Innovation Institute | 180 billion | 3.5 trillion tokens | | Falcon 180B TII license | |
| GPT-3 | 2020 | OpenAI | 175 billion | 300 billion tokens | 3640 | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022. |
| OPT (Open Pretrained Transformer) | May 2022 | Meta | 175 billion | 180 billion tokens | 310 | Non-commercial research | GPT-3 architecture with some adaptations from Megatron. |
| BLOOM | July 2022 | Large collaboration led by Hugging Face | 175 billion | 350 billion tokens (1.6 TB) | | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages). |
| LaMDA (Language Models for Dialog Applications) | January 2022 | Google | 137 billion | 1.56T words, 168 billion tokens | 4110 | Proprietary | Specialized for response generation in conversations. |
| Galactica | November 2022 | Meta | 120 billion | 106 billion tokens | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities. |
| YaLM 100B | June 2022 | Yandex | 100 billion | 1.7 TB | | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM. |
| Chinchilla | March 2022 | DeepMind | 70 billion | 1.4 trillion tokens | 6805 | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. |
| Llama 2 | July 2023 | Meta | 70 billion | 2 trillion tokens | | Llama 2 license | Successor of LLaMA. |
| LLaMA (Large Language Model Meta AI) | February 2023 | Meta | 65 billion | 1.4 trillion tokens | 6300 | Non-commercial research | Trained on a large 20-language corpus to achieve better performance with fewer parameters. Researchers from Stanford University trained a fine-tuned model on the LLaMA weights, called Alpaca. |
| Claude | December 2021 | Anthropic | 52 billion | 400 billion tokens | | Beta | Fine-tuned for desirable behavior in conversations. |
| BloombergGPT | March 2023 | Bloomberg L.P. | 50 billion | 363 billion tokens from Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets | | Proprietary | Trained on financial data from proprietary sources; "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks". |
| Falcon | March 2023 | Technology Innovation Institute | 40 billion | 1 trillion tokens, from RefinedWeb (a filtered web text corpus) plus some "curated corpora" | 2800 | Apache 2.0 | Training cost around 2,700 petaFLOP-days, 75% that of GPT-3. |
| GPT-NeoX | February 2022 | EleutherAI | 20 billion | 825 GiB | 740 | Apache 2.0 | Based on the Megatron architecture. |
| AlexaTM (Teacher Models) | November 2022 | Amazon | 20 billion | 1.3 trillion tokens | | Proprietary | Bidirectional sequence-to-sequence architecture. |
| OpenAssistant | March 2023 | LAION | 17 billion | 1.5 trillion tokens | | Apache 2.0 | Trained on crowdsourced open data. |
| Cerebras-GPT | March 2023 | Cerebras | 13 billion | | 270 | Apache 2.0 | Trained with the Chinchilla formula. |
| Mistral 7B | September 2023 | Mistral AI | 7.3 billion | Unknown | | Apache 2.0 | |
| GPT-J | June 2021 | EleutherAI | 6 billion | 825 GiB | 200 | Apache 2.0 | GPT-3-style language model. |
| GPT-Neo | March 2021 | EleutherAI | 2.7 billion | 825 GiB | | MIT | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks but was significantly worse than the largest GPT-3. |
| GPT-2 | 2019 | OpenAI | 1.5 billion | 40 GB (~10 billion tokens) | | MIT | General-purpose model based on the transformer architecture. |
| BERT | 2018 | Google | 340 million | 3.3 billion words | 9 | Apache 2.0 | An early and influential language model, but encoder-only and thus not built to be prompted or generative. |
| XLNet | 2019 | Google | ~340 million | 33 billion words | | | An alternative to BERT; designed as encoder-only. |
Large language model. (2023, October 20). In Wikipedia. https://en.wikipedia.org/wiki/Large_language_model
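The "Training cost (petaFLOP-days)" column lines up with the common dense-transformer estimate of training compute ≈ 6 × parameters × training tokens. The sketch below is an assumption about how such figures can be derived, not something stated in the cited article; it converts that estimate to petaFLOP-days and comes close to the GPT-3 and Chinchilla values listed above.

```python
# A minimal sketch, assuming training compute ≈ 6 FLOP per parameter per training token
# (a standard dense-transformer approximation, not taken from the cited article).

def petaflop_days(parameters: float, tokens: float) -> float:
    """Estimate training compute in petaFLOP-days for a dense transformer."""
    total_flop = 6 * parameters * tokens
    flop_per_petaflop_day = 1e15 * 86_400  # one petaFLOP/s sustained for 24 hours
    return total_flop / flop_per_petaflop_day

if __name__ == "__main__":
    # GPT-3: 175B parameters, 300B tokens -> ~3,646 (table lists 3640)
    print(f"GPT-3:      {petaflop_days(175e9, 300e9):,.0f} petaFLOP-days")
    # Chinchilla: 70B parameters, 1.4T tokens -> ~6,806 (table lists 6805)
    print(f"Chinchilla: {petaflop_days(70e9, 1.4e12):,.0f} petaFLOP-days")
```

Rows without a cost figure are generally missing one of the two inputs (a public parameter count or token count), so the same estimate cannot be applied to them.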

Slides