| Name | Release date | Developer | Number of parameters | Corpus size | Training cost (petaFLOP-day) | License | Notes |
|---|---|---|---|---|---|---|---|
| Jurassic-2 | March 2023 | AI21 Labs | Exact size unknown | Unknown | | Proprietary | Multilingual |
| GPT-4 | March 2023 | OpenAI | Exact number unknown | Unknown | Unknown | Proprietary | Available for ChatGPT Plus users and used in several products. |
| GLaM (Generalist Language Model) | December 2021 | Google | 1.2 trillion | 1.6 trillion tokens | 5600 | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference than GPT-3. |
| PanGu-Σ | March 2023 | Huawei | 1.085 trillion | 329 billion tokens | | Proprietary | |
| PaLM (Pathways Language Model) | April 2022 | Google | 540 billion | 768 billion tokens | 29250 | Proprietary | Aimed to reach the practical limits of model scale. |
| Minerva | June 2022 | Google | 540 billion | 38.5 billion tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server | | Proprietary | LLM trained for solving "mathematical and scientific questions using step-by-step reasoning". Based on the PaLM model, further trained on mathematical and scientific data. |
| Megatron-Turing NLG | October 2021 | Microsoft and Nvidia | 530 billion | 338.6 billion tokens | | Restricted web access | Standard architecture but trained on a supercomputing cluster. |
| PaLM 2 (Pathways Language Model 2) | May 2023 | Google | 340 billion | 3.6 trillion tokens | 85000 | Proprietary | Used in the Bard chatbot. |
| Gopher | December 2021 | DeepMind | 280 billion | 300 billion tokens | 5833 | Proprietary | |
| Ernie 3.0 Titan | December 2021 | Baidu | 260 billion | 4 TB | | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
| Falcon 180B | September 2023 | Technology Innovation Institute | 180 billion | 3.5 trillion tokens | | Falcon 180B TII license | |
| GPT-3 | 2020 | OpenAI | 175 billion | 300 billion tokens | 3640 | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022. |
| OPT (Open Pretrained Transformer) | May 2022 | Meta | 175 billion | 180 billion tokens | 310 | Non-commercial research | GPT-3 architecture with some adaptations from Megatron. |
| BLOOM | July 2022 | Large collaboration led by Hugging Face | 175 billion | 350 billion tokens (1.6 TB) | | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages). |
| LaMDA (Language Models for Dialog Applications) | January 2022 | Google | 137 billion | 1.56T words, 168 billion tokens | 4110 | Proprietary | Specialized for response generation in conversations. |
| Galactica | November 2022 | Meta | 120 billion | 106 billion tokens | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities. |
| YaLM 100B | June 2022 | Yandex | 100 billion | 1.7 TB | | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM. |
| Chinchilla | March 2022 | DeepMind | 70 billion | 1.4 trillion tokens | 6805 | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. |
| Llama 2 | July 2023 | Meta | 70 billion | 2 trillion tokens | | Llama 2 license | Successor of LLaMA. |
| LLaMA (Large Language Model Meta AI) | February 2023 | Meta | 65 billion | 1.4 trillion tokens | 6300 | Non-commercial research | Trained on a large 20-language corpus to aim for better performance with fewer parameters. Researchers from Stanford University trained a fine-tuned model based on the LLaMA weights, called Alpaca. |
| Claude | December 2021 | Anthropic | 52 billion | 400 billion tokens | | Beta | Fine-tuned for desirable behavior in conversations. |
| BloombergGPT | March 2023 | Bloomberg L.P. | 50 billion | 363 billion tokens from Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets | | Proprietary | LLM trained on financial data from proprietary sources that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks". |
| Falcon | March 2023 | Technology Innovation Institute | 40 billion | 1 trillion tokens, from RefinedWeb (filtered web text corpus) plus some "curated corpora" | 2800 | Apache 2.0 | Training cost around 2700 petaFLOP-days, 75% that of GPT-3. |
| GPT-NeoX | February 2022 | EleutherAI | 20 billion | 825 GiB | 740 | Apache 2.0 | Based on the Megatron architecture. |
| AlexaTM (Teacher Models) | November 2022 | Amazon | 20 billion | 1.3 trillion tokens | | Proprietary | Bidirectional sequence-to-sequence architecture. |
| OpenAssistant | March 2023 | LAION | 17 billion | 1.5 trillion tokens | | Apache 2.0 | Trained on crowdsourced open data. |
| Cerebras-GPT | March 2023 | Cerebras | 13 billion | | 270 | Apache 2.0 | Trained with the Chinchilla formula. |
| Mistral 7B | September 2023 | Mistral | 7.3 billion | Unknown | | Apache 2.0 | |
| GPT-J | June 2021 | EleutherAI | 6 billion | 825 GiB | 200 | Apache 2.0 | GPT-3-style language model. |
| GPT-Neo | March 2021 | EleutherAI | 2.7 billion | 825 GiB | | MIT | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks but was significantly worse than the largest GPT-3. |
| GPT-2 | 2019 | OpenAI | 1.5 billion | 40 GB (~10 billion tokens) | | MIT | General-purpose model based on the transformer architecture. |
| BERT | 2018 | Google | 340 million | 3.3 billion words | 9 | Apache 2.0 | An early and influential language model, but encoder-only and thus not built to be prompted or generative. |
| XLNet | 2019 | Google | ~340 million | 33 billion words | | | An alternative to BERT; designed as encoder-only. |
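The training-cost figures above appear to follow the common compute approximation C ≈ 6·N·D (total FLOP for a dense transformer, with N parameters trained on D tokens), converted to petaFLOP-days. The sketch below is only an illustration of that assumed relationship, using figures copied from the table; it is not how the original sources computed their numbers.

```python
# Rough sanity check of the training-cost column, assuming (not confirmed by the
# table's sources) the common estimate C ≈ 6 * N * D, where N = parameters and
# D = training tokens, with the result converted to petaFLOP-days.

PETAFLOP_DAY = 1e15 * 86_400  # FLOP in one petaFLOP-day (1e15 FLOP/s for 86,400 s)

def estimated_cost_pflop_days(params: float, tokens: float) -> float:
    """Estimate training compute in petaFLOP-days via C ≈ 6·N·D."""
    return 6 * params * tokens / PETAFLOP_DAY

# (parameters, training tokens) taken from the table above
examples = {
    "GPT-3":      (175e9, 300e9),   # table lists 3640
    "Chinchilla": (70e9,  1.4e12),  # table lists 6805
    "LLaMA":      (65e9,  1.4e12),  # table lists 6300
}

for name, (n, d) in examples.items():
    print(f"{name}: ~{estimated_cost_pflop_days(n, d):.0f} petaFLOP-days")
# GPT-3: ~3646 petaFLOP-days
# Chinchilla: ~6806 petaFLOP-days
# LLaMA: ~6319 petaFLOP-days
```

The estimates land close to the tabulated values for dense models; sparse mixture-of-experts models such as GLaM use far fewer FLOPs per token than their total parameter count suggests, so this approximation does not apply to them.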