<h1 align="center">The Practical Guides for Large Language Models</h1>

<p align="center">
<img src="https://awesome.re/badge.svg" alt="Awesome">
</p>

A curated (and still actively updated) list of practical guide resources for LLMs. These resources aim to help practitioners navigate the vast landscape of large language models (LLMs) and their use in natural language processing (NLP) applications.

## Practical Guide for Models

We build an evolutionary tree of modern Large Language Models (LLMs) to trace the development of language models in recent years and highlight some of the most well-known models in the following figure:

<p align="center">
<img width="600" src="./imgs/tree.png"/>
</p>

### BERT-style Language Models: Encoder-Decoder or Encoder-only

- BERT **"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"**. 2018. [[Paper](https://aclanthology.org/N19-1423.pdf)]
- RoBERTa **"RoBERTa: A Robustly Optimized BERT Pretraining Approach"**. 2019. [[Paper](https://arxiv.org/abs/1907.11692)]
- DistilBERT **"DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"**. 2019. [[Paper](https://arxiv.org/abs/1910.01108)]
- ALBERT **"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"**. 2019. [[Paper](https://arxiv.org/abs/1909.11942)]
- ELECTRA **"ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators"**. 2020. [[Paper](https://openreview.net/pdf?id=r1xMH1BtvB)]
- T5 **"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"**. *Colin Raffel et al.* JMLR 2020. [[Paper](https://arxiv.org/abs/1910.10683)]
- AlexaTM **"AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model"**. *Saleh Soltan et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2208.01448)]

### GPT-style Language Models: Decoder-only

- GPT-3 **"Language Models are Few-Shot Learners"**. NeurIPS 2020. [[Paper](https://arxiv.org/abs/2005.14165)]
- OPT **"OPT: Open Pre-trained Transformer Language Models"**. 2022. [[Paper](https://arxiv.org/abs/2205.01068)]
- PaLM **"PaLM: Scaling Language Modeling with Pathways"**. *Aakanksha Chowdhery et al.* arXiv 2022. [[Paper](https://arxiv.org/abs/2204.02311)]
- BLOOM **"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model"**. 2022. [[Paper](https://arxiv.org/abs/2211.05100)]
- GLM **"GLM-130B: An Open Bilingual Pre-trained Model"**. 2022. [[Paper](https://arxiv.org/abs/2210.02414)]
- MT-NLG **"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"**. 2022. [[Paper](https://arxiv.org/abs/2201.11990)]
- GLaM **"GLaM: Efficient Scaling of Language Models with Mixture-of-Experts"**. ICML 2022. [[Paper](https://arxiv.org/abs/2112.06905)]
- Gopher **"Scaling Language Models: Methods, Analysis & Insights from Training Gopher"**. 2021. [[Paper](http://arxiv.org/abs/2112.11446v2)]
- Chinchilla **"Training Compute-Optimal Large Language Models"**. 2022. [[Paper](https://arxiv.org/abs/2203.15556)]
- LaMDA **"LaMDA: Language Models for Dialog Applications"**. 2022. [[Paper](https://arxiv.org/abs/2201.08239)]
- LLaMA **"LLaMA: Open and Efficient Foundation Language Models"**. 2023. [[Paper](https://arxiv.org/abs/2302.13971v1)]
- GPT-4 **"GPT-4 Technical Report"**. 2023. [[Paper](http://arxiv.org/abs/2303.08774v2)]
- BloombergGPT **"BloombergGPT: A Large Language Model for Finance"**. 2023. [[Paper](https://arxiv.org/abs/2303.17564)]
- GPT-NeoX-20B **"GPT-NeoX-20B: An Open-Source Autoregressive Language Model"**. 2022. [[Paper](https://arxiv.org/abs/2204.06745)]

## Practical Guide for Data

### Pretraining data
### Finetuning data
### Test data/user data

## Practical Guide for NLP Tasks

We build a decision flow for choosing LLMs or fine-tuned models for users' NLP applications. The decision flow helps users assess whether their downstream NLP application meets specific conditions and, based on that assessment, determine whether an LLM or a fine-tuned model is the more suitable choice for their application.

<p align="center">
<img width="500" src="./imgs/decision.png"/>
</p>

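To make the idea concrete, here is a purely illustrative sketch of what such a decision flow could look like in code. The attribute names and the ordering of checks are our own assumptions for illustration only; the authoritative flow is the figure above (`./imgs/decision.png`).

```python
# Purely illustrative sketch of a decision flow like the one pictured above.
# Every attribute name here is a hypothetical placeholder, not something
# defined in this repository or in the figure.

from dataclasses import dataclass


@dataclass
class Task:
    knowledge_intensive: bool = False       # needs broad world knowledge?
    requires_reasoning: bool = False        # multi-step reasoning involved?
    little_labeled_data: bool = False       # few or no labeled examples?
    out_of_distribution: bool = False       # inputs differ from training data?
    latency_or_cost_critical: bool = False  # tight serving budget?


def choose_model(task: Task) -> str:
    """Return a rough recommendation: 'LLM' or 'fine-tuned model'."""
    if task.knowledge_intensive or task.requires_reasoning:
        return "LLM"                # scaling and emergent abilities help here
    if task.little_labeled_data or task.out_of_distribution:
        return "LLM"                # zero-/few-shot generalization
    if task.latency_or_cost_critical:
        return "fine-tuned model"   # smaller specialized models are cheaper
    return "fine-tuned model"       # well-covered traditional NLU/NLG tasks


print(choose_model(Task(requires_reasoning=True)))  # -> LLM
```
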
### Traditional NLU tasks

1. No use case
2. Use case

### Traditional NLG tasks

1. Use case
2. No use case

### Knowledge-intensive tasks

1. Use case
2. No use case

### Abilities with Scaling

Scaling of LLMs (e.g., parameters, training computation, etc.) can greatly empower pretrained language models. As a model scales up, it generally becomes more capable across a range of tasks. On some metrics, performance shows a power-law relationship with model scale.

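As one concrete illustration of this power-law behavior, the compute-optimal scaling work listed above (Chinchilla, "Training Compute-Optimal Large Language Models") fits pretraining loss with a sum of power laws in parameter count $N$ and training tokens $D$. The form below is our paraphrase of that parametric fit, not a formula stated elsewhere in this guide:

$$
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

where $E$ is an irreducible loss term and $A$, $B$, $\alpha$, $\beta$ are fitted constants, so loss falls off as a power law in both model size and data size.
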

1. Use Cases with Reasoning
    - Arithmetic reasoning
    - Problem solving
    - Commonsense reasoning
2. Use Cases with Emergent Abilities
    - Word manipulation
    - Logical deduction
    - Logical sequence
    - Logic grid puzzles
    - Simple math problems
    - Coding abilities
    - Concept understanding
3. No-Use Cases
    - Redefine-math
    - Into-the-unknown
    - Memo-trap
    - NegationQA

### Specific tasks

1. No use case
2. Use case

### Real-World Tasks

### Efficiency

1. Cost
2. Latency
3. Parameter-Efficient Fine-Tuning (see the sketch after this list)
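
One common flavor of parameter-efficient fine-tuning is low-rank adaptation (LoRA): freeze the pretrained weights and train only a small low-rank update. The snippet below is a minimal PyTorch sketch under our own assumptions (the class name, rank, and scaling are illustrative); it is not taken from any paper or repository listed here.

```python
# Minimal, illustrative LoRA-style adapter (one flavor of parameter-efficient
# fine-tuning). Sketch only; requires: pip install torch
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen linear layer and learn only a low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank trainable update.
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scaling


# Only the adapter matrices receive gradients, a small fraction of all weights.
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")
```

Because only `lora_a` and `lora_b` are updated, optimizer state and checkpoint deltas stay small, which is the main cost saving this style of fine-tuning targets.
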
### Trustworthiness

1. Robustness and Calibration
2. Fairness and Bias
3. Spurious Biases
