
Commit fee1190

JihadHammoud02, Jihad, and stevhliu authored

Refactor phi doc (#37583)
* Added documentation for phi model
* Update phi.md
* Update phi.md
* Update phi.md
* Update docs/source/en/model_doc/phi.md
  Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/model_doc/phi.md
  Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/model_doc/phi.md
  Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/model_doc/phi.md
  Co-authored-by: Steven Liu <[email protected]>
* Updated model card
* Update phi.md
* Update phi.md
* Update phi.md
* Update docs/source/en/model_doc/phi.md
  Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: Jihad <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
1 parent b2db54f commit fee1190

File tree

1 file changed: +75 −127 lines

  • docs/source/en/model_doc/phi.md


docs/source/en/model_doc/phi.md

Lines changed: 75 additions & 127 deletions
@@ -13,166 +13,117 @@ specific language governing permissions and limitations under the License.
 rendered properly in your Markdown viewer.

 -->
-
-# Phi
-
-<div class="flex flex-wrap space-x-1">
-<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
-<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
-<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
+        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+    </div>
 </div>

-## Overview
-
-The Phi-1 model was proposed in [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li.
-
-The Phi-1.5 model was proposed in [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
-
-### Summary
-
-In Phi-1 and Phi-1.5 papers, the authors showed how important the quality of the data is in training relative to the model size.
-They selected high quality "textbook" data alongside with synthetically generated data for training their small sized Transformer
-based model Phi-1 with 1.3B parameters. Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP.
-They follow the same strategy for Phi-1.5 and created another 1.3B parameter model with performance on natural language tasks comparable
-to models 5x larger, and surpassing most non-frontier LLMs. Phi-1.5 exhibits many of the traits of much larger LLMs such as the ability
-to “think step by step” or perform some rudimentary in-context learning.
-With these two experiments the authors successfully showed the huge impact of quality of training data when training machine learning models.
-
-The abstract from the Phi-1 paper is the following:
-
-*We introduce phi-1, a new large language model for code, with significantly smaller size than
-competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on
-8 A100s, using a selection of “textbook quality” data from the web (6B tokens) and synthetically
-generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains
-pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent
-properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding
-exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as
-phi-1 that still achieves 45% on HumanEval.*
-
-The abstract from the Phi-1.5 paper is the following:
-
-*We continue the investigation into the power of smaller Transformer-based language models as
-initiated by TinyStories – a 10 million parameter model that can produce coherent English – and
-the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close
-to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to
-generate “textbook quality” data as a way to enhance the learning process compared to traditional
-web data. We follow the “Textbooks Are All You Need” approach, focusing this time on common
-sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5,
-with performance on natural language tasks comparable to models 5x larger, and surpassing most
-non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic
-coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good –such
-as the ability to “think step by step” or perform some rudimentary in-context learning– and bad,
-including hallucinations and the potential for toxic and biased generations –encouragingly though, we
-are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to
-promote further research on these urgent topics.*
-
-This model was contributed by [Susnato Dhar](https://huggingface.co/susnato).
-
-The original code for Phi-1, Phi-1.5 and Phi-2 can be found [here](https://huggingface.co/microsoft/phi-1), [here](https://huggingface.co/microsoft/phi-1_5) and [here](https://huggingface.co/microsoft/phi-2), respectively.
-
-## Usage tips
-
-- This model is quite similar to `Llama` with the main difference in [`PhiDecoderLayer`], where they used [`PhiAttention`] and [`PhiMLP`] layers in parallel configuration.
-- The tokenizer used for this model is identical to the [`CodeGenTokenizer`].
-
-## How to use Phi-2
-
-<Tip warning={true}>
+# Phi

-Phi-2 has been integrated in the development version (4.37.0.dev) of `transformers`. Until the official version is released through `pip`, ensure that you are doing one of the following:
+[Phi](https://huggingface.co/papers/2306.11644) is a 1.3B parameter transformer model optimized for Python code generation. It focuses on "textbook-quality" training data of code examples, exercises and synthetic Python problems rather than scaling the model size or compute.

-* When loading the model, ensure that `trust_remote_code=True` is passed as an argument of the `from_pretrained()` function.
+You can find all the original Phi checkpoints under the [Phi-1](https://huggingface.co/collections/microsoft/phi-1-6626e29134744e94e222d572) collection.

-* Update your local `transformers` to the development version: `pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers`. The previous command is an alternative to cloning and installing from the source.
+> [!TIP]
+> Click on the Phi models in the right sidebar for more examples of how to apply Phi to different language tasks.

-</Tip>
+The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`] and from the command line.

-```python
->>> from transformers import AutoModelForCausalLM, AutoTokenizer
+<hfoptions id="usage">
+<hfoption id="Pipeline">

->>> model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
->>> tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
+```py
+import torch
+from transformers import pipeline

->>> inputs = tokenizer('Can you help me write a formal email to a potential business partner proposing a joint venture?', return_tensors="pt", return_attention_mask=False)
+pipeline = pipeline(task="text-generation", model="microsoft/phi-1.5", device=0, torch_dtype=torch.bfloat16)
+pipeline("pipeline('''def print_prime(n): """ Print all primes between 1 and n"""''')")

->>> outputs = model.generate(**inputs, max_length=30)
->>> text = tokenizer.batch_decode(outputs)[0]
->>> print(text)
-Can you help me write a formal email to a potential business partner proposing a joint venture?
-Input: Company A: ABC Inc.
-Company B
 ```

-### Example :
-
-```python
->>> from transformers import PhiForCausalLM, AutoTokenizer
+</hfoption>

->>> # define the model and tokenizer.
->>> model = PhiForCausalLM.from_pretrained("microsoft/phi-1_5")
->>> tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
+<hfoption id="AutoModel">

->>> # feel free to change the prompt to your liking.
->>> prompt = "If I were an AI that had just achieved"
+```py
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM

->>> # apply the tokenizer.
->>> tokens = tokenizer(prompt, return_tensors="pt")
+tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
+model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1", torch_dtype=torch.float16, device_map="auto", attn_implementation="sdpa")

->>> # use the model to generate new tokens.
->>> generated_output = model.generate(**tokens, use_cache=True, max_new_tokens=10)
+input_ids = tokenizer('''def print_prime(n):
+    """
+    Print all primes between 1 and n
+    """''', return_tensors="pt").to("cuda")

->>> tokenizer.batch_decode(generated_output)[0]
-'If I were an AI that had just achieved a breakthrough in machine learning, I would be thrilled'
+output = model.generate(**input_ids, cache_implementation="static")
+print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```

-## Combining Phi and Flash Attention 2
-
-First, make sure to install the latest version of Flash Attention 2 to include the sliding window attention feature.
+</hfoption>
+<hfoption id="transformers-cli">

 ```bash
-pip install -U flash-attn --no-build-isolation
+echo -e "'''def print_prime(n): """ Print all primes between 1 and n"""'''" | transformers-cli run --task text-classification --model microsoft/phi-1.5 --device 0
 ```

-Make also sure that you have a hardware that is compatible with Flash-Attention 2. Read more about it in the official documentation of flash-attn repository. Make also sure to load your model in half-precision (e.g. `torch.float16``)
+</hfoption>
+</hfoptions>

-To load and run a model using Flash Attention 2, refer to the snippet below:
+Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.

-```python
->>> import torch
->>> from transformers import PhiForCausalLM, AutoTokenizer
+The example below uses [bitsandbytes](https://huggingface.co/docs/transformers/en/quantization/bitsandbytes) to only quantize the weights to 4-bits.

->>> # define the model and tokenizer and push the model and tokens to the GPU.
->>> model = PhiForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype=torch.float16, attn_implementation="flash_attention_2").to("cuda") # doctest: +SKIP
->>> tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
+```py
+import torch
+from transformers import BitsAndBytesConfig, AutoTokenizer, AutoModelForCausalLM

->>> # feel free to change the prompt to your liking.
->>> prompt = "If I were an AI that had just achieved"
+bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True)
+tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
+model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1", torch_dtype=torch.float16, device_map="auto", attn_implementation="sdpa", quantization_config=bnb_config)

->>> # apply the tokenizer.
->>> tokens = tokenizer(prompt, return_tensors="pt").to("cuda")
+input_ids = tokenizer('''def print_prime(n):
+    """
+    Print all primes between 1 and n
+    """''', return_tensors="pt").to("cuda")

->>> # use the model to generate new tokens.
->>> generated_output = model.generate(**tokens, use_cache=True, max_new_tokens=10) # doctest: +SKIP
-
->>> tokenizer.batch_decode(generated_output)[0] # doctest: +SKIP
-'If I were an AI that had just achieved a breakthrough in machine learning, I would be thrilled'
+output = model.generate(**input_ids, cache_implementation="static")
+print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```

-### Expected speedups
-
-Below is an expected speedup diagram that compares pure inference time between the native implementation in transformers using `microsoft/phi-1` checkpoint and the Flash Attention 2 version of the model using a sequence length of 2048.
-
-<div style="text-align: center">
-<img src="https://huggingface.co/datasets/ybelkada/documentation-images/resolve/main/phi_1_speedup_plot.jpg">
-</div>
+## Notes
+
+- If you're using Transformers < 4.37.0.dev, set `trust_remote_code=True` in [`~AutoModel.from_pretrained`]. Otherwise, make sure you update Transformers to the latest stable version.
+
+  ```py
+  import torch
+  from transformers import AutoTokenizer, AutoModelForCausalLM
+
+  tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
+  model = AutoModelForCausalLM.from_pretrained(
+      "microsoft/phi-1",
+      torch_dtype=torch.float16,
+      device_map="auto",
+      trust_remote_code=True,
+      attn_implementation="sdpa")
+
+  input_ids = tokenizer('''def print_prime(n):
+      """
+      Print all primes between 1 and n
+      """''', return_tensors="pt").to("cuda")
+
+  output = model.generate(**input_ids, cache_implementation="static")
+  print(tokenizer.decode(output[0], skip_special_tokens=True))
+  ```

 ## PhiConfig

 [[autodoc]] PhiConfig

-<frameworkcontent>
-<pt>
-
 ## PhiModel

 [[autodoc]] PhiModel
@@ -193,6 +144,3 @@ Below is an expected speedup diagram that compares pure inference time between t

 [[autodoc]] PhiForTokenClassification
     - forward
-
-</pt>
-</frameworkcontent>
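The usage tip removed in this diff notes that [`PhiDecoderLayer`] runs [`PhiAttention`] and [`PhiMLP`] in a parallel configuration rather than sequentially. The snippet below is a minimal, hypothetical sketch of that layout with simplified module names and shapes; it is not the actual `PhiDecoderLayer` implementation.

```py
import torch
import torch.nn as nn

class ParallelBlockSketch(nn.Module):
    """Illustrative parallel attention + MLP block (assumed simplification)."""

    def __init__(self, hidden_size=64, num_heads=4):
        super().__init__()
        self.ln = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, hidden_states):
        normed = self.ln(hidden_states)                  # one shared layer norm
        attn_out, _ = self.attn(normed, normed, normed)  # attention branch
        mlp_out = self.mlp(normed)                       # MLP branch, computed from the same input
        return hidden_states + attn_out + mlp_out        # both branches added to the residual

x = torch.randn(1, 8, 64)
print(ParallelBlockSketch()(x).shape)  # torch.Size([1, 8, 64])
```

The point of the parallel layout is that the attention and MLP branches both read the same normalized hidden state, instead of the MLP consuming the attention output as in a standard sequential decoder block.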
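The removed tips also state that Phi reuses the CodeGen tokenizer. A quick, hedged way to check this locally (assuming network access to download the tokenizer files; the printed class name is an expectation based on that tip, not a guarantee):

```py
from transformers import AutoTokenizer

# If the checkpoint's tokenizer_config points at CodeGenTokenizer, AutoTokenizer
# should resolve to the CodeGen tokenizer class for microsoft/phi-1.
tok = AutoTokenizer.from_pretrained("microsoft/phi-1")
print(type(tok).__name__)  # expected: CodeGenTokenizerFast
```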
