Description
If I'm not mistaken, the conversation style that applies during a fine-tune is defined by the dataset defaults rather than by the tokenizer being used (docs here).
What happens if the tokenizer/model does not have the tokens required by a given conversation style? Are those special tokens created automatically? I assume not.
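For concreteness, this is the kind of check I mean — a minimal sketch using the standard transformers tokenizer API; the ChatML-style token list and the model name are just illustrative assumptions:

```python
from transformers import AutoTokenizer

# Illustrative only: special tokens a ChatML-style conversation format would need.
required_tokens = ["<|im_start|>", "<|im_end|>"]

# Example model; substitute whichever base model is being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

vocab = tokenizer.get_vocab()
missing = [tok for tok in required_tokens if tok not in vocab]
print("missing special tokens:", missing)

# If these were to be created, I'd expect something like the following,
# plus model.resize_token_embeddings(len(tokenizer)) on the model side:
# tokenizer.add_special_tokens({"additional_special_tokens": missing})
```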
Is there an option whereby one can:
- default to using tokenizer.chat_template for the conversation style? (most models on the Hugging Face Hub have this defined)
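What I have in mind is roughly the following — a sketch assuming the standard transformers API; the model name is just an example:

```python
from transformers import AutoTokenizer

# Example model; anything that ships a chat_template would do.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

# apply_chat_template formats the conversation with tokenizer.chat_template,
# so no dataset-level conversation style has to be specified at all.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```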
I'm guessing one issue here is that, since tokenizer.chat_template is not known in advance, it's hard to control the loss mask on the prompt vs. the completion?
So maybe that's the dilemma? Either one can:
a) load a default conversation style from the model/tokenizer, but then it's hard to implement loss masks, or
b) load the default conversation style based on the dataset choice, but then there's a risk of token incompatibilities with the model/tokenizer being trained.
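For what it's worth, the workaround I can imagine for (a) — a rough sketch, not something I know the library supports — is to derive the loss mask by applying the template twice: once without the final assistant turn (with the generation prompt appended) and once with it, then masking the common prefix. This assumes the template renders the prompt as a clean prefix of the full conversation:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # example model

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

# Tokenise the prompt-only portion (user turns + generation prompt),
# then the full conversation; assume the former is a prefix of the latter.
prompt_ids = tokenizer.apply_chat_template(
    messages[:-1], add_generation_prompt=True, tokenize=True
)
full_ids = tokenizer.apply_chat_template(messages, tokenize=True)

labels = list(full_ids)
labels[: len(prompt_ids)] = [-100] * len(prompt_ids)  # -100 is ignored by the loss

print(len(full_ids), "tokens total,", sum(l != -100 for l in labels), "in the loss")
```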
The practical task I'm interested in is fine-tuning Llama 3 and Qwen 2.5 using conversation styles that match their chat templates (so as to minimise the re-training/over-writing of their existing chat behaviour).