Data format in mmichat_*.jsonl

Thank you for the great work!

Could you please share the data format, or an example row, of the mmichat_speech.jsonl file?
In anygpt/src/train/stage2_sft.py, the preprocess function maps raw_datasets to tokenized_datasets, but I'm having trouble following how this processing works without knowing what the input rows look like.

It would be very helpful if you could provide a short sample in JSONL format.