Thank you for your outstanding work. I have some questions about the training process of the Bolt model:
When training the Bolt model on the chronos_datasets with a learning rate of 1e-5 and a batch size of 32, the loss keeps fluctuating and shows no downward trend. Are there any techniques to address this?
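For context, here is a minimal sketch of the kind of stabilization tricks I have been experimenting with (linear learning-rate warmup plus gradient-norm clipping). All names and values besides the 1e-5 base learning rate are illustrative, not taken from the Bolt codebase:

```python
def lr_at_step(step, base_lr=1e-5, warmup_steps=1000):
    """Linear warmup from ~0 to base_lr, then hold constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

def clip_gradient(grad, max_norm=1.0):
    """Rescale a flat gradient so its L2 norm is at most max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad
```

Even with these, the loss still oscillates, so I would appreciate pointers on whether the fluctuation is expected at this batch size or whether other tricks (e.g., a larger effective batch via gradient accumulation) are commonly used.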
During training, when some instances have a completely identical context but slightly different targets, the loss explodes at the instance normalization step because the scale becomes too small (e.g., less than 1e-8).
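Here is a minimal reproduction of that failure mode, with an epsilon floor on the scale as one possible workaround. The function names and the eps value are illustrative, not the actual Bolt normalization code:

```python
import numpy as np

def normalize(values, loc, scale, eps=0.0):
    """Instance-normalize values with a floor on the scale to avoid
    division by a near-zero std when the context is (almost) constant."""
    return (np.asarray(values, dtype=np.float64) - loc) / max(scale, eps)

# A constant context gives a per-instance scale of exactly zero...
context = np.array([5.0, 5.0, 5.0, 5.0])
loc, scale = context.mean(), context.std()

# ...so a target that differs even slightly from the context would be
# divided by ~0 and blow up the loss. Flooring the scale keeps it finite:
target = np.array([5.0, 5.1])
safe = normalize(target, loc, scale, eps=1e-3)
```

Is flooring the scale like this the recommended fix, or is it better to filter out constant-context windows during data preparation?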
While retaining the original structure of the Bolt model, if I add new layers (e.g., an input_embedding_layer), the trained model completely fails to capture any periodic patterns. Has anyone encountered a similar situation? Is there a trick to avoid it?
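One trick I have seen suggested for this kind of situation, sketched below, is to initialize the added layer as a near-identity map so the pretrained weights still see their expected inputs at step 0. The layer name, shapes, and noise scale here are all hypothetical, not the actual Bolt interface:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # illustrative embedding width

# Near-identity initialization: identity matrix plus tiny noise,
# instead of a fully random init that would scramble the inputs
# the pretrained layers were trained on.
W = np.eye(d) + 1e-3 * rng.standard_normal((d, d))
b = np.zeros(d)

def input_embedding_layer(x):
    """Hypothetical added layer; starts out close to a pass-through."""
    return x @ W + b

x = rng.standard_normal(d)
y = input_embedding_layer(x)  # y is close to x at initialization
```

Would a near-identity (or zero-residual) init like this be expected to preserve the model's ability to capture periodic patterns, or is the failure more likely coming from somewhere else?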