If more granular failure information (e.g. which client has failed) is accessible to the lighthouse, this could enable dynamic reconfiguration within Model Parallel groups (e.g. Oobleck, Varuna), extending FT beyond per-replica group granularity.
It would be nice to be able to broadcast configs from the lighthouse to all workers.

We can add a new `config` endpoint to Lighthouse/Manager that workers can call to fetch it. Configs can be provided to the lighthouse via `--lighthouse <path>.{json,yml}`.
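
To make the proposal concrete, here is a minimal sketch of the fetch flow using only the Python standard library. It is not the existing Lighthouse/Manager API: the `/config` path, the function names `make_config_server` and `fetch_config`, and the example keys `lr` and `sync_every` are all hypothetical and chosen purely for illustration. A toy "lighthouse" serves an in-memory config as JSON, and a "worker" fetches it over HTTP.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_config_server(config: dict, host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Build a toy 'lighthouse' that serves `config` as JSON at GET /config.

    The /config path is an assumption for this sketch, not a real endpoint.
    port=0 asks the OS for any free port.
    """
    body = json.dumps(config).encode("utf-8")

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/config":
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            # Silence the default per-request logging to stderr.
            pass

    return ThreadingHTTPServer((host, port), Handler)


def fetch_config(base_url: str, timeout: float = 5.0) -> dict:
    """Worker side: fetch the config from the (hypothetical) /config endpoint."""
    # An empty ProxyHandler bypasses any proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/config", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo() -> dict:
    """Start the toy server, fetch the config once as a worker, then shut down."""
    # Example values only; a real lighthouse would load these from its config file.
    server = make_config_server({"lr": 0.001, "sync_every": 100})
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        return fetch_config(f"http://{host}:{port}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

In this sketch workers pull the config rather than having it pushed to them, which matches the "endpoint that workers can call" wording above. A real implementation would also need to decide how workers learn about config changes after startup, which this sketch does not address.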