Regarding the use of compute_metrics() with HuggingFace Trainer #4220

anmolagarwal999 · 2023-08-25T14:19:38Z


I am using DeepSpeed ZeRO Stage 3 and passing a custom compute_metrics() to the Trainer. I have 4 GPUs (devices). compute_metrics() is invoked on every device. Moreover, every datapoint in the eval dataset (say there are N of them) seems to be sent to compute_metrics() on each device, which seems redundant and inefficient. Am I missing something here? My expectation was that either (1) compute_metrics() would be called only once, or (2) the evaluation dataset would be split across the compute_metrics() calls on the different devices.
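To make the setup concrete, here is a minimal sketch of the kind of compute_metrics() I mean. It is not my actual code. It assumes the argument exposes `.predictions` and `.label_ids` (as `transformers.EvalPrediction` does) and that the launcher exports a `RANK` environment variable per process. The helper `is_main_process` is a hypothetical name of my own, not part of Trainer or DeepSpeed. The metric itself is computed on every process, so each rank returns the same dictionary. Only the side effect (here a print) is limited to rank 0.

```python
import os


def is_main_process() -> bool:
    # Hypothetical helper. Assumption: the launcher (e.g. torchrun or deepspeed)
    # sets RANK for each process. If RANK is absent, treat this as rank 0.
    return int(os.environ.get("RANK", "0")) == 0


def compute_metrics(eval_pred):
    # eval_pred is assumed to carry per-example class scores in .predictions
    # and the gold labels in .label_ids.
    predicted = [
        max(range(len(row)), key=row.__getitem__)  # argmax over class scores
        for row in eval_pred.predictions
    ]
    labels = list(eval_pred.label_ids)
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    metrics = {"accuracy": correct / len(labels)}

    if is_main_process():
        # Side effects (printing, writing files) run once, on rank 0 only.
        print(f"compute_metrics saw {len(labels)} examples")

    # Every rank returns the same metrics dict.
    return metrics
```

With 4 processes, what I observe corresponds to `len(labels)` being N on every rank rather than roughly N/4.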

Replies: 0 comments
