I want to speed up SAM on an A100 as much as possible, but I am running on a serverless GPU platform.
I can load the model during the cold start and then run it during each job.
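One detail that may matter here: `torch.compile` is lazy, so compilation and autotuning happen on the first forward pass, not when the model is built or moved to the GPU. In both runs in the logs, the first `AUTOTUNE` line appears roughly a minute after the worker prints `Started.`, which suggests the compile cost is being paid inside a job rather than during the cold start. A minimal sketch of a load-once-and-warm-up pattern, assuming `loader` and `warmup` are hypothetical placeholders for your own code:

```python
from typing import Any, Callable, Optional

# One model per worker process; it survives between jobs on a warm worker.
_MODEL: Any = None


def get_model(
    loader: Callable[[], Any],
    warmup: Optional[Callable[[Any], None]] = None,
) -> Any:
    """Build the model on the first call (the cold start) and reuse it afterwards."""
    global _MODEL
    if _MODEL is None:
        model = loader()
        if warmup is not None:
            # With torch.compile, the first forward pass is what triggers
            # compilation and autotuning, so a dummy call here moves that
            # cost out of the first real job.
            warmup(model)
        _MODEL = model
    return _MODEL
```

For SAM, `loader` would build `sam`, `predictor` and `mask_generator` as in the setup code below, and `warmup` would push one dummy image through every path you use (`mask_generator.generate(...)`, and `predictor.set_image(...)` followed by `predictor.predict(...)`), ideally at the production image size, since a new input shape can trigger another compilation. This only moves the cost into the cold start; it does not remove it.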
I made this change:
```python
import torch

# from segment_anything import SamPredictor, sam_model_registry, SamAutomaticMaskGenerator
from segment_anything_fast import (
    SamPredictor,
    sam_model_fast_registry,
    SamAutomaticMaskGenerator,
)

# Build ViT-H SAM through the segment-anything-fast registry.
sam = sam_model_fast_registry["vit_h"](checkpoint="models/sam_vit_h_4b8939.pth")
device = "cuda" if torch.cuda.is_available() else "cpu"
sam.to(device)

# Two front ends sharing the same model: point prompts and automatic "everything" masks.
predictor = SamPredictor(sam)
mask_generator = SamAutomaticMaskGenerator(sam)
```
(I create both `predictor` and `mask_generator` because I use SAM in two ways: "everything" mode and point prompts.)
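For context, as far as I understand the segment-anything-fast README, `sam_model_fast_registry` differs from the drop-in `sam_model_registry` in that it sets eval mode, casts the model to bfloat16 and wraps it in `torch.compile` with `mode="max-autotune"`, which is where the `AUTOTUNE` lines come from. Inductor and Triton keep compiled kernels and autotune results in local cache directories (by default, I believe, under `/tmp/torchinductor_<user>` and `~/.triton/cache`), and those disappear with a serverless container. A minimal sketch, assuming your provider lets you mount a persistent volume, is to point both caches at that volume before importing `torch`. The function name and mount path are hypothetical, and reuse should only be expected on the same GPU type with the same PyTorch and Triton versions.

```python
import os
from pathlib import Path
from typing import Dict


def configure_compile_cache(root: str) -> Dict[str, str]:
    """Point the Inductor and Triton caches at a directory that survives cold starts.

    Call this before `import torch` so every setting is picked up.
    """
    base = Path(root)
    settings = {
        # Inductor's cache (compiled graphs and, as far as I know, autotune results).
        "TORCHINDUCTOR_CACHE_DIR": str(base / "inductor"),
        # Triton's compiled-kernel cache.
        "TRITON_CACHE_DIR": str(base / "triton"),
        # Opt in to the FX graph cache (may already be on by default,
        # depending on the PyTorch version).
        "TORCHINDUCTOR_FX_GRAPH_CACHE": "1",
    }
    for key in ("TORCHINDUCTOR_CACHE_DIR", "TRITON_CACHE_DIR"):
        Path(settings[key]).mkdir(parents=True, exist_ok=True)
    os.environ.update(settings)
    return settings


# Hypothetical mount point of a persistent network volume:
# configure_compile_cache("/runpod-volume/torch_compile_cache")
# import torch  # only after the cache is configured
```

If the cache is being reused, a later cold start should print few or no `AUTOTUNE` tables; if it still prints all of them, check that both workers really see the same directory.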
Am I missing something? My job takes extremely long. It looks like autotuning is running, and I don't fully understand what it is doing. Can't I use functions that are already optimized for the A100?
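Roughly, each `AUTOTUNE op(shapes)` table is TorchInductor benchmarking several candidate implementations of one matrix multiply or convolution (its own Triton templates such as `triton_mm_*`, plus extern ATen kernels listed as `mm`, `addmm`, `bias_addmm` or `convolution`) and keeping the fastest one for that exact shape on that GPU. The percentage is the fastest time divided by the candidate's time. The helper below is a sketch that reproduces this arithmetic, ordered as in the pasted logs (slowest first, winner last); its output matches the `addmm(4096x3840, 4096x1280, 1280x3840)` table. In some tables, for example `mm(4096x1280, 1280x5120)`, the plain `mm` kernel wins anyway, so roughly 10 s of benchmarking gains nothing over the default for that shape.

```python
from typing import Dict, List, Tuple


def rank_candidates(timings_ms: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (name, ms, percent of fastest), slowest first like the pasted logs."""
    best = min(timings_ms.values())
    # A candidate's percentage is best_time / its_time, so the winner shows 100.0.
    rows = [(name, ms, round(100.0 * best / ms, 1)) for name, ms in timings_ms.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```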
Here are the logs (newest first):
```
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 11.5340 seconds and 0.0000 seconds precompiling
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | triton_mm_773 0.2693 ms 70.0%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | addmm 0.2693 ms 70.0%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | triton_mm_775 0.2673 ms 70.5%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | triton_mm_776 0.2401 ms 78.5%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | triton_mm_772 0.2365 ms 79.7%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | triton_mm_774 0.2263 ms 83.3%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | triton_mm_781 0.2150 ms 87.6%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | triton_mm_779 0.1925 ms 97.9%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | triton_mm_780 0.1905 ms 98.9%
2025-01-18 23:13:41.285 | info | kelcmu34r52tw4 | bias_addmm 0.1884 ms 100.0%
2025-01-18 23:13:38.032 | info | kelcmu34r52tw4 | AUTOTUNE addmm(4096x3840, 4096x1280, 1280x3840)
2025-01-18 23:13:38.032 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 10.2984 seconds and 0.0000 seconds precompiling
2025-01-18 23:13:38.032 | info | kelcmu34r52tw4 | triton_mm_72 0.1137 ms 69.4%
2025-01-18 23:13:38.032 | info | kelcmu34r52tw4 | triton_mm_68 0.1106 ms 71.3%
2025-01-18 23:13:38.032 | info | kelcmu34r52tw4 | triton_mm_70 0.1085 ms 72.6%
2025-01-18 23:13:38.031 | info | kelcmu34r52tw4 | triton_mm_67 0.0993 ms 79.4%
2025-01-18 23:13:38.031 | info | kelcmu34r52tw4 | triton_mm_71 0.0983 ms 80.2%
2025-01-18 23:13:38.031 | info | kelcmu34r52tw4 | triton_mm_69 0.0932 ms 84.6%
2025-01-18 23:13:38.031 | info | kelcmu34r52tw4 | triton_mm_76 0.0881 ms 89.5%
2025-01-18 23:13:38.031 | info | kelcmu34r52tw4 | triton_mm_74 0.0829 ms 95.1%
2025-01-18 23:13:38.031 | info | kelcmu34r52tw4 | triton_mm_75 0.0809 ms 97.5%
2025-01-18 23:13:38.031 | info | kelcmu34r52tw4 | mm 0.0788 ms 100.0%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | AUTOTUNE mm(4900x1280, 1280x1280)
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 5.2299 seconds and 0.0000 seconds precompiling
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_49 0.0225 ms 95.5%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_47 0.0225 ms 95.5%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_46 0.0225 ms 95.5%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_45 0.0225 ms 95.5%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_44 0.0225 ms 95.5%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_42 0.0225 ms 95.5%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_54 0.0215 ms 100.0%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_51 0.0215 ms 100.0%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_50 0.0215 ms 100.0%
2025-01-18 23:13:26.480 | info | kelcmu34r52tw4 | triton_bmm_48 0.0215 ms 100.0%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | AUTOTUNE bmm(14x5600x80, 14x80x14)
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 5.4020 seconds and 0.0000 seconds precompiling
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_37 0.0215 ms 95.2%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_36 0.0215 ms 95.2%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_34 0.0215 ms 95.2%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_33 0.0215 ms 95.2%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_32 0.0215 ms 95.2%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_30 0.0215 ms 95.2%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_29 0.0215 ms 95.2%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_26 0.0215 ms 95.2%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_35 0.0205 ms 100.0%
2025-01-18 23:13:16.181 | info | kelcmu34r52tw4 | triton_bmm_31 0.0205 ms 100.0%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | AUTOTUNE bmm(14x5600x80, 14x80x14)
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 10.3003 seconds and 0.0000 seconds precompiling
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_21 0.3236 ms 69.6%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_17 0.3092 ms 72.8%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_19 0.3062 ms 73.6%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_20 0.2857 ms 78.9%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_16 0.2775 ms 81.2%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_18 0.2652 ms 84.9%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_25 0.2540 ms 88.7%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | mm 0.2386 ms 94.4%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_24 0.2263 ms 99.5%
2025-01-18 23:13:10.951 | info | kelcmu34r52tw4 | triton_mm_23 0.2253 ms 100.0%
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | AUTOTUNE mm(4900x1280, 1280x3840)
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 3.3487 seconds and 0.0000 seconds precompiling
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | triton_convolution_2 3.2420 ms 14.1%
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | triton_convolution_0 1.1105 ms 41.2%
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | triton_convolution_4 0.8735 ms 52.4%
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | triton_convolution_5 0.8172 ms 56.0%
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | convolution 0.7444 ms 61.5%
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | triton_convolution_3 0.5960 ms 76.8%
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | triton_convolution_1 0.5591 ms 81.9%
2025-01-18 23:13:05.549 | info | kelcmu34r52tw4 | triton_convolution_6 0.4577 ms 100.0%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | AUTOTUNE convolution(1x3x1024x1024, 1280x3x16x16)
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 10.0648 seconds and 0.0000 seconds precompiling
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3470 0.3543 ms 67.1%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3474 0.3512 ms 67.6%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3473 0.3072 ms 77.3%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3469 0.3052 ms 77.9%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3472 0.3041 ms 78.1%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3471 0.2939 ms 80.8%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | mm 0.2760 ms 86.1%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3476 0.2550 ms 93.2%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3478 0.2499 ms 95.1%
2025-01-18 23:12:55.248 | info | kelcmu34r52tw4 | triton_mm_3477 0.2376 ms 100.0%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | AUTOTUNE mm(4096x5120, 5120x1280)
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 10.3774 seconds and 0.0000 seconds precompiling
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_91 0.3564 ms 68.4%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_87 0.3471 ms 70.2%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_89 0.3451 ms 70.6%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_90 0.3164 ms 77.0%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_86 0.3144 ms 77.5%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_88 0.3000 ms 81.2%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_95 0.2785 ms 87.5%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_93 0.2550 ms 95.6%
2025-01-18 23:12:48.030 | info | kelcmu34r52tw4 | triton_mm_94 0.2499 ms 97.5%
2025-01-18 23:12:48.029 | info | kelcmu34r52tw4 | mm 0.2437 ms 100.0%
2025-01-18 23:12:35.464 | info | kelcmu34r52tw4 | AUTOTUNE mm(4096x1280, 1280x5120)
2025-01-18 23:11:37.594 | info | kelcmu34r52tw4 | Started.
2025-01-18 23:11:37.335 | info | kelcmu34r52tw4 | --- Starting Serverless Worker | Version 1.6.2 ---
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 6.8672 seconds and 0.0000 seconds precompiling
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_790 0.0256 ms 84.0%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_788 0.0246 ms 87.5%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_787 0.0246 ms 87.5%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_795 0.0236 ms 91.3%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_792 0.0236 ms 91.3%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_789 0.0236 ms 91.3%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_798 0.0225 ms 95.5%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_796 0.0225 ms 95.5%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_793 0.0225 ms 95.5%
2025-01-18 22:59:33.946 | info | kelcmu34r52tw4 | triton_bmm_791 0.0215 ms 100.0%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | AUTOTUNE bmm(64x1024x80, 64x80x64)
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 11.1037 seconds and 0.0000 seconds precompiling
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | triton_mm_775 0.2632 ms 71.2%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | triton_mm_773 0.2611 ms 71.8%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | addmm 0.2611 ms 71.8%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | triton_mm_776 0.2376 ms 78.9%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | triton_mm_772 0.2335 ms 80.3%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | triton_mm_774 0.2212 ms 84.7%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | triton_mm_781 0.2099 ms 89.3%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | triton_mm_779 0.1905 ms 98.4%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | triton_mm_780 0.1894 ms 98.9%
2025-01-18 22:59:32.082 | info | kelcmu34r52tw4 | bias_addmm 0.1874 ms 100.0%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | AUTOTUNE addmm(4096x3840, 4096x1280, 1280x3840)
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 10.0688 seconds and 0.0000 seconds precompiling
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_72 0.1126 ms 70.0%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_68 0.1096 ms 72.0%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_70 0.1085 ms 72.6%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_67 0.0973 ms 81.1%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_71 0.0963 ms 81.9%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_69 0.0901 ms 87.5%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_76 0.0881 ms 89.5%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_74 0.0819 ms 96.2%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | triton_mm_75 0.0799 ms 98.7%
2025-01-18 22:59:25.214 | info | kelcmu34r52tw4 | mm 0.0788 ms 100.0%
2025-01-18 22:59:14.096 | info | kelcmu34r52tw4 | AUTOTUNE mm(4900x1280, 1280x1280)
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 5.2848 seconds and 0.0000 seconds precompiling
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_45 0.0225 ms 90.9%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_55 0.0215 ms 95.2%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_54 0.0215 ms 95.2%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_53 0.0215 ms 95.2%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_48 0.0215 ms 95.2%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_47 0.0215 ms 95.2%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_44 0.0215 ms 95.2%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_42 0.0215 ms 95.2%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_51 0.0205 ms 100.0%
2025-01-18 22:59:14.095 | info | kelcmu34r52tw4 | triton_bmm_50 0.0205 ms 100.0%
2025-01-18 22:59:04.027 | info | kelcmu34r52tw4 | AUTOTUNE bmm(14x5600x80, 14x80x14)
2025-01-18 22:59:04.027 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 5.3798 seconds and 0.0000 seconds precompiling
2025-01-18 22:59:04.027 | info | kelcmu34r52tw4 | triton_bmm_26 0.0215 ms 95.2%
2025-01-18 22:59:04.027 | info | kelcmu34r52tw4 | triton_bmm_40 0.0205 ms 100.0%
2025-01-18 22:59:04.027 | info | kelcmu34r52tw4 | triton_bmm_39 0.0205 ms 100.0%
2025-01-18 22:59:04.027 | info | kelcmu34r52tw4 | triton_bmm_38 0.0205 ms 100.0%
2025-01-18 22:59:04.027 | info | kelcmu34r52tw4 | triton_bmm_35 0.0205 ms 100.0%
2025-01-18 22:59:04.026 | info | kelcmu34r52tw4 | triton_bmm_34 0.0205 ms 100.0%
2025-01-18 22:59:04.026 | info | kelcmu34r52tw4 | triton_bmm_33 0.0205 ms 100.0%
2025-01-18 22:59:04.026 | info | kelcmu34r52tw4 | triton_bmm_32 0.0205 ms 100.0%
2025-01-18 22:59:04.026 | info | kelcmu34r52tw4 | triton_bmm_31 0.0205 ms 100.0%
2025-01-18 22:59:04.026 | info | kelcmu34r52tw4 | triton_bmm_29 0.0205 ms 100.0%
2025-01-18 22:58:58.742 | info | kelcmu34r52tw4 | AUTOTUNE bmm(14x5600x80, 14x80x14)
2025-01-18 22:58:58.742 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 9.8710 seconds and 0.0000 seconds precompiling
2025-01-18 22:58:58.742 | info | kelcmu34r52tw4 | triton_mm_21 0.3185 ms 69.8%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | triton_mm_17 0.3041 ms 73.1%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | triton_mm_19 0.3000 ms 74.1%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | triton_mm_20 0.2816 ms 78.9%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | triton_mm_16 0.2744 ms 81.0%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | triton_mm_18 0.2652 ms 83.8%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | triton_mm_25 0.2478 ms 89.7%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | mm 0.2355 ms 94.3%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | triton_mm_23 0.2232 ms 99.5%
2025-01-18 22:58:58.741 | info | kelcmu34r52tw4 | triton_mm_24 0.2222 ms 100.0%
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | AUTOTUNE mm(4900x1280, 1280x3840)
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 3.2299 seconds and 0.0000 seconds precompiling
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | triton_convolution_2 3.2338 ms 14.2%
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | triton_convolution_0 1.1110 ms 41.4%
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | triton_convolution_4 0.8755 ms 52.5%
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | triton_convolution_5 0.8192 ms 56.1%
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | convolution 0.7485 ms 61.4%
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | triton_convolution_3 0.5990 ms 76.8%
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | triton_convolution_1 0.5591 ms 82.2%
2025-01-18 22:58:53.361 | info | kelcmu34r52tw4 | triton_convolution_6 0.4598 ms 100.0%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | AUTOTUNE convolution(1x3x1024x1024, 1280x3x16x16)
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 10.0157 seconds and 0.0000 seconds precompiling
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3474 0.3482 ms 67.1%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3470 0.3482 ms 67.1%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3469 0.3031 ms 77.0%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3473 0.3011 ms 77.6%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3472 0.2970 ms 78.6%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3471 0.2908 ms 80.3%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | mm 0.2703 ms 86.4%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3476 0.2478 ms 94.2%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3478 0.2468 ms 94.6%
2025-01-18 22:58:43.490 | info | kelcmu34r52tw4 | triton_mm_3477 0.2335 ms 100.0%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | AUTOTUNE mm(4096x5120, 5120x1280)
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | SingleProcess AUTOTUNE benchmarking takes 9.9050 seconds and 0.0000 seconds precompiling
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_91 0.3564 ms 67.8%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_87 0.3410 ms 70.9%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_89 0.3400 ms 71.1%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_90 0.3123 ms 77.4%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_86 0.3082 ms 78.4%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_88 0.2908 ms 83.1%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_95 0.2724 ms 88.7%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_93 0.2499 ms 96.7%
2025-01-18 22:58:36.778 | info | kelcmu34r52tw4 | triton_mm_94 0.2447 ms 98.7%
2025-01-18 22:58:36.777 | info | kelcmu34r52tw4 | mm 0.2417 ms 100.0%
2025-01-18 22:58:24.228 | info | kelcmu34r52tw4 | AUTOTUNE mm(4096x1280, 1280x5120)
2025-01-18 22:57:26.656 | info | kelcmu34r52tw4 | Started.
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | --- Starting Serverless Worker | Version 1.6.2 ---
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 |
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 |
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | By pulling and using the container, you accept the terms and conditions of this license:
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 |
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 |
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | CUDA Version 11.8.0
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 |
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | ==========
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | == CUDA ==
2025-01-18 22:57:26.357 | info | kelcmu34r52tw4 | ==========
2025-01-18 22:57:26.356 | info | kelcmu34r52tw4 |
```
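To quantify how much of each job is spent autotuning, the sketch below sums the `SingleProcess AUTOTUNE benchmarking takes N seconds` lines that follow each `Starting Serverless Worker` line. It assumes exactly the log format shown above and sorts lines by their timestamp prefix first, because the paste is newest first. The function name is made up for illustration.

```python
import re
from typing import List

# Patterns for the two kinds of log lines this sketch cares about.
_BENCH = re.compile(r"AUTOTUNE benchmarking takes ([0-9.]+) seconds")
_START = re.compile(r"Starting Serverless Worker")


def autotune_seconds_per_start(log_text: str) -> List[float]:
    """Total AUTOTUNE benchmarking seconds after each worker start.

    Lines are sorted by their 23-character timestamp prefix first, because the
    pasted logs are newest first. Benchmark lines seen before any start are ignored.
    """
    totals: List[float] = []
    for line in sorted(log_text.splitlines(), key=lambda line: line[:23]):
        if _START.search(line):
            totals.append(0.0)
            continue
        match = _BENCH.search(line)
        if match and totals:
            totals[-1] += float(match.group(1))
    return [round(t, 4) for t in totals]
```

Summing the benchmarking lines in the logs above by hand gives about 71.7 s after the 22:57 start and 66.6 s after the 23:11 start (the 23:11 excerpt has one table fewer). Nearly the same shapes are benchmarked again after the second start, which suggests nothing was reused between the two cold starts.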