Readme warmup update #1512


Open · wants to merge 1 commit into `habana_main`

Conversation

@adobrzyn commented Jul 2, 2025

No description provided.

Signed-off-by: Agata Dobrzyniewicz <[email protected]>
@@ -198,7 +198,7 @@ Intel Gaudi accelerators perform best when operating on models with fixed tensor
generates optimized binary code that implements the given model topology on Gaudi. In its default configuration, the produced binary code may be highly dependent on input and output tensor shapes, requiring graph recompilation
when encountering tensors with different shapes within the same topology. While these binaries efficiently utilize Gaudi, the compilation process itself can introduce noticeable overhead in end-to-end execution.
In dynamic inference serving scenarios, minimizing the number of graph compilations and reducing the risk of graph compilation occurring during server runtime is important. Currently, this is achieved by
"bucketing" the model's forward pass across two dimensions: `batch_size` and `sequence_length`.
"bucketing" the model's forward pass across two dimensions: `batch_size`, `query_length` and `num_ctx_blocks`.
Please explain what `ctx` stands for and add a brief explanation to the README.

INFO 08-01 21:37:59 hpu_model_runner.py:499] Generated 24 prompt buckets: [(1, 128), (1, 256), (1, 384), (1, 512), (1, 640), (1, 768), (1, 896), (1, 1024), (2, 128), (2, 256), (2, 384), (2, 512), (2, 640), (2, 768), (2, 896), (2, 1024), (4, 128), (4, 256), (4, 384), (4, 512), (4, 640), (4, 768), (4, 896), (4, 1024)]
INFO 08-01 21:37:59 hpu_model_runner.py:504] Decode bucket config (min, step, max_warmup) bs:[1, 128, 4], seq:[128, 128, 2048]
INFO 08-01 21:37:59 hpu_model_runner.py:509] Generated 48 decode buckets: [(1, 128), (1, 256), (1, 384), (1, 512), (1, 640), (1, 768), (1, 896), (1, 1024), (1, 1152), (1, 1280), (1, 1408), (1, 1536), (1, 1664), (1, 1792), (1, 1920), (1, 2048), (2, 128), (2, 256), (2, 384), (2, 512), (2, 640), (2, 768), (2, 896), (2, 1024), (2, 1152), (2, 1280), (2, 1408), (2, 1536), (2, 1664), (2, 1792), (2, 1920), (2, 2048), (4, 128), (4, 256), (4, 384), (4, 512), (4, 640), (4, 768), (4, 896), (4, 1024), (4, 1152), (4, 1280), (4, 1408), (4, 1536), (4, 1664), (4, 1792), (4, 1920), (4, 2048)]
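
For readers of the README, here is a minimal sketch of how bucket lists like the ones above can be derived from a `(min, step, max_warmup)` triple. It is inferred from the log output, not necessarily the exact `hpu_model_runner.py` implementation: values ramp up by powers of two from `min` until `step` is reached, then grow linearly in increments of `step` up to `max_warmup`, and the bucket list is the Cartesian product of the `bs` and `seq` ranges.

```python
import itertools

def warmup_range(bmin, bstep, bmax):
    """Expand a (min, step, max_warmup) triple into bucket values:
    powers of two from bmin up to bstep, then linear steps of bstep
    up to bmax. A sketch inferred from the logs, not the exact code."""
    values = []
    current = bmin
    while current < bstep and current <= bmax:
        values.append(current)
        current *= 2
    current = bstep
    while current <= bmax:
        values.append(current)
        current += bstep
    return values

# Reproduces the decode buckets logged above:
# bs [1, 128, 4] -> [1, 2, 4]; seq [128, 128, 2048] -> [128, 256, ..., 2048]
bs = warmup_range(1, 128, 4)
seq = warmup_range(128, 128, 2048)
buckets = list(itertools.product(bs, seq))
assert len(buckets) == 48  # matches "Generated 48 decode buckets"
```

The same expansion with a seq triple of `[128, 128, 1024]` yields 3 × 8 = 24 buckets, matching the "Generated 24 prompt buckets" line.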
INFO 07-01 16:36:17 [exponential.py:88] Prompt bucket config (min, step, max_warmup, limit) bs:[1, 1, 32, 6], seq:[128, 128, 2048, 12]
Please explain what `limit` controls.
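
One plausible reading of `limit`, inferred from the `exponential.py` filename and the config shape (an assumption, not confirmed by this PR): it caps the number of bucket values generated per dimension, with the values spaced exponentially between `min` and `max_warmup`. A minimal sketch:

```python
def exponential_range(bmin, bmax, limit):
    """At most `limit` exponentially spaced values from bmin to bmax.
    Hypothetical reading of the `limit` field in
    (min, step, max_warmup, limit); the real strategy may differ."""
    if limit <= 1 or bmin == bmax:
        return [bmax]
    ratio = (bmax / bmin) ** (1.0 / (limit - 1))
    return sorted({round(bmin * ratio ** i) for i in range(limit)})

# bs:[1, 1, 32, 6] -> at most 6 batch-size buckets between 1 and 32:
print(exponential_range(1, 32, 6))        # [1, 2, 4, 8, 16, 32]
print(exponential_range(128, 2048, 12))   # 12 seq buckets from 128 to 2048
```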
