This repository was archived by the owner on Apr 24, 2025. It is now read-only.
Enable graph mode for LLM inference #89
Open
Hi,
I have read the "examples\NPU compilation tutorial.ipynb" about graph mode and eager mode, which helped me a lot.
I was wondering whether I could use graph mode for LLM inference to reduce the copying of weights between the CPU and the NPU.
So I simply changed the return statement of the function horizontal_fusion_linear to return fx_model.to('npu'). After converting the model, inference fails with:
AttributeError: 'Tensor' object has no attribute 'is_contiguous'
It seems this operation cannot be performed on the NPU? If I want to use graph mode for LLM inference, is the above change correct?
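One way to narrow the error down is to check which class the failing object actually belongs to. torch.Tensor does define is_contiguous, so the 'Tensor' named in the message may be a different class that happens to share the name. The following is only a minimal sketch: describe_object is a hypothetical helper, not part of any library, and the Tensor class below is a stand-in used purely for demonstration.

```python
def describe_object(obj, attr="is_contiguous"):
    """Return (fully qualified class name, whether `attr` exists on obj).

    Hypothetical debugging helper, not part of any library.
    """
    cls = type(obj)
    qualified = f"{cls.__module__}.{cls.__qualname__}"
    return qualified, hasattr(obj, attr)


# Stand-in class for demonstration only: it mimics a non-PyTorch object
# whose class is named "Tensor" but lacks is_contiguous.
class Tensor:
    pass


if __name__ == "__main__":
    name, has_attr = describe_object(Tensor())
    # Prints the module-qualified class name and whether the attribute exists.
    print(name, has_attr)
```

Calling describe_object on the object at the failing call site (for example from a debugger or a temporary print) would show the module the class comes from, which indicates whether it is a PyTorch tensor or some other type.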
Any comments or advice would be appreciated. Thanks!