Added refitting acceleration #2983

cehongwang · 2024-07-03T22:47:27Z

Added refit acceleration to existing refit pipeline.

During the first time of compilation, the interpreter will cache the weight name mapping between weights in TRT engine and weights in state_dict. The compiler then will do a tentative refit to test whether fast refit is success or not. If not, the caching will be removed. Later on, during refitting, if this mapping cache is detected, the re-interpretation of the module is skipped.

If the fast refit fails, the refitter falls back to the regular refit, which re-interprets the module and does refitting accordingly.

Checklist:

[ x] My code follows the style guidelines of this project (You can use the linters)
[ x] I have performed a self-review of my own code
[x ] I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
[ x] I have added the relevant labels to my PR in so that relevant reviewers are notified

tests/py/dynamo/models/test_model_refit.py

zewenli98

LGTM

py/torch_tensorrt/dynamo/_refit.py

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_conversion.py	2024-08-08 20:53:00.452273+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_conversion.py	2024-08-08 20:54:40.434855+00:00
@@ -167,7 +167,7 @@
        serialized_engine=interpreter_result.serialized_engine,
        input_binding_names=list(interpreter_result.input_names),
        output_binding_names=list(interpreter_result.output_names),
        name=name,
        settings=settings,
-        weight_name_map = weight_name_map
+        weight_name_map=weight_name_map,
    )
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py	2024-08-08 20:53:00.452273+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py	2024-08-08 20:54:40.911400+00:00
@@ -502,11 +502,13 @@

        with io.BytesIO() as engine_bytes:
            engine_bytes.write(serialized_engine)
            engine_str = engine_bytes.getvalue()

-        return TRTInterpreterResult(engine_str, self._input_names, self._output_names, self.weight_name_map)
+        return TRTInterpreterResult(
+            engine_str, self._input_names, self._output_names, self.weight_name_map
+        )

    def run_node(self, n: torch.fx.Node) -> torch.fx.Node:
        self._cur_node_name = get_node_name(n)
        self._cur_node = n
        # add "_itensor_to_tensor_meta"
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2024-08-08 20:53:00.456273+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2024-08-08 20:54:41.476969+00:00
@@ -143,12 +143,11 @@
                TorchTensorRTModule._pack_binding_names(self.output_binding_names),
                str(int(self.hardware_compatible)),
                self.encode_metadata(metadata),
            ]
        )
-        
-        
+
    def encode_metadata(self, metadata: Any) -> str:
        metadata = copy.deepcopy(metadata)
        metadata["settings"].torch_executed_ops = {
            f"torch.ops.{op.__str__()}"
            for op in metadata["settings"].torch_executed_ops

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py	2024-08-08 20:59:52.444408+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py	2024-08-08 21:01:37.564015+00:00
@@ -502,11 +502,13 @@

        with io.BytesIO() as engine_bytes:
            engine_bytes.write(serialized_engine)
            engine_str = engine_bytes.getvalue()

-        return TRTInterpreterResult(engine_str, self._input_names, self._output_names, self.weight_name_map)
+        return TRTInterpreterResult(
+            engine_str, self._input_names, self._output_names, self.weight_name_map
+        )

    def run_node(self, n: torch.fx.Node) -> torch.fx.Node:
        self._cur_node_name = get_node_name(n)
        self._cur_node = n
        # add "_itensor_to_tensor_meta"
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2024-08-08 20:59:52.452408+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2024-08-08 21:01:38.143764+00:00
@@ -143,12 +143,11 @@
                TorchTensorRTModule._pack_binding_names(self.output_binding_names),
                str(int(self.hardware_compatible)),
                self.encode_metadata(metadata),
            ]
        )
-        
-        
+
    def encode_metadata(self, metadata: Any) -> str:
        metadata = copy.deepcopy(metadata)
        metadata["settings"].torch_executed_ops = {
            f"torch.ops.{op.__str__()}"
            for op in metadata["settings"].torch_executed_ops

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py	2024-08-08 21:05:58.675792+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py	2024-08-08 21:09:37.161892+00:00
@@ -502,11 +502,13 @@

        with io.BytesIO() as engine_bytes:
            engine_bytes.write(serialized_engine)
            engine_str = engine_bytes.getvalue()

-        return TRTInterpreterResult(engine_str, self._input_names, self._output_names, self.weight_name_map)
+        return TRTInterpreterResult(
+            engine_str, self._input_names, self._output_names, self.weight_name_map
+        )

    def run_node(self, n: torch.fx.Node) -> torch.fx.Node:
        self._cur_node_name = get_node_name(n)
        self._cur_node = n
        # add "_itensor_to_tensor_meta"
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2024-08-08 21:05:58.679792+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2024-08-08 21:09:37.741137+00:00
@@ -143,12 +143,11 @@
                TorchTensorRTModule._pack_binding_names(self.output_binding_names),
                str(int(self.hardware_compatible)),
                self.encode_metadata(metadata),
            ]
        )
-        
-        
+
    def encode_metadata(self, metadata: Any) -> str:
        metadata = copy.deepcopy(metadata)
        metadata["settings"].torch_executed_ops = {
            f"torch.ops.{op.__str__()}"
            for op in metadata["settings"].torch_executed_ops

tests/py/dynamo/models/test_model_refit.py

narendasan

LGTM

zewenli98 · 2024-08-14T07:57:47Z

py/torch_tensorrt/dynamo/conversion/_conversion.py

+                weight_name_map=interpreter_result.weight_name_map,
+            )
+        except AssertionError:
+            logger.warning("Fast refit test failed. Removing the weight map caching.")


Where's the operation that you remove the weight map caching?

zewenli98 · 2024-08-14T07:58:34Z

py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py

+        """
+
+        def find_weight(
+            weight_name: str, np_map: dict[str, Any], sd: dict[str, Any]


What does np_map mean?

cehongwang requested review from narendasan and peri044 July 3, 2024 22:47

facebook-github-bot added the cla signed label Jul 3, 2024

github-actions bot added component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: tests Issues re: Tests labels Jul 3, 2024

narendasan reviewed Jul 3, 2024

View reviewed changes

tests/py/dynamo/models/test_model_refit.py Show resolved Hide resolved

narendasan requested a review from zewenli98 July 8, 2024 17:14

zewenli98 approved these changes Jul 23, 2024

View reviewed changes