Description
Feature request
Hello!
As already mentioned in other discussions, the Python implementation has performance issues in serialization and deserialization (e.g. here). I focus on deserialization here.
I've tried some fixes on the galactic branch and they look very promising to me: depending on message structure and size, deserialization is about 3-20 times faster. That is roughly 3x for simple messages and up to 20x for complex types with deep nesting and array fields.
Feature description
The main idea is to avoid pure Python code in message deserialization. To achieve this, every message is split into a C extension base class and a pure Python derived class. The C extension class contains all message fields, so the deserialization method initializes the C structure members without calling Python methods. The pure Python class, in turn, contains all dunder methods and assertions.
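A rough pure-Python sketch of this split, with a `__slots__` class standing in for the C extension base type. All names here (`_ExampleBase`, `Example`, the `value` and `data` fields) are hypothetical and are not taken from the generated code:

```python
class _ExampleBase:
    """Stand-in for the C extension base class: it only holds the fields."""
    __slots__ = ("_value", "_data")


class Example(_ExampleBase):
    """Pure Python derived class: dunder methods and assertions live here."""
    __slots__ = ()

    def __init__(self, value=0, data=None):
        assert isinstance(value, int), "'value' must be an int"
        self._value = value
        self._data = list(data) if data is not None else []

    def __eq__(self, other):
        if not isinstance(other, Example):
            return NotImplemented
        return self._value == other._value and self._data == other._data

    def __repr__(self):
        return f"Example(value={self._value!r}, data={self._data!r})"

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        assert isinstance(v, int), "'value' must be an int"
        self._value = v


msg = Example(value=3, data=[1, 2])
```

In the real proposal the base class would be a compiled type, so the fields would be plain C struct members rather than Python slots. The sketch only shows where each responsibility sits.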
Implementation considerations
Here is an example of what the generated files look like for a simple message file Example.msg:
The _example_base.c extension module defines the base class containing members of native types (lines 10-25).
The Python module _example.py is the same as upstream except for the base class (line 62) and a micro-optimization of the assertion in the __init__ method (line 95).
The support extension module _example_s.c now uses a specific subtype of PyObject (line 147), calls __new__ instead of __init__ to avoid pure Python calls (line 170), directly initializes the C-typed fields of the message (lines 188, 191), and creates nested collections (lines 193-212).
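The effect of calling `__new__` instead of `__init__` can be illustrated in plain Python. This is only an analogy for what the C support module does; `Point`, `convert_to_py`, and the `init_calls` counter are hypothetical names made up for the illustration:

```python
class Point:
    """Hypothetical message class; not the generated code."""
    __slots__ = ("_x", "_y")
    init_calls = 0  # counts how often the Python-level __init__ runs

    def __init__(self, x=0.0, y=0.0):
        Point.init_calls += 1
        assert isinstance(x, float) and isinstance(y, float)
        self._x = x
        self._y = y


def convert_to_py(raw):
    """Build a Point from an (x, y) tuple without running __init__."""
    msg = Point.__new__(Point)  # allocate only; __init__ is skipped
    msg._x, msg._y = raw        # set the fields directly
    return msg


p = convert_to_py((1.5, 2.5))
```

After this runs, `Point.init_calls` is still 0: the object was created and filled in without executing the assertions or any other code in `__init__`.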
A benchmark for this message that runs the deserialization routine 1 million times gives 28 seconds upstream vs 6.5 seconds with these changes.
For a more complex message such as TestArrayComplex (from this issue) with 100 nested TestElement elements, the same benchmark gives 57.6 seconds vs 5.5 seconds for 50K deserializations.
I ran the benchmark with Python 3.8, ROS Galactic, Ubuntu 20.04, on a 2.85 GHz AMD CPU.
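For anyone who wants to try a similar comparison, here is a minimal timing harness built on the standard `timeit` module. It is not the benchmark used for the numbers above: `Msg`, `via_init`, and `via_new` are hypothetical, and it only compares two pure-Python construction paths rather than the real C extension code.

```python
import timeit


class Msg:
    """Hypothetical two-field message used only for timing."""
    __slots__ = ("_a", "_b")

    def __init__(self, a=0, b=""):
        assert isinstance(a, int)
        assert isinstance(b, str)
        self._a = a
        self._b = b


def via_init(raw):
    """Construct through __init__, including its assertions."""
    return Msg(a=raw[0], b=raw[1])


def via_new(raw):
    """Construct through __new__ and assign the fields directly."""
    msg = Msg.__new__(Msg)
    msg._a, msg._b = raw
    return msg


RAW = (42, "hello")
N = 100_000  # far fewer iterations than the 1 million used above

t_init = timeit.timeit(lambda: via_init(RAW), number=N)
t_new = timeit.timeit(lambda: via_new(RAW), number=N)
print(f"__init__ path: {t_init:.3f} s, __new__ path: {t_new:.3f} s")
```

Absolute timings depend heavily on the machine and Python version, so only the ratio between the two paths is meaningful.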
Pros:
- performance improvement
- public interface of message is not changed (as far as I can see)
Cons:
- binary dependency for basic message operations (creation, reading/writing fields)
- no assertions during deserialization (I think they can be added)
- initialization logic is duplicated in the __init__ and *_convert_to_py methods
- compile time increased
I'd like to know whether this solution is worth porting from my local galactic branch to upstream, or whether it violates some ROS architecture principles.