refactor: remove tracer dependencies to support dsm sqs -> lambda #612

michael-zhao459 · 2025-06-09T18:49:30Z

What does this PR do?

Removes dependencies on internal tracer code. get_dsm_context() is moved into the Lambda layer, and the tracer's public API for setting checkpoints replaces the internal DSM code that was used before.
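For readers skimming the description, here is a rough sketch of what "use the public checkpoint API" could look like. The call shape follows the `set_consume_checkpoint(type, arn, carrier_get, manual_checkpoint=False)` line quoted in the review below; the helper names `create_carrier_get` and `set_checkpoint_for_record` are hypothetical, and the checkpoint function is passed in as a parameter only so the sketch runs without ddtrace installed.

```python
def create_carrier_get(context_json):
    """Wrap the propagated context dict in a key -> value getter."""

    def carrier_get(key):
        return context_json.get(key)

    return carrier_get


def set_checkpoint_for_record(context_json, typ, arn, set_consume_checkpoint):
    # Assumption: set_consume_checkpoint is injected here for illustration.
    # In the layer it would be imported from ddtrace.data_streams.
    carrier_get = create_carrier_get(context_json)
    return set_consume_checkpoint(typ, arn, carrier_get, manual_checkpoint=False)
```

The layer only hands the tracer a getter over the propagated context, so it does not need to know how the tracer decodes that context.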

Motivation

Decouples the Lambda layer code from the tracer code and follows the best practice of not relying on internal implementation code.

Testing Guidelines

The function is covered by unit tests.

Additional Notes

IMPORTANT: This PR cannot be merged until the tracer releases a version that includes DataDog/dd-trace-py#13646, which adds the manual_checkpoint parameter to set_consume_checkpoint().

Types of Changes

Bug fix
New feature
Breaking change
Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

This PR's description is comprehensive
This PR contains breaking changes that are documented in the description
This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
This PR impacts documentation, and it has been updated (or a ticket has been logged)
This PR's changes are covered by the automated tests
This PR collects user input/sensitive content into Datadog
This PR passes the integration tests (ask a Datadog member to run the tests)

michael-zhao459 · 2025-06-09T18:49:47Z

feat: add sns->sqs->lambda support #618
feat: add kinesis -> lambda support #614
feat: add sns -> lambda support #613
refactor: remove tracer dependencies to support dsm sqs -> lambda #612 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite.

piochelepiotr · 2025-06-11T20:14:49Z

datadog_lambda/dsm.py

-            )
-        except Exception as e:
-            logger.error(format_err_with_traceback(e))
+        arn = record.get("eventSourceARN", "")


We removed the try/except here. Is there a reason for that? (Maybe there is and I don't see it.)
We want to make sure our instrumentation never prevents the Lambda from executing, even if there is an issue with the instrumentation.

Agreed, we should move this inside the try/except.
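A minimal sketch of the pattern being asked for, assuming a hypothetical entry point `set_dsm_context_safely` and a per-record callback `process_record` (neither name comes from this PR). Everything that touches the event, including the `eventSourceARN` lookup, sits inside a single try/except so a malformed event is logged instead of raised into the handler.

```python
import logging

logger = logging.getLogger(__name__)


def set_dsm_context_safely(event, process_record):
    """Run the instrumentation over every record without ever raising."""
    try:
        for record in event.get("Records", []):
            arn = record.get("eventSourceARN", "")
            process_record(record, arn)
    except Exception as e:
        # Instrumentation failures are logged and swallowed on purpose.
        logger.error("Unable to set dsm context: %s", e)
```

Even an event that is not a dict at all is handled: the resulting AttributeError is caught and logged like any other failure.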

robcarlan-datadog · 2025-06-16T19:43:29Z

tests/test_dsm.py

+            },
+        }
+
+        result = _get_dsm_context_from_lambda(message)


[nit] newline here

robcarlan-datadog · 2025-06-16T19:46:42Z

We need to wait for tracer version 3.9.2 before we can merge.

purple4reina · 2025-06-16T21:00:59Z

datadog_lambda/dsm.py

+    context_json = None
+    message_attributes = message.get("messageAttributes")
+    if not message_attributes:
+        logger.debug("DataStreams skipped lambda message: %r", message)


It looks like we're logging debug messages multiple times for the same record.

purple4reina · 2025-06-16T21:01:31Z

datadog_lambda/dsm.py

+
+    datadog_attr = message_attributes["_datadog"]
+
+    if "stringValue" in datadog_attr:


We should do a type check here to ensure this is a dict.

Just to clarify: we are checking that context_json is a dict, right? I think context_json is the only one we need to verify; the test you asked me to write suggested the same.

We should also make sure datadog_attr is a dict.
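To make the two checks concrete, here is a sketch for the SQS shape only, under the assumption that the context arrives as a JSON string in `messageAttributes["_datadog"]["stringValue"]` as in the diff above. The function name `get_dsm_context_from_sqs` is illustrative and is not the one used in the PR.

```python
import json


def get_dsm_context_from_sqs(message):
    """Return the propagated context as a dict, or None if it is unusable."""
    message_attributes = message.get("messageAttributes")
    if not isinstance(message_attributes, dict):
        return None

    datadog_attr = message_attributes.get("_datadog")
    if not isinstance(datadog_attr, dict):  # e.g. a bare string or a list
        return None
    if "stringValue" not in datadog_attr:
        return None

    try:
        context_json = json.loads(datadog_attr["stringValue"])
    except (TypeError, ValueError):
        return None

    if not isinstance(context_json, dict):  # valid JSON, but the wrong shape
        return None
    return context_json
```

Both checks matter: `"stringValue" in datadog_attr` also succeeds on a plain string, and `json.loads` happily returns a list or a number.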

purple4reina · 2025-06-16T21:04:16Z

datadog_lambda/dsm.py

+    datadog_attr = message_attributes["_datadog"]
+
+    if "stringValue" in datadog_attr:
+        # SQS -> lambda


We can use the event_type to avoid doing unnecessary work. We should already mostly know the shape of the event. Without doing so, this method is gonna get insanely large.

I would recommend creating a separate _get_dsm_context for each event type.
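One way to read this suggestion is a small dispatch table keyed by event type, where each extractor only knows where its own event shape keeps the raw context. This is a sketch, not the PR's code: the string keys `"sqs"` and `"sns"`, all function names, and the SNS attribute layout (`Sns.MessageAttributes._datadog.Value` holding a JSON string) are assumptions made for illustration.

```python
import json


def _as_dict(value):
    """Return value if it is a dict, otherwise an empty dict."""
    return value if isinstance(value, dict) else {}


def _raw_context_from_sqs(record):
    attr = _as_dict(_as_dict(record.get("messageAttributes")).get("_datadog"))
    return attr.get("stringValue")


def _raw_context_from_sns(record):
    # Assumed layout, for illustration only.
    sns = _as_dict(record.get("Sns"))
    attr = _as_dict(_as_dict(sns.get("MessageAttributes")).get("_datadog"))
    return attr.get("Value")


_EXTRACTORS = {
    "sqs": _raw_context_from_sqs,
    "sns": _raw_context_from_sns,
}


def get_dsm_context(event_type, record):
    """Pick the extractor for this event type and parse its result once."""
    extractor = _EXTRACTORS.get(event_type)
    if extractor is None:
        return None
    try:
        parsed = json.loads(extractor(record))
    except (TypeError, ValueError):
        return None
    return parsed if isinstance(parsed, dict) else None
```

Supporting another event type then means adding one short function and one table entry, rather than another branch in a single growing method.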

purple4reina · 2025-06-16T21:07:24Z

tests/test_dsm.py

                },
                {
                    "eventSourceARN": "arn:aws:sqs:us-east-1:123456789012:queue3",
                    "body": "Message 3",
+                    "messageAttributes": {
+                        "_datadog": {
+                            "stringValue": json.dumps(


We should add another test where the stringValue isn't a dict.

purple4reina · 2025-06-16T21:13:26Z

datadog_lambda/dsm.py

+        context_json = _get_dsm_context_from_lambda(record)
+        if not context_json:
+            logger.debug("DataStreams skipped lambda message: %r", record)
+            return


This line is untested.

purple4reina · 2025-06-16T21:14:07Z

datadog_lambda/dsm.py

+        carrier_get = _create_carrier_get(context_json)
+        set_consume_checkpoint(type, arn, carrier_get, manual_checkpoint=False)
+    except Exception as e:
+        logger.error(f"Unable to set dsm context: {e}")


This line is untested. We should test to make sure that if there's an exception in setting the checkpoint that this method properly captures the error.
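A sketch of such a test, using a stand-in for the function under review. The stand-in takes the checkpoint function as a parameter (an assumption made so the example runs without ddtrace; the real code imports it inside the function), and `context_json.get` stands in for the PR's `_create_carrier_get(context_json)`.

```python
import logging
import unittest
from unittest import mock

logger = logging.getLogger("dsm_sketch")


def set_dsm_context_for_record(context_json, typ, arn, set_checkpoint):
    """Stand-in for the function under review, with the checkpoint injected."""
    try:
        set_checkpoint(typ, arn, context_json.get, manual_checkpoint=False)
    except Exception as e:
        logger.error("Unable to set dsm context: %s", e)


class CheckpointErrorTest(unittest.TestCase):
    def test_checkpoint_exception_is_logged_not_raised(self):
        failing = mock.Mock(side_effect=RuntimeError("boom"))
        with self.assertLogs("dsm_sketch", level="ERROR") as captured:
            # Must not raise even though the checkpoint call fails.
            set_dsm_context_for_record(
                {"k": "v"}, "sqs", "arn:aws:sqs:us-east-1:123456789012:queue", failing
            )
        failing.assert_called_once()
        self.assertIn("Unable to set dsm context: boom", captured.output[0])
```

`assertLogs` fails the test if nothing is logged at ERROR level, so the test covers both halves of the requirement: the exception is swallowed and the error is reported.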

purple4reina · 2025-06-17T17:54:51Z

datadog_lambda/dsm.py

-            payload_size = calculate_sqs_payload_size(record)
+            context_json = _get_dsm_context_from_sqs_lambda(record)
+            if not context_json:
+                return


Should you continue instead of return?
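The difference shows up as soon as an event carries more than one record. A small sketch, with hypothetical names `collect_contexts` and `get_context`:

```python
def collect_contexts(records, get_context):
    """Gather the context of every record that has one."""
    contexts = []
    for record in records:
        context_json = get_context(record)
        if not context_json:
            # Skip only this record. A `return` here would silently drop
            # every record that comes after it in the batch.
            continue
        contexts.append(context_json)
    return contexts
```

With a batch whose first record has no context, `continue` still processes the later records, whereas `return` would exit before reaching them.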

purple4reina · 2025-06-17T17:57:08Z

datadog_lambda/dsm.py

+            logger.debug("DataStreams did not handle lambda message: %r", message)
+            return None
+    else:
+        logger.debug("DataStreams did not handle lambda message: %r", message)


I would recommend making each of these log lines slightly different. That way when one is encountered, it is easy to find the exact line of code where it was produced. Otherwise, we don't know what the actual issue was.

purple4reina · 2025-06-17T18:26:47Z

datadog_lambda/dsm.py

+            return None
+    else:
+        logger.debug(
+            "DataStreams did not handle lambda message: %r, no dsm context", message


nit: put the %r at the end, the message itself could be quite long.
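Combined with the earlier request to make each log line distinct, the result might look like the sketch below. The wording of the messages and the function name `find_datadog_attr` are illustrative only; the point is that each branch has its own searchable prefix and the potentially long `%r` payload comes last.

```python
import logging

logger = logging.getLogger("dsm_logging_sketch")


def find_datadog_attr(message):
    """Return the _datadog attribute, logging a distinct reason when absent."""
    attrs = message.get("messageAttributes")
    if not attrs:
        logger.debug(
            "DataStreams skipped lambda message, no messageAttributes: %r", message
        )
        return None
    if "_datadog" not in attrs:
        logger.debug(
            "DataStreams skipped lambda message, no _datadog attribute: %r", message
        )
        return None
    return attrs["_datadog"]
```

Passing `message` as a logging argument rather than formatting it eagerly also means the `%r` rendering is skipped when debug logging is disabled.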

purple4reina · 2025-06-17T18:52:27Z

datadog_lambda/dsm.py

+
+
+def _set_dsm_context_for_record(context_json, type, arn):
+    from ddtrace.data_streams import set_consume_checkpoint


We need to be sure to set a minimum version for the ddtrace dependency. To do that, you'll want to find the first version of ddtrace that includes this set_consume_checkpoint. Then update pyproject.toml with this version.

Got it! Thanks for letting me know how this is done

move get dsm context logic into lambda layer code

d7ce695

michael-zhao459 marked this pull request as ready for review June 9, 2025 18:51

michael-zhao459 requested review from a team as code owners June 9, 2025 18:51

michael-zhao459 changed the title ~~move get dsm context logic into lambda layer code~~ feat: move get dsm context logic into lambda layer code Jun 9, 2025

This was referenced Jun 9, 2025

feat: add sns -> lambda support #613

Draft

feat: add kinesis -> lambda support #614

Draft

purple4reina reviewed Jun 9, 2025

purple4reina reviewed Jun 9, 2025

simplify PR

623f49e

michael-zhao459 requested a review from purple4reina June 9, 2025 22:19

michael-zhao459 added 4 commits June 10, 2025 05:34

fixes

96e3d88

fix lint

b5a711d

remove redundant try catch

7810ced

fixes

e94eb22

purple4reina reviewed Jun 10, 2025

purple4reina reviewed Jun 10, 2025

fixes

54bedbf

michael-zhao459 requested review from purple4reina and piochelepiotr June 10, 2025 19:27

michael-zhao459 marked this pull request as draft June 11, 2025 13:57

reworked to remove ddtrace dependencies

823a07f

michael-zhao459 changed the title ~~feat: move get dsm context logic into lambda layer code~~ refactor: remove tracer dependencies to support dsm lambda -> sqs Jun 11, 2025

michael-zhao459 added 4 commits June 11, 2025 13:56

fix

5356cc6

fix

45ed35f

fix

f729649

fix

24f6ed9

michael-zhao459 changed the title ~~refactor: remove tracer dependencies to support dsm lambda -> sqs~~ refactor: remove tracer dependencies to support dsm sqs -> lambda Jun 11, 2025

piochelepiotr reviewed Jun 11, 2025

michael-zhao459 requested a review from robcarlan-datadog June 13, 2025 18:52

robcarlan-datadog reviewed Jun 16, 2025

robcarlan-datadog approved these changes Jun 16, 2025

fix

c066b8f

michael-zhao459 marked this pull request as ready for review June 16, 2025 20:09

purple4reina reviewed Jun 16, 2025

michael-zhao459 added 4 commits June 16, 2025 18:57

fixes

235659b

formatting

34b9f45

add test for try catch on checkpoint exception

ff6e3c4

added tests for logging if ctx_json is none

1a82f68

michael-zhao459 requested a review from purple4reina June 17, 2025 15:41

purple4reina reviewed Jun 17, 2025

michael-zhao459 added 4 commits June 17, 2025 14:14

move try outside entire block

8bb2053

change to continue

5f006de

add changes to logging

e28bac4

fixed tests

2509fcc

michael-zhao459 requested a review from purple4reina June 17, 2025 18:24

purple4reina reviewed Jun 17, 2025

moved %r to the end of message

ab8c871

michael-zhao459 requested a review from purple4reina June 17, 2025 18:40

michael-zhao459 added 2 commits June 17, 2025 14:42

check datadog_attr is dict

2089fa9

fix test

24851f3

purple4reina reviewed Jun 17, 2025
