Description
The async option is unique to the Ruby SDK. It was designed to help send events asynchronously through different backends (e.g. a Ruby thread, a Sidekiq worker, etc.). Depending on the backend, it can pose a threat to the system because of the extra memory it consumes. So it's an option with some trade-offs.
But since version 4.1, the SDK has its own managed background worker (implemented with the well-known concurrent-ruby library). It can handle most of the async option's use cases.
The async Option Approach
- The SDK serializes the event and event hint into JSON-compatible Ruby hashes.
- It passes the event payload and hint to the block.
- In general, the block would enqueue a background job with the above data.
- Some earlier apps use a new Ruby thread to send the data. This is not recommended.
- With background job libraries like Sidekiq or Resque, this means adding objects into Redis.
- With delayed_job, this means adding a new delayed_job record.
- A background worker (e.g. a Sidekiq worker) then picks up the event and hint and sends them.
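The steps above can be modeled in a few lines of plain Ruby. This is a minimal sketch, not the SDK's code: FakeJobStore and AsyncOptionFlow are hypothetical names, the store stands in for Redis or a delayed_job table, and appending to sent stands in for calling Sentry.send_event(event, hint). In a real app the block would be assigned to the SDK's async config option inside Sentry.init.

```ruby
require "json"

# Hypothetical stand-in for Redis or a delayed_job table. It only holds
# strings, so every event has to be serialized before it is enqueued.
class FakeJobStore
  attr_reader :jobs

  def initialize
    @jobs = []
  end
end

# Hypothetical model of the async option flow; not the SDK's implementation.
class AsyncOptionFlow
  attr_reader :store, :sent

  def initialize
    @store = FakeJobStore.new
    @sent = []
  end

  # Plays the role of the user's async block: it receives the JSON-compatible
  # event hash and hint, then enqueues a job (copy #1 of the payload).
  def async_block(event_hash, hint)
    @store.jobs << JSON.generate({ "event" => event_hash, "hint" => hint })
  end

  # Plays the role of the background job (e.g. a Sidekiq worker). It loads the
  # payload back into memory (copy #2); a real job would then call
  # Sentry.send_event(event, hint) instead of appending to @sent.
  def perform_next_job
    payload = JSON.parse(@store.jobs.shift)
    @sent << payload["event"]
  end
end

flow = AsyncOptionFlow.new
flow.async_block({ "event_id" => "abc123", "message" => "boom" }, {})
flow.perform_next_job
```

Note that the event only exists as a string while it waits in the store; the worker has to parse it back before it can be sent.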
Pros
Users can customize their event-sending logic, but in practice it's usually just a worker that calls Sentry.send_event(event, hint).
Cons
- The event payload (usually dozens of KB) can be copied twice: first into the intermediate storage, and then again into the background worker process's memory.
- When there is an event spike, it can flood the intermediate storage (Redis) and take down the entire system.
The Background Worker
- The SDK passes the event and its hint to the background worker (a pool of threads managed by concurrent-ruby).
- A worker then picks up the event, serializes it, and sends it.
Pros
- It doesn't allocate extra memory other than the original event payload.
- It's faster, because the payload doesn't make a round trip through external storage.
- It doesn't require any user code.
- The background worker doesn't queue more than 30 events. So even when there's a spike, it's unlikely to consume all the memory.
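The behavior described above can be approximated with Ruby's standard library. This is a hypothetical sketch, not the SDK's implementation (which builds on concurrent-ruby): BoundedWorker is an invented name, SizedQueue replaces the concurrent-ruby thread pool, and defaulting the thread count to Etc.nprocessors is an assumption. The property it illustrates is that perform never blocks and never lets more than 30 events wait; overflow is dropped and counted.

```ruby
require "etc"

# Hypothetical sketch of a bounded background worker. The real SDK uses a
# concurrent-ruby thread pool; this uses Ruby's stdlib SizedQueue instead.
class BoundedWorker
  MAX_QUEUE = 30 # matches the 30-event limit described above

  attr_reader :dropped

  # threads defaults to the processor count (an assumption for this sketch).
  def initialize(threads: Etc.nprocessors, &sender)
    @queue = SizedQueue.new(MAX_QUEUE)
    @dropped = 0
    @mutex = Mutex.new
    @threads = Array.new(threads) do
      Thread.new do
        # Queue#pop returns nil once the queue is closed and drained.
        while (event = @queue.pop)
          sender.call(event)
        end
      end
    end
  end

  # Non-blocking enqueue: when MAX_QUEUE events are already waiting, the new
  # event is dropped instead of blocking the caller or growing memory.
  def perform(event)
    @queue.push(event, true)
    true
  rescue ThreadError
    @mutex.synchronize { @dropped += 1 }
    false
  end

  # Stop accepting events and wait until the queued ones have been sent.
  def shutdown
    @queue.close
    @threads.each(&:join)
  end
end
```

Dropping instead of blocking is the design choice that keeps a spike from stalling request threads or consuming unbounded memory; the only thing held in memory is the original event objects.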
Cons
Unsent events will die with the process. Generally speaking, the queue time in the background worker is very low, so the chance of missing events for this reason is small in web apps. But in scripts, the process often exits before the worker is able to send the event. This is why hint: { background: false } is required in rake integrations. However, I don't think this problem can be solved with the async option either.
This drawback has been addressed in #1617.
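A minimal sketch of what the background: false hint changes, using a hypothetical Dispatcher class rather than the SDK's API: a backgrounded event waits in an in-memory queue until a worker sends it, so a short-lived script can exit first and lose it, while background: false delivers the event before the call returns.

```ruby
# Hypothetical sketch (not the SDK's API) of what hint: { background: false }
# changes: the event is sent inline instead of waiting in the worker's queue.
class Dispatcher
  attr_reader :sent, :pending

  def initialize
    @pending = Queue.new # events a background thread would send later
    @sent = []           # events that have actually been delivered
  end

  def capture(event, hint: {})
    if hint.fetch(:background, true)
      # Backgrounded: if the process exits before a worker drains this
      # queue, the event is lost with it.
      @pending << event
    else
      # Synchronous: delivered before capture returns, so a rake task or
      # script that exits immediately afterwards still reports it.
      deliver(event)
    end
  end

  private

  def deliver(event)
    @sent << event
  end
end

dispatcher = Dispatcher.new
dispatcher.capture("backgrounded error")
dispatcher.capture("rake task error", hint: { background: false })
```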
Missing Events During A Spike Because of Queue Limit
I know many users are concerned that the background worker's 30-event queue limit will make them lose events during a spike. But as the maintainer and a user of this SDK, I'm not worried about it because:
- The spike is likely to be an urgent case, and that'll probably be fixed in a short time. So not seeing a few instances of other errors should not affect the overall coverage.
- Given these characteristics of the SDK's background worker:
  - The default number of background workers is determined by the number of processor cores on your machine.
  - They're a lot faster than using the async approach with a Sidekiq/Resque (etc.) worker, for the reasons described above.
  - Each process (or web instance, depending on your concurrency model) has its own 30-event queue; the limit is not global.
  If there's a spike big enough to overflow the SDK's queue and drop some events, it'll probably overflow your background job queue with the async option too, and/or do even greater damage to your system.
- Sentry has a rate-limiting mechanism to prevent overflow on the platform side, which works by both rejecting new events and telling the SDK not to send new events with a 429 response. When the SDK receives a 429 response from Sentry during a spike, it'll stop sending "all events" for a given period of time.
What I'm trying to say is, it's not possible to expect Sentry to accept "all events" during a big spike, regardless of which approach you use. But when a spike happens, the async option is more likely to become another bottleneck and/or cause other problems in your system.
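The 429 behavior described in the list above can be sketched as a transport that remembers a "disabled until" timestamp. RateLimitedTransport, the injected clock, and the 60-second fallback are assumptions for illustration only; the SDK's real transport likely handles more details (such as per-category limits) that this sketch ignores.

```ruby
# Hypothetical sketch (not the SDK's transport) of client-side rate limiting:
# after a 429, stop sending every event until the retry window has passed.
class RateLimitedTransport
  DEFAULT_RETRY_AFTER = 60 # seconds; assumed fallback when no delay is given

  attr_reader :delivered, :skipped

  # http is a block returning [status_code, retry_after_seconds_or_nil].
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &http)
    @clock = clock
    @http = http
    @disabled_until = nil
    @delivered = 0
    @skipped = 0
  end

  def send_event(event)
    # While rate limited, drop events locally without any HTTP request.
    if @disabled_until && @clock.call < @disabled_until
      @skipped += 1
      return :rate_limited
    end

    status, retry_after = @http.call(event)
    if status == 429
      @disabled_until = @clock.call + (retry_after || DEFAULT_RETRY_AFTER)
      return :rate_limited
    end

    @delivered += 1
    :sent
  end
end
```

Because the check happens before any request is made, events captured during the window cost nothing on the network, which is why no client-side queueing strategy can get "all events" through a spike once the server starts answering 429.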
My Opinion
The async option seems redundant now, and it can sometimes cause more harm than good. So I think we should drop it in version 5.0.
Questions
The above analysis is based only on my personal usage of the SDK and a few cases I helped debug. So if you're willing to share your experience, I'd like to hear about it.
Even though the decision has been made, we still would like to hear feedback about it:
- Do you use the async option in your apps?
  - If you do, what's the motivation? Will you still use it after reading the above description?
  - If you don't, is it an intentional decision? If it is, what's the reason behind it?
- Do you disable the background workers with the background_worker_threads config option?
  - If you do, why?
- Or any feedback related to this topic.
Activity
josh-m-sharpe commented on Aug 3, 2021
Long-time user of sentry-ruby and sentry-raven. I didn't know this feature existed until our current migration of a large Rails app to Sentry. I noticed it while reading through the docs and attempted to set it up, but immediately ran into issues and elected to defer it to simplify the migration. In addition, we have a large number of complex queues. Injecting this option into our system would likely require a bit of thought so as not to bowl over important queues while keeping error reporting timely.
I suppose this is a vote to drop it?
louim commented on Aug 20, 2021
Hey! We currently use the async option, mostly because it was recommended in the docs when we set up the app a long time ago (not sure the background worker option was even a thing at that time 👴🏼). I'm curious about that part:
What would happen if there is a spike in events, let's say from a noisy error filling the queue. Would another error happening at the same time be silently dropped because the queue is full? Or am I misunderstanding how it works?
I'd like to switch away from async because the JSON serialization in the async version means we have to check two versions of the payload when doing custom processing in before_send, ex:
st0012 commented on Aug 21, 2021
The background worker was added in v4.1.0, so I think it's a new thing for most users 🙂
As of now, the queue doesn't distinguish between events. So if there's a spike of a particular error, other errors may not make it into the queue. But personally I'm not worried about this because:
- The SDK's background workers are a lot faster than using the async approach with a Sidekiq/Resque (etc.) worker, for the reason I described in the issue.
- If there's a spike big enough to overflow the SDK's queue and drop some events, it'll probably overflow your background job queue with the async option too, and/or do even greater damage to your system.
kzaitsev commented on Sep 27, 2021
At my job, async works fine for us because we already have Sidekiq in our stack. I don't understand why async can't remain an optional way to deliver events instead of being deprecated. As a solution, you could highlight the possible issues with async in the documentation. Maybe I don't understand the problem because we only use Sentry for exceptions, without APM.
st0012 commented on Sep 27, 2021
@Bugagazavr
There are two main costs of keeping this option around:
sentry-ruby is the only SDK that supports such an option. This means we always need to consider this extra condition when making changes to the event-sending logic, and it'll make future SDK alignment harder. (It surely made the sentry-raven -> sentry-ruby conversion harder.)
sentry-ruby/sentry-ruby/lib/sentry/client.rb, lines 119 to 134 in 42455c8
sentry-ruby/sentry-rails/app/jobs/sentry/send_event_job.rb, lines 1 to 33 in 42455c8
If the upside were high, these costs wouldn't be an issue; that's why we have had it for many years. But since we now have a better solution for the problem (the background worker) with much less downside, I don't think it's worth it anymore.
Drop sentry async - use Sentry's builtin background worker
Drop sentry async job submission - use Sentry's builtin background wo…
github-actions commented on Oct 28, 2021
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀
st0012 commented on May 28, 2022
@ariccio You can simply delete it 😉
vadviktor commented on Aug 2, 2022
We at my company have been plagued by the async feature for 1-2 years now; hitting the payload size limit and the rate limit made us scratch our heads over how to safeguard against those (before Sentry remedied them internally). How did we catch these issues? By seeing Sidekiq brought to its knees and important messages left unprocessed.
Right now, we have decided to end the async reign, and I was glad to see this feature deprecated as of our last version update. 🙌
Remove deprecated Sentry async
Remove deprecated async configuration #1894
Async option removal was delayed til 6.0 (#7422)
sl0thentr0py commented on Dec 10, 2024
async has been deprecated for long enough and is removed in the next 6.0 major branch so we can close this placeholder issue now.