
Drop retry strategy and up cache time #461


Merged: 13 commits into trunk, Apr 18, 2025

Conversation

Contributor

@ingeniumed ingeniumed commented Apr 15, 2025

Description

In trying to figure out how to add rate limiting to the plugin, I first wanted to see whether our retry strategy was good enough, especially in cases where a retry might be needed. That would cover the case where rate limiting has started to occur and we want to keep it from worsening; it wouldn't cover the case where rate limiting is about to happen but hasn't yet. The Airtable SDK caught my eye, as it implements an exponential backoff strategy with jitter to handle the case where rate limiting is starting to occur.

I experimented with a script that fired off 50 and then 100 requests to see rate limiting in action. I then added linear backoff (what we have today), followed by exponential backoff, to see whether they would solve it. Both options did, though linear sometimes needed an extra retry or two as the overall request count scaled up, unlike exponential. Exponential adds a bit more delay than linear backoff, but it gives a much better guarantee that requests succeed, since retries don't go out at a consistent interval.

Between this and the fact that API calls are cached for 60s, we should be in an even better place.

Testing

  • I tested this using a Node script to isolate the retry behaviour without any caching. The script made 50 GET requests to https://api.airtable.com/v0/base_id/table_id with an exponential backoff retry strategy and logged whenever a request was retried. Retries were triggered only on 429s, though this could be expanded to 500s as well.

@@ -124,7 +124,7 @@ public static function retry_decider( int $retries, RequestInterface $request, ?

$should_retry = false;

-if ( $response && $response->getStatusCode() >= 500 ) {
+if ( $response && ( $response->getStatusCode() >= 500 || $response->getStatusCode() === 429 ) ) {
Contributor Author

Added status code 429 here so we retry when we are being rate limited as well.
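
For context, here is a minimal, hedged sketch of how a Guzzle retry decider with this condition fits together. The class name, the ConnectException branch, and placing the hard-coded cap of 3 retries inside the decider are assumptions standing in for code not shown in this diff:

<?php
use GuzzleHttp\Exception\ConnectException;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

class Retry_Sketch {
	public static function retry_decider( int $retries, RequestInterface $request, ?ResponseInterface $response = null, ?\Exception $exception = null ): bool {
		// Stop once the retry budget is exhausted (3 attempts, per the discussion below).
		if ( $retries >= 3 ) {
			return false;
		}

		// Retry transport-level failures (an assumption; not part of this diff).
		if ( $exception instanceof ConnectException ) {
			return true;
		}

		// Retry server errors, and now 429s, so rate-limited requests get another attempt.
		return null !== $response && ( $response->getStatusCode() >= 500 || $response->getStatusCode() === 429 );
	}
}

A decider like this would be wired into the client with GuzzleHttp\Middleware::retry(), which consults it before each retry.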


@@ -154,6 +160,7 @@ public static function retry_delay( int $retries, ?ResponseInterface $response )
}
}

// Convert it to milliseconds.
Contributor Author

I kept it this way so that both the calculated value and the "retry-after" value are converted from seconds to milliseconds.

@ingeniumed ingeniumed marked this pull request as draft April 15, 2025 05:33
-$retry_after = $retries;
+// Implement an exponential backoff strategy, with the delay capped at 10s and a minimum of 1s.
+// Given we only retry 3 times, this will really never exceed 8s.
+$retry_after = min( 10, 1 * pow( 2, $retries ) );
Contributor Author

Considering we only allow 3 retry attempts, and that's not overridable, the maximum delay possible is 8s. I kept the cap at 10s just in case.

So with this, the retry times would be

2s
4s
8s

instead of

1s
2s
3s
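
Pulling the fragments above together (the "retry-after" handling mentioned earlier, the exponential calculation, and the conversion to milliseconds), a hedged sketch of the full delay callback might look like the following. The Retry-After branch and the exact surrounding structure are assumptions, since only fragments appear in the diff:

<?php
use Psr\Http\Message\ResponseInterface;

class Retry_Sketch {
	public static function retry_delay( int $retries, ?ResponseInterface $response ): int {
		if ( $response && $response->hasHeader( 'Retry-After' ) ) {
			// Honor the server-provided Retry-After value, which is given in seconds.
			$retry_after = (int) $response->getHeaderLine( 'Retry-After' );
		} else {
			// Implement an exponential backoff strategy, with the delay capped at 10s and a minimum of 1s.
			// Given we only retry 3 times, this will really never exceed 8s.
			$retry_after = min( 10, 1 * pow( 2, $retries ) );
		}

		// Convert it to milliseconds, which is the unit Guzzle's retry delay expects.
		return $retry_after * 1000;
	}
}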

Contributor

This is a nice improvement over the existing linear delay. Was there a reason the change to use jitter was reverted?

The Airtable docs around rate limiting say the API starts working again 30 seconds after the rate limit is hit, so even 8 seconds of delay might not be sufficient. The Airtable SDK itself starts the delay at 10s and goes from there to 20s, 40s, etc.

Contributor Author

It felt like overkill to use jitter considering how unpredictable the resulting times could be. wp_rand started giving me problems, which got me thinking about whether we needed it in the first place.

The Airtable SDK itself starts the delay from 10s and goes from there to 20s, 40s, etc.

The value before jitter will be 10s, 20s, 40s, and 60s. After jitter, it will be some value between 1s and the pre-jitter value, so they have cases where the delay falls below 30s as well. The idea is that, because the requests are spread out exponentially, a request will succeed on retry 1 or 2. I kept our values conservative to avoid blowing up the request times, and they can be overridden within AirtableIntegration if we need to anyway.

Note: I did experiment with these values and fired off 50, 100, and 150 requests. The maximum retry count was 2 (for 2 requests), while a small number needed 1 retry, thanks to exponential backoff.
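
For illustration, the jittered backoff being discussed boils down to something like the sketch below. The function name is made up, retries are assumed to start at 0, and the 10/20/40/60 second pre-jitter values are the ones quoted above:

<?php
// Hedged sketch of exponential backoff with full jitter, in seconds.
function exponential_backoff_with_jitter( int $retries ): int {
	// Pre-jitter values of 10s, 20s, 40s, capped at 60s.
	$backoff = min( 60, 10 * pow( 2, $retries ) );

	// Full jitter: pick a random delay between 1s and the pre-jitter value, so clients
	// that hit the rate limit at the same moment don't retry in lockstep.
	// wp_rand() is WordPress' RNG helper; random_int() would do the same outside WordPress.
	return wp_rand( 1, $backoff );
}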

@ingeniumed
Contributor Author

I couldn't actually trigger this code outside of tests, despite disabling caching and duplicating blocks so that 100 API calls were fired off. So I made a Node script that replicated the behaviour I was expecting in order to test this.

What this tells me is that the likelihood of our calls being rate limited, at least to start with, is low. We do have caching in place, which will be helpful, and the tweak to our retry strategy should help further.

@ingeniumed ingeniumed self-assigned this Apr 15, 2025
@ingeniumed ingeniumed marked this pull request as ready for review April 15, 2025 06:04
@ingeniumed ingeniumed requested a review from shekharnwagh April 15, 2025 06:04
Member

@chriszarate chriszarate left a comment


Our retry strategy only exists in the context of a single API request, so it does not protect the API client. It is not applied across several API requests within the same render cycle, nor does it apply across several render cycles happening at about the same time across multiple workers.

Our retry strategy has not received a lot of scrutiny since it was written. I am not sure we should be retrying requests at all:

  1. Retries will block the render cycle. An exponential back-off strategy could easily result in a request timeout for the end user.
  2. We execute the exact same request multiple times for each block binding inside of a remote data block. For successful requests, repeated requests resolve from the object or in-memory cache. For unsuccessful requests, repeated requests miss the cache and are executed again. This is effectively a primitive retry strategy, and it would circumvent any sophisticated retry strategy we implemented in our Guzzle middleware.
  3. Under load, a site may be executing the same request nearly simultaneously across multiple workers, all of which would effectively retry the same request.

We might be better served by disabling retries entirely and falling back to the serialized block content. Separately, we might want to implement error caching with short TTLs to prevent hammering during rate-limiting or when the API is experiencing errors.
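
One way to approximate the error caching suggested here, sketched with WordPress transients rather than the plugin's Guzzle cache middleware; the function name, key prefix, and 10-second TTL are hypothetical:

<?php
use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\ResponseInterface;

function fetch_with_error_cache( Client $client, string $url, int $error_ttl = 10 ): ?ResponseInterface {
	$cache_key = 'rdb_error_' . md5( $url );

	// If this URL failed recently, fail fast instead of hammering the API again.
	if ( false !== get_transient( $cache_key ) ) {
		return null;
	}

	try {
		return $client->request( 'GET', $url );
	} catch ( RequestException $e ) {
		// Remember the failure for a short window so concurrent renders back off.
		set_transient( $cache_key, true, $error_ttl );
		return null;
	}
}

A caller that gets null back could then fall back to the serialized block content, as suggested above.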

@ingeniumed
Contributor Author

Summary of the discussion with @chriszarate on the above:

  • We are going to remove the retry strategy entirely. Given the way PHP works, there isn't a way to re-create a lazy-loading model when rendering a post that would let us make the calls async.
  • We are going to cache error responses for a few seconds, and increase the cache time on successful responses.

@ingeniumed ingeniumed changed the title Change the backoff strategy to exponential Drop retry strategy and up cache time Apr 17, 2025
@ingeniumed
Contributor Author

ingeniumed commented Apr 17, 2025

Summary of the changes:

  1. Drop the retry strategy entirely
  2. Up the cache time for successful API requests to 5 mins

What I haven't done:

  1. Added caching for errors - Guzzle throws exceptions for error responses, so the caching layer never gets to cache those requests. Separately, the caching middleware we use doesn't accept multiple TTLs, and we don't want problematic responses to be cached as long as successful ones. This is going to require some experimentation to see how we can achieve it, so I'm punting it to another PR.
  2. Tracking requests and metrics - The work to send out notifications when a rate limit is being reached, or to provide analysis based on request metrics, hasn't been done. There is already an action for when a query response is available, so we can hook into that. There's also another PR in review, Custom query monitor panel, improve validation issue reporting #465, which will help with this work. Hence, this has also been punted to another PR.

@ingeniumed ingeniumed requested a review from chriszarate April 17, 2025 04:00
@chriszarate chriszarate merged commit 52d8caa into trunk Apr 18, 2025
13 checks passed
@chriszarate chriszarate deleted the add/rate-limiting-to-plugin branch April 18, 2025 13:27