Skip to content

Add an extra filter to fix the memory caching bug in Google Sheets #594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 11, 2025

Conversation

ingeniumed
Copy link
Contributor

Description

Found a bug in RDB using Google Sheets as a data source - you can't correctly select more than 1 entry from the DataViews modal. It'll always replicate the entry. It's due to the get_request_details method returning the same object for all the entries. That causes the in memory cache key to be the same and thereby return the same result. I haven't been able to replicate this in any other data sources, because they include some different detail in the url, etc. For Google Sheets we just get all the data and then filter out the entries we don't want.

Instead of trying to fix the request details method, I took advantage of the filter that we provide to enhance it instead. This way, if there are other data sources that run into this problem they could take on a similar approach. If we do see that this happens often, we can then add in a fix for the future.

Testing

  • Setup a Google Sheets Data Source
  • Ensure that when you select more than 1 entry, they are not duplicated

@ingeniumed ingeniumed requested a review from a team as a code owner June 11, 2025 02:10
@ingeniumed ingeniumed requested review from pkevan and removed request for a team June 11, 2025 02:10
@ingeniumed ingeniumed self-assigned this Jun 11, 2025
Copy link
Contributor

Test this PR in WordPress Playground.

* uri: string,
* }>
*/
public static function enhance_request_details( array $request_details, string $query, array $input_variables ): array {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken this approach on purpose. I didn't want to alter the request_details method as it's technically working as expected in this case. The actual request doesn't really have any enhancements and we really shouldn't have data source specific logic in there.

Doing it this way means we let the integration handle it. Happy to have a discussion on this.

@ingeniumed ingeniumed requested review from a team and removed request for pkevan June 11, 2025 02:15
@pkevan
Copy link
Contributor

pkevan commented Jun 11, 2025

Is this bug present in both in memory and object caching setup? I couldn't see from the bug description as to where this wasn't working.

@ingeniumed
Copy link
Contributor Author

Is this bug present in both in memory and object caching setup? I couldn't see from the bug description as to where this wasn't working.

It's present in our in-memory caching setup. We use the request details to do an in-memory cache so that it would be benefit repeated requests to the exact same data source. Since Google Sheets queries don't include the inputs in their request, and instead process the response after using those inputs the in-memory cache key ends up being the same. This filter adds the input variables to the request details thereby forcing the cache key to be different.

@ingeniumed ingeniumed merged commit b5a7f89 into trunk Jun 11, 2025
13 checks passed
@ingeniumed ingeniumed deleted the add/bug-fix-for-google-sheets branch June 11, 2025 09:45
@chriszarate
Copy link
Member

@ingeniumed I think this highlights a footgun that we should remove. Passing $input_variables to preprocess_response is a mistake, because it allows you to implement non-deterministic processing that ends up poisoning the in-memory cache.

@maxschmeling
Copy link
Contributor

@ingeniumed I think this highlights a footgun that we should remove. Passing $input_variables to preprocess_response is a mistake, because it allows you to implement non-deterministic processing that ends up poisoning the in-memory cache.

To be fair, preprocess_response can return non-deterministic results either way. Passing input into it probably increases the chances, but it does provide for more flexibility. Definite footgun, but I'm not 100% convinced yet that it should be removed.

Should we log an issue and have discussion about that elsewhere?

@chriszarate
Copy link
Member

To be fair, preprocess_response can return non-deterministic results either way.

Only with random input that you implement yourself or that is provided by the API response. Everything else is static with respect to the current request. I can log an issue.

@chriszarate
Copy link
Member

VIPCMS-1347

@ingeniumed
Copy link
Contributor Author

To me, the question would be - how should the data source that uses preprocess_response to filter out the response fulfil it's required request? That's how Google Sheets works right now. We could say in this specific case, Google Sheets API does allow you to use cell ranges to be able to pick out specific values. It's a bit tedious to use that, compared to just getting the values and filtering it out but it's there.

But, that's not always going to be the case with an API. There are lots of APIs that are poorly designed but still need to be used and in those cases it's handy to have the input_variables on hand.

I think the real question is - how could we prevent the in-memory cache from being poisoned? Could we restrict what items we use from request-details for generating the cache key? Could we restrict the changes allowed in the filter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants