
Implementation Plan: Staging Elasticsearch reindex DAGs for both potential index types #1987

Closed
@AetherUnbound

Description


This issue tracks the drafting of the implementation plan noted in the search relevancy sandbox project proposal: Staging Elasticsearch reindex DAGs for both potential index types (these will be subsets of the full data refresh).

From the project plan:

This plan will describe the DAG or DAGs which will be used to create/update both the proportional-by-provider and production-data-volume indices.

It will also describe the mechanism by which maintainers can rapidly switch which index the staging API uses. This could be done in two separate ways: a DAG which allows changing the primary index alias or a set of changes to the API which would allow queries to specify which index they use. The implementation plan should explore and describe both options.
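As a rough illustration of the first option (a DAG that changes the primary index alias), the sketch below builds the request body for Elasticsearch's `POST /_aliases` endpoint, which applies its `remove` and `add` actions atomically. The function name, the `image` alias, and the index names are all hypothetical placeholders; the plan itself would decide the real naming and how the DAG sends the request.

```python
from typing import Optional


def build_alias_swap_actions(
    alias: str, new_index: str, old_index: Optional[str] = None
) -> dict:
    """Build the body for an Elasticsearch `POST /_aliases` request.

    All names passed in are assumptions for illustration only; the real
    alias and index names would come from the implementation plan.
    """
    actions = []
    if old_index is not None:
        # Detach the alias from the index it currently points to.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # Attach the alias to the index the staging API should now use.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# Hypothetical index names for the two index types described above.
body = build_alias_swap_actions(
    alias="image",
    new_index="image-proportional-by-provider",
    old_index="image-production-data-volume",
)
```

Because both actions travel in a single request, the staging API would never see the alias pointing at no index, which is what makes this kind of swap quick and safe to repeat.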

There was some discussion on this in #1107, from @zackkrida and @sarayourfriend:

(Zack)
Feels quite important to me and like it may be a bit under-explained by this proposal. As written, the proposal seems to suggest that the DAGs would be used to allow for creating a new staging index that replaces the default, "live" index the staging API is pointed at. Is that correct?

Alternatively, I can see a lot of value in having DAGs that allow for creating and updating staging indices with different qualities, but having a mechanism that allows developers to switch between them much more rapidly, perhaps even a query param-based solution (which would allow, for example, to switch indices via a checkbox on the staging.openverse.org/preferences page).

To put this all another way, I'm wondering if requirement #5 should be a bit more specific in terms of what "easily" means.

...

(Madison)
That's a great point, but I actually think this would be a good thing to determine in the implementation plan! It was my intention to have a single alias (like production, at least at the current moment) and allow maintainers to swap between these aliases. However, I do quite like the idea of an API setting which might allow dynamically changing which index is used at query time! That would bring increased complexity to the plan and the project that I wasn't initially considering, but if we scope it out as part of the plan, potentially starting with the rapid swapping but laying out a plan for what changing the index at query time might look like, we could even add that capability down the line outside of the bounds of this specific effort!

...

(Sara)
FWIW, if anyone is looking at this later during implementation planning, this probably doesn't need to be more complicated than a query parameter that determines the index, but we'd need to be careful to juggle things in light of https://docs.openverse.org/projects/proposals/detecting_sensitive_textual_content/20230308-implementation_plan_filtering_and_designating_results_with_sensitive_textual_content.html
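For the second option discussed above (a query parameter that determines the index), a minimal sketch might look like the following. The parameter name `index`, the allowlist, and the index names are assumptions made for illustration, not part of the current API, and the sketch does not attempt to handle the interaction with the sensitive textual content plan linked above.

```python
from typing import Mapping

# Hypothetical allowlist of indices a staging query may select, keyed by
# media type. These names are placeholders, not existing indices.
ALLOWED_STAGING_INDICES = {
    "image": {
        "image",
        "image-proportional-by-provider",
        "image-production-data-volume",
    },
}


def resolve_index(
    media_type: str,
    query_params: Mapping[str, str],
    allowed: Mapping[str, set] = ALLOWED_STAGING_INDICES,
) -> str:
    """Return the index a search should run against.

    Falls back to the default alias (assumed here to be the media type
    name) when no `index` parameter is supplied.
    """
    requested = query_params.get("index")
    if requested is None:
        return media_type
    # Only accept names from the allowlist so that arbitrary indices
    # cannot be queried through the API.
    if requested not in allowed.get(media_type, set()):
        raise ValueError(f"Unknown index for {media_type}: {requested!r}")
    return requested
```

Validating against an allowlist keeps the parameter from exposing arbitrary cluster indices, and defaulting to the existing alias means requests that omit the parameter behave as they do today.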

Metadata

Projects

Status

✅ Done

Status

Accepted
