Closed
| Start Date | ETA | Project Lead | Actual Ship Date |
| --- | --- | --- | --- |
| 2023-04-01 | 2024-04-31 | @AetherUnbound | TBD |
Description
Modify the API staging environment to include a proportional subset of media from each provider. Increase the frequency of data refreshes.
This project does not address metrics for measuring and tracking relevancy.
To implement this project, we want to read the production `/stats/` endpoints to get the media totals for each provider, then scale these numbers down to set the ingestion limit per provider in staging.
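The scaling step could be sketched roughly as follows. This is only an illustration, not the project's implementation: the function name `proportional_limits`, the `staging_total` parameter, and the provider names and counts are all hypothetical, and the sketch assumes the per-provider totals have already been read from the production stats endpoints into a plain mapping.

```python
def proportional_limits(
    production_totals: dict[str, int], staging_total: int
) -> dict[str, int]:
    """Scale per-provider totals down so they sum to roughly staging_total.

    Hypothetical helper: `production_totals` maps a provider name to its
    production media count, and `staging_total` is the desired overall
    number of records in staging.
    """
    grand_total = sum(production_totals.values())
    if grand_total == 0:
        # Nothing to scale; every provider gets a limit of zero.
        return {provider: 0 for provider in production_totals}

    limits = {}
    for provider, count in production_totals.items():
        # Integer ceiling division avoids floating-point rounding and
        # ensures a provider with at least one record keeps at least one.
        limits[provider] = (count * staging_total + grand_total - 1) // grand_total
    return limits


# Made-up sample numbers, for illustration only.
sample_totals = {"provider_a": 500_000, "provider_b": 300_000, "provider_c": 200_000}
print(proportional_limits(sample_totals, 1_000))
# {'provider_a': 500, 'provider_b': 300, 'provider_c': 200}
```

Rounding up means the limits can sum to slightly more than `staging_total`, but it keeps very small providers represented in staging, which seems in keeping with the goal of a proportional subset from each provider.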
This project could also include updating the Elasticsearch cluster (or setting up a new cluster and moving staging there).
Documents
- Project Proposal: Search Relevancy Sandbox #1107 - doc link
- Implementation Plan: Update staging database #1154 - doc link
- Implementation Plan: Rapid iteration on ingestion server index configuration #2133 - doc link
- Implementation Plan: Staging Elasticsearch Reindex DAGs #2358
Issues
- Implementation Plan: Rapid iteration on ingestion server index configuration #1985
- Implementation Plan: Staging Elasticsearch reindex DAGs for both potential index types #1987
- Staging database recreation DAG #1989
- Add notice to staging Django Admin UI of next scheduled data wipe #1990
- Add Elasticsearch Airflow Provider #2370
- Add Airflow Connections for Elasticsearch clusters #2371
- Add DAG for creating staging indices #3232
- Build ES index creation DAG #2372
- Create a DAG for creating es indices proportional by provider #3488
- Set up staging ingestion server connection in Airflow #3492
- Make a DAG for pointing ES aliases #3493
Milestones
- Rapid iteration on Elasticsearch indices: https://github.com/WordPress/openverse/milestone/16
Prior Art
Status
✅ Success