Description
Problem
The filtered index creation has recently been throttled due to its effect on production API performance (#2975). This has extended the time it takes to complete the create_and_populate_filtered_index step, namely the reindex call here:
openverse/ingestion_server/ingestion_server/indexer.py
Lines 506 to 526 in 6cdf20b
The step appears to have a default timeout of 43200 seconds (12 hours) per a recent exception:
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 461, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/http/client.py", line 1378, in getresponse
    response.begin()
  File "/usr/local/lib/python3.11/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 468, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 357, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='openverse-es-8-8-2-elasticsearch-production.private', port=9200): Read timed out. (read timeout=43200)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/elasticsearch/connection/http_requests.py", line 166, in perform_request
    response = self.session.send(prepared_request, **send_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='openverse-es-8-8-2-elasticsearch-production.private', port=9200): Read timed out. (read timeout=43200)
Description
We should remove the wait_for_completion=True parameter of reindex and instead wait on the task using Elasticsearch's task management API (or using existing alternative mechanisms the ingestion server might have at its disposal to do so). This will require adding steps in the create filtered media index DAG in order to wait on the step to complete before issuing the refresh command (which ensures replicas exist). We may also need to add a REFRESH action to the ingestion server API which can be called by Airflow once the reindex step is complete.
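A rough sketch of what the task-based approach could look like, for discussion only. The function name reindex_and_wait, its parameters, and the index names in the usage note are hypothetical and are not taken from indexer.py. It assumes an elasticsearch-py client that accepts body= on reindex (the traceback above suggests a 7.x client) and that the stored task result carries "completed", "response", and "error" keys as described in the Elasticsearch task management docs. In the actual proposal the polling would probably live in the DAG rather than in the ingestion server.

```python
import time


def reindex_and_wait(es, source_index, dest_index, query,
                     poll_interval=60, sleep=time.sleep):
    """Start a reindex as a background task and poll until it finishes.

    `es` is expected to be an elasticsearch-py client. The function name,
    arguments, and polling interval are illustrative assumptions.
    """
    # With wait_for_completion=False Elasticsearch returns a task ID right
    # away instead of holding the HTTP connection open for hours.
    submitted = es.reindex(
        body={
            "source": {"index": source_index, "query": query},
            "dest": {"index": dest_index},
        },
        wait_for_completion=False,
    )
    task_id = submitted["task"]

    # Each poll is a short request, so the 12 hour read timeout no longer
    # bounds the total duration of the reindex.
    while True:
        status = es.tasks.get(task_id=task_id)
        if status.get("completed"):
            break
        sleep(poll_interval)

    if "error" in status:
        raise RuntimeError(f"Reindex task {task_id} failed: {status['error']}")
    result = status.get("response", {})
    if result.get("failures"):
        raise RuntimeError(
            f"Reindex task {task_id} had failures: {result['failures']}"
        )

    # Only refresh once the reindex task has actually completed.
    es.indices.refresh(index=dest_index)
    return result
```

Usage would be something like reindex_and_wait(es, "image-abc123", "image-abc123-filtered", query), where both index names and the query are placeholders. The sleep argument is injectable only so the loop can be exercised without real delays.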
Alternatives
Alternatively, we could override the request_timeout parameter available to all elasticsearch-py methods with a value greater than 43200 seconds. This could be a short-term workaround.
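For reference, the workaround might look roughly like the sketch below. The helper name reindex_with_long_timeout and the 24 hour default are made up for illustration. It assumes the per-call request_timeout keyword of the elasticsearch-py 7.x style client that the traceback suggests; newer client versions may expose this setting differently.

```python
def reindex_with_long_timeout(es, body, timeout_seconds=24 * 60 * 60):
    """Run a blocking reindex with a longer per-request read timeout.

    `es` is expected to be an elasticsearch-py client. The helper name and
    the 24 hour default are illustrative assumptions.
    """
    # request_timeout overrides the client's read timeout for this call only.
    return es.reindex(
        body=body,
        wait_for_completion=True,
        request_timeout=timeout_seconds,
    )
```

This keeps the single long-lived HTTP request, so it only moves the ceiling and a sufficiently throttled reindex could still hit it.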
Additional context