Apache Airflow version
Other Airflow 2/3 version (please specify below)
If "Other Airflow 2/3 version" selected, which one?
apache/airflow:3.1.3 Docker image
What happened?
We're seeing an issue where a task is started, then stopped because Airflow thinks it should not be running, after which Airflow attempts multiple restarts of the task. This results in the task starting execution multiple times, but Airflow appears to lose track of (or ignore) the execution result. Note that the requests to restart occur within (milli)seconds of the task first starting. In some cases there are many retries (11 in one instance), and the DAG run is marked as Failed even though the offending tasks are marked as Skipped, despite clearly having been attempted multiple times.
Our deployment of Airflow runs two instances of the Scheduler, and we've seen this error occur both when the task is restarted from the same instance and when it is restarted from a different Scheduler instance.
One example sequence of Scheduler log entries, in timestamp order, is as follows. There do not appear to be any other relevant or associated log entries within that time frame, but I can provide further logs on request. In this case, the DAG run logs show no indication of any error or restart of the tasks.
[2025-12-08T08:00:01.522+0000] {_client.py:1026} INFO - HTTP Request: PATCH http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/run "HTTP/1.1 200 OK"
2025-12-08 08:00:01 [debug ] Sending [supervisor] msg=StartupDetails(ti=TaskInstance(id=UUID('019afcf9-7ee5-713b-be96-758c026e7d15'), task_id='MyTask', dag_id='MyDag', run_id='scheduled__2025-12-07T08:00:00+00:00', try_number=1, ...
[2025-12-08T08:01:22.239+0000] {_client.py:1026} INFO - HTTP Request: PUT http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/heartbeat "HTTP/1.1 409 Conflict"
2025-12-08 08:01:22 [error ] Server indicated the task shouldn't be running anymore [supervisor] detail={'detail': {'reason': 'not_running', 'message': 'TI is no longer in the running state and task should terminate', 'current_state': 'scheduled'}} status_code=409 ti_id=UUID('019afcf9-7ee5-713b-be96-758c026e7d15')
[2025-12-08T08:01:22.494+0000] {_client.py:1026} INFO - HTTP Request: PATCH http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/run "HTTP/1.1 200 OK"
2025-12-08 08:01:22 [debug ] Sending [supervisor] msg=StartupDetails(ti=TaskInstance(id=UUID('019afcf9-7ee5-713b-be96-758c026e7d15'), task_id='MyTask', dag_id='MyDag', run_id='scheduled__2025-12-07T08:00:00+00:00', try_number=2, ...
[2025-12-08T08:01:22.642+0000] {_client.py:1026} INFO - HTTP Request: PUT http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/rtif "HTTP/1.1 201 Created"
Occasionally, the DAG run logs will output something like the following before the task is restarted:
2025-12-08 01:18:04.771 | Server indicated the task shouldn't be running anymore. Terminating process
2025-12-08 01:18:04.771 | Task killed!
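To illustrate what we believe we are seeing, here is a minimal sketch of the supervisor-side behaviour, assuming the heartbeat loop works roughly like this. The endpoint and TI id are the placeholders copied from the logs above; the function name, heartbeat interval, and overall structure are illustrative assumptions, not the actual Task SDK internals.

import time

import httpx

# Placeholders taken from the logs above; purely illustrative.
EXECUTION_API = "http://myinstance/execution"
TI_ID = "019afcf9-7ee5-713b-be96-758c026e7d15"


def heartbeat_loop(client: httpx.Client, ti_id: str) -> None:
    """Hypothetical reduction of the supervisor's heartbeat handling."""
    while True:
        resp = client.put(f"{EXECUTION_API}/task-instances/{ti_id}/heartbeat")
        if resp.status_code == 409:
            # This is the branch we are hitting: the server reports the TI
            # is no longer running (current_state is already back to
            # "scheduled"), so the supervisor terminates the local process
            # mid-execution ("Task killed!"), and the scheduler later starts
            # try_number + 1.
            detail = resp.json()["detail"]
            raise RuntimeError(f"Server says task should not be running: {detail}")
        resp.raise_for_status()
        time.sleep(5)  # assumed heartbeat interval

The open question for us is why the server flips the TI back to the scheduled state while try 1 is still executing, which is what the 409 branch above then reacts to.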
What you think should happen instead?
No response
How to reproduce
This seems to occur sporadically and not in any consistent manner. The DAGs with which it occurs also vary.
Operating System
Debian GNU/Linux 12 (bookworm)
Versions of Apache Airflow Providers
apache-airflow-providers-fab == 3.0.2
apache-airflow-providers-google == 15.1.0
apache-airflow-providers-slack == 9.5.0
apache-airflow-providers-standard == 1.9.0
Deployment
Other Docker-based deployment
Deployment details
There are two instances of the Airflow Scheduler deployed.
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct