fix(native): Do not discard all tasks in maybeStartNextQueuedTask() (#27548) #27548

Open
spershin wants to merge 1 commit into prestodb:master from spershin:export-D100103601

@spershin spershin commented Apr 9, 2026

Summary:

Skip any tasks that have been aborted or are no longer valid in
maybeStartNextQueuedTask(), but still start the remaining valid tasks. This
avoids a bug where aborting one task (e.g. from a completed fragment) would
silently discard other still-valid tasks from the same query.

Example of the query that got stuck due to this: 20260404_172612_64504_mgsfz

== NO RELEASE NOTE ==

Differential Revision: D100103601

@spershin spershin requested review from a team as code owners April 9, 2026 01:44
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Apr 9, 2026

linux-foundation-easycla bot commented Apr 9, 2026

CLA Missing ID CLA Not Signed

sourcery-ai bot commented Apr 9, 2026

Reviewer's Guide

Adjusts TaskManager's queued task handling so that aborted or invalid tasks are skipped instead of causing all sibling tasks in the same queue entry to be discarded, and adds a regression test to validate the new behavior when one queued task is aborted.

Sequence diagram for updated queued task start behavior in TaskManager

sequenceDiagram
  participant TaskManager
  participant TaskQueue
  participant QueueEntry
  participant TaskA
  participant TaskB

  TaskManager->>TaskQueue: maybeStartNextQueuedTask()
  TaskQueue-->>TaskManager: front() returns QueueEntry
  TaskManager->>TaskQueue: pop_front()

  loop over queuedTasks in QueueEntry
    TaskManager->>QueueEntry: lock(queuedTask A)
    QueueEntry-->>TaskManager: TaskA
    alt TaskA is null or TaskA.task is null
      TaskManager->>TaskManager: log Skipping null task
      Note over TaskManager: continue to next queuedTask
    else TaskA is non-null
      TaskManager->>TaskA: taskState()
      TaskA-->>TaskManager: state != kPlanned
      alt state != kPlanned
        TaskManager->>TaskManager: log Discarding aborted task
        Note over TaskManager: continue to next queuedTask
      end
    end

    TaskManager->>QueueEntry: lock(queuedTask B)
    QueueEntry-->>TaskManager: TaskB
    TaskManager->>TaskB: taskState()
    TaskB-->>TaskManager: state == kPlanned
    TaskManager->>TaskManager: add TaskB to tasksToStart
  end

  alt tasksToStart is not empty
    TaskManager->>TaskManager: proceed to start tasks in tasksToStart
    TaskManager->>TaskB: start()
  else tasksToStart is empty
    TaskManager->>TaskManager: continue checking next QueueEntry
  end

Class diagram for TaskManager queued task handling changes

classDiagram
  class TaskManager {
    +maybeStartNextQueuedTask() void
    -taskQueue std::deque~std::vector~std::weak_ptr~QueuedTask~~~
  }

  class QueuedTask {
    +task std::shared_ptr~VeloxTaskWrapper~
    +info TaskInfo
    +taskState() PrestoTaskState
  }

  class VeloxTaskWrapper {
    +task void*
  }

  class TaskInfo {
    +taskId std::string
  }

  class PrestoTaskState {
    <<enumeration>>
    kPlanned
    kRunning
    kAborted
    kFinished
    kFailed
  }

  TaskManager "1" --> "*" QueuedTask : manages
  QueuedTask "1" --> "1" VeloxTaskWrapper : has
  QueuedTask "1" --> "1" TaskInfo : has
  QueuedTask "1" --> "1" PrestoTaskState : returns

  note for TaskManager "maybeStartNextQueuedTask now skips aborted or null tasks within a queue entry and starts remaining valid tasks instead of discarding the whole entry"

File-Level Changes

Change: Change queued task dequeue logic to skip aborted/invalid tasks while still starting remaining planned tasks.
Details:
  • Iterate over queued tasks and continue past null or missing Velox tasks instead of marking the whole queue entry as invalid.
  • For tasks whose state is not planned, log and continue to the next task instead of aborting the entire batch.
  • Only break out of the queue-scanning loop when there is at least one valid task to start, based on a non-empty tasksToStart list, removing the previous queryTasksAreGoodToStart flag and related clearing logic.
Files: presto-native-execution/presto_cpp/main/TaskManager.cpp

Change: Add regression test ensuring that aborting one queued task from a query does not prevent sibling tasks in the same queue entry from starting and completing.
Details:
  • Enable worker overloaded task queuing and construct two plan fragments representing different stages of the same query with partitioned output.
  • Create two queued tasks sharing the same query context but different stage IDs, with no splits and 'no more splits', then verify they are queued and not started.
  • Simulate coordinator abort of one queued task, clear server overload, call maybeStartNextQueuedTask, and assert that the non-aborted sibling task starts and runs to completion.
Files: presto-native-execution/presto_cpp/main/tests/TaskManagerTest.cpp
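
The dequeue behavior described above can be illustrated with a minimal sketch. It is not the code from TaskManager.cpp: `TaskState`, `QueuedTask`, `QueueEntry`, `pickTasksToStart`, and `exampleAbortedSibling` are simplified, hypothetical stand-ins, and the queue shape (a deque of vectors of weak pointers) is assumed from the class diagram in this guide. Logging and the actual task start are omitted.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the real presto_cpp types.
enum class TaskState { kPlanned, kRunning, kAborted };

struct QueuedTask {
  std::string taskId;
  TaskState state{TaskState::kPlanned};
};

// One queue entry groups the queued tasks that belong to the same query.
using QueueEntry = std::vector<std::weak_ptr<QueuedTask>>;

// Pops queue entries until one yields at least one startable task.
// Expired or non-planned tasks are skipped one by one, so they no longer
// invalidate their siblings in the same entry.
std::vector<std::shared_ptr<QueuedTask>> pickTasksToStart(
    std::deque<QueueEntry>& taskQueue) {
  std::vector<std::shared_ptr<QueuedTask>> tasksToStart;
  while (!taskQueue.empty()) {
    QueueEntry entry = std::move(taskQueue.front());
    taskQueue.pop_front();
    for (const auto& weakTask : entry) {
      auto task = weakTask.lock();
      if (task == nullptr) {
        continue; // Task no longer exists: skip only this task.
      }
      if (task->state != TaskState::kPlanned) {
        continue; // Aborted or otherwise not startable: skip only this task.
      }
      tasksToStart.push_back(std::move(task));
    }
    if (!tasksToStart.empty()) {
      break; // At least one valid task found: stop scanning the queue.
    }
    // Every task in this entry was invalid: move on to the next entry.
  }
  return tasksToStart;
}

// Usage sketch: an aborted task and a planned task share one queue entry.
inline void exampleAbortedSibling() {
  auto aborted = std::make_shared<QueuedTask>();
  aborted->taskId = "q.0";
  aborted->state = TaskState::kAborted;
  auto planned = std::make_shared<QueuedTask>();
  planned->taskId = "q.1";

  std::deque<QueueEntry> taskQueue;
  taskQueue.push_back(QueueEntry{aborted, planned});

  const auto tasksToStart = pickTasksToStart(taskQueue);
  assert(tasksToStart.size() == 1);
  assert(tasksToStart[0]->taskId == "q.1");
}
```

In this sketch the only exit condition from the scan is a non-empty `tasksToStart`, which mirrors the removal of the per-entry "all tasks are good" flag described in the details above.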


@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high-level feedback:

  • In queuedTaskAbortDoesNotBlockSiblings, you modify the global SystemConfig::kWorkerOverloadedTaskQueuingEnabled flag but never restore it, which can leak state into other tests; consider saving the previous value and resetting it in a scope guard or test teardown.
  • The new maybeStartNextQueuedTask logic now silently skips queue entries where all tasks are invalid; if this is expected, it might be useful to add a single INFO/WARNING log when an entire queue entry is discarded so operators can distinguish this from normal successful dequeues.
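
One way to address the first point is an RAII guard that restores the previous value when the test scope ends. The sketch below is an illustration only: `FakeConfig` is a hypothetical stand-in for a process-wide config registry (the real SystemConfig API is not shown in this PR and may differ), and `ScopedConfigOverride`, `exampleScopedOverride`, and the key `example.flag` are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a global, mutable config registry.
class FakeConfig {
 public:
  static FakeConfig& instance() {
    static FakeConfig config;
    return config;
  }

  // Returns the stored value, or an empty string if the key is unset.
  std::string get(const std::string& key) const {
    auto it = values_.find(key);
    return it == values_.end() ? std::string() : it->second;
  }

  void set(const std::string& key, std::string value) {
    values_[key] = std::move(value);
  }

 private:
  std::map<std::string, std::string> values_;
};

// RAII guard: overrides a config value on construction and writes the
// previously observed value back on destruction, even if the test body
// returns early or throws.
class ScopedConfigOverride {
 public:
  ScopedConfigOverride(std::string key, const std::string& value)
      : key_(std::move(key)), previous_(FakeConfig::instance().get(key_)) {
    FakeConfig::instance().set(key_, value);
  }

  ~ScopedConfigOverride() {
    FakeConfig::instance().set(key_, previous_);
  }

  ScopedConfigOverride(const ScopedConfigOverride&) = delete;
  ScopedConfigOverride& operator=(const ScopedConfigOverride&) = delete;

 private:
  std::string key_; // Declared before previous_, which reads it on init.
  std::string previous_;
};

// Usage sketch: the override is visible only inside the inner scope.
inline void exampleScopedOverride() {
  FakeConfig::instance().set("example.flag", "false");
  {
    ScopedConfigOverride guard("example.flag", "true");
    assert(FakeConfig::instance().get("example.flag") == "true");
  }
  assert(FakeConfig::instance().get("example.flag") == "false");
}
```

A test fixture's teardown hook would work as well; the guard has the advantage of keeping the override and its cleanup in the same place in the test body.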

@spershin spershin changed the title from "fix(native) Do not discard all tasks in maybeStartNextQueuedTask()" to "fix(native): Do not discard all tasks in maybeStartNextQueuedTask()" Apr 9, 2026
@meta-codesync meta-codesync bot changed the title from "fix(native): Do not discard all tasks in maybeStartNextQueuedTask()" to "fix(native): Do not discard all tasks in maybeStartNextQueuedTask() (#27548)" Apr 9, 2026
spershin pushed a commit to spershin/presto that referenced this pull request Apr 9, 2026
…restodb#27548)

@spershin spershin force-pushed the export-D100103601 branch from be00a1f to 26b90b7 Compare April 9, 2026 03:04
spershin pushed a commit to spershin/presto that referenced this pull request Apr 9, 2026
…restodb#27548)

@spershin spershin force-pushed the export-D100103601 branch from 26b90b7 to 50ac510 Compare April 9, 2026 03:04
…restodb#27548)

Summary:
Pull Request resolved: prestodb#27548

@spershin spershin force-pushed the export-D100103601 branch from 50ac510 to 7bbd0fb Compare April 9, 2026 03:09