turbo-tasks-backend: prevent duplicate task restores with restoring bits #92389

Draft
sokra wants to merge 6 commits into canary from sokra/task-restoring

Conversation


sokra (Member) commented Apr 5, 2026

What?

Adds data_restoring and meta_restoring transient flag bits to TaskStorage, a shared restored: Event on Storage, and a synchronous EventListener::wait() primitive. These are used to coordinate concurrent restore attempts so the same task is never loaded from the backing store more than once simultaneously.

Why?

When multiple threads race to access an unrestored task, each thread would independently call into the backing storage to load the task's data. This causes redundant I/O and can result in one thread's write being silently discarded when another thread also applies a restore for the same task. The fix ensures only one thread performs the actual I/O while others wait for it to finish.

How?

New flag bits (storage_schema.rs):
meta_restoring and data_restoring are transient (not persisted) bits that track whether a restore is currently in progress for each category of a task.
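A minimal sketch of what such transient bits look like, assuming a plain `u8` bitset (the real `TaskFlags` in storage_schema.rs uses the crate's own bitfield machinery; names mirror the PR, the representation is an assumption):

```rust
// Illustrative stand-in for the PR's flag bits, NOT the actual TaskFlags type.
#[derive(Default, Clone, Copy)]
struct TaskFlags(u8);

impl TaskFlags {
    const META_RESTORED: u8 = 1 << 0;
    const DATA_RESTORED: u8 = 1 << 1;
    // Transient bits: set while a restore is in flight, never persisted.
    const META_RESTORING: u8 = 1 << 2;
    const DATA_RESTORING: u8 = 1 << 3;

    fn data_restored(&self) -> bool {
        self.0 & Self::DATA_RESTORED != 0
    }
    fn data_restoring(&self) -> bool {
        self.0 & Self::DATA_RESTORING != 0
    }
    fn set_data_restoring(&mut self) {
        self.0 |= Self::DATA_RESTORING;
    }
    // Success path: set restored and clear restoring in one step (under lock).
    fn finish_data_restore(&mut self) {
        self.0 = (self.0 | Self::DATA_RESTORED) & !Self::DATA_RESTORING;
    }
    // "Transient" means the persisted form masks the restoring bits out.
    fn persisted(&self) -> u8 {
        self.0 & !(Self::META_RESTORING | Self::DATA_RESTORING)
    }
}
```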

New restored event (storage.rs):
A single Event on Storage that is notified after every restore attempt (success or failure). Waiting threads subscribe to it, then re-check the flags under lock after waking.

EventListener::wait() (event.rs):
A synchronous blocking wrapper around event_listener's .wait(), needed because restore logic runs in synchronous backend operation contexts (not async).
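The semantics such a listen/notify/wait primitive must provide (register first, then a blocking wait with no lost wakeups) can be illustrated with a Condvar-based stand-in; this is not the `event_listener`-backed implementation, just a model of the contract:

```rust
use std::sync::{Arc, Condvar, Mutex};

// Minimal notify-all event with a synchronous wait(). Illustrative only.
#[derive(Default)]
struct Event {
    state: Mutex<u64>, // generation counter bumped on every notify
    cond: Condvar,
}

struct Listener<'a> {
    event: &'a Event,
    seen: u64, // generation observed at registration time
}

impl Event {
    // Registering captures the current generation BEFORE any re-check,
    // so a notify that races with registration is never lost.
    fn listen(&self) -> Listener<'_> {
        Listener { event: self, seen: *self.state.lock().unwrap() }
    }
    fn notify_all(&self) {
        *self.state.lock().unwrap() += 1;
        self.cond.notify_all();
    }
}

impl Listener<'_> {
    // Synchronous, blocking wait: returns once a notification newer than
    // the registration generation has been issued.
    fn wait(self) {
        let mut generation = self.event.state.lock().unwrap();
        while *generation == self.seen {
            generation = self.event.cond.wait(generation).unwrap();
        }
    }
}
```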

Restore protocol (operation/mod.rs):
All three restore entry points (task(), task_pair(), prepare_tasks_with_callback()) follow the same pattern:

  1. Classify under lock — for each category that needs restoring: if the *_restoring bit is already set, mark the task as "wait"; otherwise, set the bit and claim the I/O.
  2. Drop lock and do I/O — only the thread that claimed a category performs the backing-store lookup.
  3. Apply results under lock — merge the loaded data into the task, set *_restored, clear *_restoring, then notify() the restored event so waiters wake.
  4. Wait (for threads that saw *_restoring set) — register a listener on restored before re-checking flags (to avoid lost wakeups), then loop until *_restored is set or *_restoring clears without *_restored being set (error path).
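The four steps above can be sketched for a single task and a single category, with `std::sync::Condvar` standing in for the shared restored event (all names and the simulated I/O are illustrative, not the PR's actual types):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Default)]
struct Task {
    restoring: bool,
    restored: bool,
    data: Option<String>,
}

#[derive(Default)]
struct Storage {
    task: Mutex<Task>,
    restored_event: Condvar, // stand-in for Storage::restored
}

// Returns true iff this thread performed the backing-store I/O.
fn restore(storage: &Storage, io_count: &Mutex<u32>) -> bool {
    // Step 1: classify under lock.
    let mut task = storage.task.lock().unwrap();
    if task.restored {
        return false;
    }
    if task.restoring {
        // Step 4: another thread claimed the I/O; wait, re-checking
        // the flags after every wakeup.
        while task.restoring {
            task = storage.restored_event.wait(task).unwrap();
        }
        assert!(task.restored, "restoring failed");
        return false;
    }
    task.restoring = true; // claim the I/O
    drop(task); // Step 2: drop the lock, do the I/O.
    *io_count.lock().unwrap() += 1; // simulated backing-store lookup
    let loaded = Some("persisted bytes".to_string());
    // Step 3: apply under lock, flip the flags, then notify waiters.
    let mut task = storage.task.lock().unwrap();
    task.data = loaded;
    task.restored = true;
    task.restoring = false;
    drop(task);
    storage.restored_event.notify_all();
    true
}

// Race n threads against one unrestored task; count actual I/O calls.
fn race(n: usize) -> u32 {
    let storage = Arc::new(Storage::default());
    let io_count = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let (s, c) = (Arc::clone(&storage), Arc::clone(&io_count));
            thread::spawn(move || restore(&s, &c))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let count = *io_count.lock().unwrap();
    count
}
```

However many threads race, exactly one performs the simulated lookup; the rest either wait on the event or observe the already-set restored flag.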

prepare_tasks_with_callback() batches this into two explicit phases: Phase 1 claims and performs all I/O (using the batch backing-store API when there are multiple tasks), then Phase 2 waits for any tasks that were being restored by another thread.

The implementation extracts two shared helpers to avoid repetition across the three call sites:

  • apply_restore_result() — clears the restoring bit and merges storage on success, or returns the error for the caller to panic after notifying waiters.
  • wait_for_restore_or_panic() — wraps wait_for_restoring_task() and panics with a clear message on failure.

@nextjs-bot added labels on Apr 5, 2026: created-by: Turbopack team (PRs by the Turbopack team), Turbopack (Related to Turbopack with Next.js).

codspeed-hq bot commented Apr 5, 2026

Merging this PR will improve performance by 3.22%

⚡ 1 improved benchmark
✅ 7 untouched benchmarks
⏩ 12 skipped benchmarks¹

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation react-dom-client.development.js[full] 418.4 ms 405.3 ms +3.22%

Comparing sokra/task-restoring (622464d) with canary (d75f07b)

Open in CodSpeed

Footnotes

  1. 12 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.


nextjs-bot (Collaborator) commented Apr 5, 2026

Stats from current PR

✅ No significant changes detected

📊 All Metrics
📖 Metrics Glossary

Dev Server Metrics:

  • Listen = TCP port starts accepting connections
  • First Request = HTTP server returns successful response
  • Cold = Fresh build (no cache)
  • Warm = With cached build artifacts

Build Metrics:

  • Fresh = Clean build (no .next directory)
  • Cached = With existing .next directory

Change Thresholds:

  • Time: Changes < 50ms AND < 10%, OR < 2% are insignificant
  • Size: Changes < 1KB AND < 1% are insignificant
  • All other changes are flagged to catch regressions
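Read literally, the time rule says a change is insignificant when it is both under 50 ms and under 10%, or under 2% regardless of magnitude. A hypothetical predicate capturing that rule (not the bot's actual code; names are mine):

```rust
// Encodes the stated time threshold: insignificant if
// (|Δ| < 50 ms AND |Δ%| < 10%) OR |Δ%| < 2%; significant otherwise.
// Illustrative only, not the nextjs-bot implementation.
fn time_change_is_significant(delta_ms: f64, delta_pct: f64) -> bool {
    let (d_ms, d_pct) = (delta_ms.abs(), delta_pct.abs());
    !((d_ms < 50.0 && d_pct < 10.0) || d_pct < 2.0)
}
```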

⚡ Dev Server

Metric Canary PR Change Trend
Cold (Listen) 456ms 456ms ▁█▅▅▅
Cold (Ready in log) 441ms 441ms ▁▂▅▇▄
Cold (First Request) 1.039s 1.111s ▆▁▁▂▂
Warm (Listen) 457ms 456ms █▁██▁
Warm (Ready in log) 442ms 439ms ▂▂▇█▁
Warm (First Request) 335ms 334ms ▃▄▇█▄
📦 Dev Server (Webpack) (Legacy)

📦 Dev Server (Webpack)

Metric Canary PR Change Trend
Cold (Listen) 455ms 456ms ▁▁▅▁▅
Cold (Ready in log) 438ms 437ms ▁▃▆▂▂
Cold (First Request) 1.939s 1.958s ▇▇▇▆▁
Warm (Listen) 456ms 455ms ▁▁▁▁▁
Warm (Ready in log) 437ms 437ms ▁▂▅▁▂
Warm (First Request) 1.950s 1.958s ▆▇█▆▁

⚡ Production Builds

Metric Canary PR Change Trend
Fresh Build 3.901s 3.886s ▇▆██▁
Cached Build 3.876s 3.861s ▄███▄
📦 Production Builds (Webpack) (Legacy)

📦 Production Builds (Webpack)

Metric Canary PR Change Trend
Fresh Build 14.487s 14.545s ▁▂▅▂▂
Cached Build 14.644s 14.622s ▁▂▇▃▄
node_modules Size 488 MB 488 MB █████
📦 Bundle Sizes


⚡ Turbopack

Client

Main Bundles
Canary PR Change
02fkg8wfh0iju.js gzip 9.19 kB N/A -
050zwt5xh_0tx.js gzip 10.4 kB N/A -
0803-3r6mifdx.js gzip 157 B N/A -
087fzjd-gvlzv.js gzip 450 B N/A -
0cz1d0mv5g_q7.js gzip 39.4 kB 39.4 kB
0d0pcwea0l749.js gzip 151 B N/A -
0d34flh8j9r6u.js gzip 156 B N/A -
0p0khj57rq_v_.js gzip 162 B N/A -
0ppxcl_z43mad.js gzip 8.52 kB N/A -
13mnpc17btogu.js gzip 157 B N/A -
19oha6-znmkcv.js gzip 8.55 kB N/A -
1elt1qium-r2m.css gzip 115 B 115 B
1ppe_gkx_dsny.js gzip 159 B N/A -
2_5rjb7lqxntf.js gzip 221 B 221 B
219prxwxgaalc.js gzip 7.61 kB N/A -
26elcgxnn9zjd.js gzip 8.52 kB N/A -
28s7tbll9vbjr.js gzip 156 B N/A -
2900hudr6gvm0.js gzip 2.28 kB N/A -
2bbl2qamyvimj.js gzip 65.7 kB N/A -
2lv2js3kmdeho.js gzip 8.48 kB N/A -
2rehygrd36hqv.js gzip 8.58 kB N/A -
2scbv16he964r.js gzip 158 B N/A -
2srwswih0m9_h.js gzip 13.3 kB N/A -
3-jz00s4w-r6h.js gzip 13 kB N/A -
3-p9p9mheqhzx.js gzip 8.55 kB N/A -
31030bryqpolg.js gzip 8.53 kB N/A -
31dx5nmrzzuy7.js gzip 225 B N/A -
37r23u64aoktk.js gzip 155 B N/A -
3925v09gtu-5k.js gzip 49 kB N/A -
39x4zj5mjb4d_.js gzip 9.77 kB N/A -
3at2ovgizp8r6.js gzip 158 B N/A -
3bknr2e7m9s7z.js gzip 155 B N/A -
3k-48b78ys_vy.js gzip 10.1 kB N/A -
3m7-5rfj0avoz.js gzip 12.9 kB N/A -
3t39n05ky9z08.js gzip 70.8 kB N/A -
3uqce_6sa526g.js gzip 8.47 kB N/A -
3yurjqk-sjs3y.js gzip 1.46 kB N/A -
3znov3m90-kab.js gzip 168 B N/A -
40ybjx9c192n0.js gzip 13.8 kB N/A -
421vzwdt9j1b_.js gzip 5.62 kB N/A -
44jj5q-kk1jan.js gzip 157 B N/A -
turbopack-03..e7c0.js gzip 4.18 kB N/A -
turbopack-0k..9h2a.js gzip 4.18 kB N/A -
turbopack-0m..6r79.js gzip 4.18 kB N/A -
turbopack-0v..v8st.js gzip 4.18 kB N/A -
turbopack-1s..tsd2.js gzip 4.16 kB N/A -
turbopack-3-..bo-6.js gzip 4.18 kB N/A -
turbopack-31..l4gh.js gzip 4.18 kB N/A -
turbopack-36..rum2.js gzip 4.18 kB N/A -
turbopack-3i..3636.js gzip 4.18 kB N/A -
turbopack-3v..wfxq.js gzip 4.18 kB N/A -
turbopack-3v..1qwv.js gzip 4.19 kB N/A -
turbopack-3z..kcbv.js gzip 4.18 kB N/A -
turbopack-40..j2ay.js gzip 4.18 kB N/A -
turbopack-42..qz47.js gzip 4.17 kB N/A -
03dgzoo-qf3sm.js gzip N/A 9.19 kB -
03i0taczqebbx.js gzip N/A 70.8 kB -
05tx5f25dlivn.js gzip N/A 8.53 kB -
0c7ez6p2qc57f.js gzip N/A 5.62 kB -
0duvj3qk5pvgn.js gzip N/A 13.8 kB -
0ifxao1ktkgwg.js gzip N/A 156 B -
0m-34rm9w_wpm.js gzip N/A 7.6 kB -
0qnwuk92m8i7o.js gzip N/A 10.4 kB -
0r4wrn6n0ue2m.js gzip N/A 8.55 kB -
0rp0fodtbt_6m.js gzip N/A 8.52 kB -
0sfck-km4dl1k.js gzip N/A 8.47 kB -
0x0xuhmxzwkp8.js gzip N/A 8.47 kB -
1-wdvgxnzicj7.js gzip N/A 1.46 kB -
11u6nxujb2eg4.js gzip N/A 450 B -
19uunh8umr1a1.js gzip N/A 157 B -
1el9fuakpgh8m.js gzip N/A 155 B -
1jv-o1_s-zmua.js gzip N/A 49 kB -
1mifo-hcc4vf6.js gzip N/A 154 B -
1o5x2xlfw7x62.js gzip N/A 156 B -
1sk7rrnby7fjt.js gzip N/A 157 B -
2-j7jrt35v955.js gzip N/A 160 B -
27kwgyklbqvcl.js gzip N/A 152 B -
2e2z-03lx4fjc.js gzip N/A 13 kB -
2irxuxkr23i0g.js gzip N/A 160 B -
2k9ax08cjl2id.js gzip N/A 12.9 kB -
2lms6k76q5-6m.js gzip N/A 13.3 kB -
2qx4twi9i3xus.js gzip N/A 2.28 kB -
2srnqic6tvxxd.js gzip N/A 8.52 kB -
2zkc9u4375pyw.js gzip N/A 157 B -
30l7m4nayp73a.js gzip N/A 8.55 kB -
34v1uamxoz09s.js gzip N/A 170 B -
34wde90lr4zme.js gzip N/A 157 B -
3h_ecpiaatwgc.js gzip N/A 10.1 kB -
3hxw-cpxtvy_3.js gzip N/A 156 B -
3ity0aahajapd.js gzip N/A 225 B -
3wrhpuc-j1aw9.js gzip N/A 9.77 kB -
3xlti3rufjlyg.js gzip N/A 65.7 kB -
43mlw9dy_8f02.js gzip N/A 8.58 kB -
turbopack-02..6_tq.js gzip N/A 4.18 kB -
turbopack-0h..r50b.js gzip N/A 4.18 kB -
turbopack-17..z-3u.js gzip N/A 4.19 kB -
turbopack-18..evlj.js gzip N/A 4.17 kB -
turbopack-1c..a07c.js gzip N/A 4.18 kB -
turbopack-1h..a606.js gzip N/A 4.18 kB -
turbopack-1o.._bpf.js gzip N/A 4.18 kB -
turbopack-1w..e9r6.js gzip N/A 4.18 kB -
turbopack-22..wdmr.js gzip N/A 4.18 kB -
turbopack-2c..zde7.js gzip N/A 4.18 kB -
turbopack-31..4lzd.js gzip N/A 4.18 kB -
turbopack-3g..9wtz.js gzip N/A 4.18 kB -
turbopack-3l..q89n.js gzip N/A 4.18 kB -
turbopack-40..aa11.js gzip N/A 4.16 kB -
Total 464 kB 464 kB ✅ -22 B

Server

Middleware
Canary PR Change
middleware-b..fest.js gzip 722 B 714 B 🟢 8 B (-1%)
Total 722 B 714 B ✅ -8 B
Build Details
Build Manifests
Canary PR Change
_buildManifest.js gzip 434 B 435 B
Total 434 B 435 B ⚠️ +1 B

📦 Webpack

Client

Main Bundles
Canary PR Change
5528-HASH.js gzip 5.54 kB N/A -
6280-HASH.js gzip 60.7 kB N/A -
6335.HASH.js gzip 169 B N/A -
912-HASH.js gzip 4.59 kB N/A -
e8aec2e4-HASH.js gzip 62.8 kB N/A -
framework-HASH.js gzip 59.7 kB 59.7 kB
main-app-HASH.js gzip 255 B 255 B
main-HASH.js gzip 39.4 kB 39.3 kB
webpack-HASH.js gzip 1.68 kB 1.68 kB
262-HASH.js gzip N/A 4.59 kB -
2889.HASH.js gzip N/A 169 B -
5602-HASH.js gzip N/A 5.55 kB -
6948ada0-HASH.js gzip N/A 62.8 kB -
9544-HASH.js gzip N/A 61.4 kB -
Total 235 kB 235 kB ⚠️ +581 B
Polyfills
Canary PR Change
polyfills-HASH.js gzip 39.4 kB 39.4 kB
Total 39.4 kB 39.4 kB
Pages
Canary PR Change
_app-HASH.js gzip 194 B 194 B
_error-HASH.js gzip 183 B 180 B 🟢 3 B (-2%)
css-HASH.js gzip 331 B 330 B
dynamic-HASH.js gzip 1.81 kB 1.81 kB
edge-ssr-HASH.js gzip 256 B 256 B
head-HASH.js gzip 351 B 352 B
hooks-HASH.js gzip 384 B 383 B
image-HASH.js gzip 580 B 581 B
index-HASH.js gzip 260 B 260 B
link-HASH.js gzip 2.51 kB 2.51 kB
routerDirect..HASH.js gzip 320 B 319 B
script-HASH.js gzip 386 B 386 B
withRouter-HASH.js gzip 315 B 315 B
1afbb74e6ecf..834.css gzip 106 B 106 B
Total 7.98 kB 7.98 kB ✅ -1 B

Server

Edge SSR
Canary PR Change
edge-ssr.js gzip 125 kB 126 kB
page.js gzip 273 kB 273 kB
Total 398 kB 398 kB ⚠️ +163 B
Middleware
Canary PR Change
middleware-b..fest.js gzip 616 B 616 B
middleware-r..fest.js gzip 156 B 155 B
middleware.js gzip 44.2 kB 44.4 kB
edge-runtime..pack.js gzip 842 B 842 B
Total 45.8 kB 46 kB ⚠️ +209 B
Build Details
Build Manifests
Canary PR Change
_buildManifest.js gzip 715 B 718 B
Total 715 B 718 B ⚠️ +3 B
Build Cache
Canary PR Change
0.pack gzip 4.38 MB 4.37 MB 🟢 6.96 kB (0%)
index.pack gzip 115 kB 113 kB 🟢 1.79 kB (-2%)
index.pack.old gzip 115 kB 116 kB
Total 4.61 MB 4.6 MB ✅ -7.95 kB

🔄 Shared (bundler-independent)

Runtimes
Canary PR Change
app-page-exp...dev.js gzip 342 kB 342 kB
app-page-exp..prod.js gzip 189 kB 189 kB
app-page-tur...dev.js gzip 341 kB 341 kB
app-page-tur..prod.js gzip 189 kB 189 kB
app-page-tur...dev.js gzip 338 kB 338 kB
app-page-tur..prod.js gzip 187 kB 187 kB
app-page.run...dev.js gzip 338 kB 338 kB
app-page.run..prod.js gzip 187 kB 187 kB
app-route-ex...dev.js gzip 76.6 kB 76.6 kB
app-route-ex..prod.js gzip 52.2 kB 52.2 kB
app-route-tu...dev.js gzip 76.6 kB 76.6 kB
app-route-tu..prod.js gzip 52.2 kB 52.2 kB
app-route-tu...dev.js gzip 76.2 kB 76.2 kB
app-route-tu..prod.js gzip 52 kB 52 kB
app-route.ru...dev.js gzip 76.2 kB 76.2 kB
app-route.ru..prod.js gzip 52 kB 52 kB
dist_client_...dev.js gzip 324 B 324 B
dist_client_...dev.js gzip 326 B 326 B
dist_client_...dev.js gzip 318 B 318 B
dist_client_...dev.js gzip 317 B 317 B
pages-api-tu...dev.js gzip 43.8 kB 43.8 kB
pages-api-tu..prod.js gzip 33.4 kB 33.4 kB
pages-api.ru...dev.js gzip 43.8 kB 43.8 kB
pages-api.ru..prod.js gzip 33.4 kB 33.4 kB
pages-turbo....dev.js gzip 53.2 kB 53.2 kB
pages-turbo...prod.js gzip 39 kB 39 kB
pages.runtim...dev.js gzip 53.2 kB 53.2 kB
pages.runtim..prod.js gzip 39 kB 39 kB
server.runti..prod.js gzip 62.8 kB 62.8 kB
Total 3.03 MB 3.03 MB ⚠️ +2 B
📎 Tarball URL
https://vercel-packages.vercel.app/next/commits/622464d93622ac86981ca4d061144a13bf9664bc/next


nextjs-bot (Collaborator) commented Apr 5, 2026

Failing test suites

Commit: 622464d | About building and testing Next.js

pnpm test-start test/e2e/url-imports/url-imports.test.ts (job)

  • Handle url imports > should render the /static page (DD)
  • Handle url imports > should client-render the /static page (DD)
  • Handle url imports > should render the /ssr page (DD)
  • Handle url imports > should client-render the /ssr page (DD)
  • Handle url imports > should render the /ssg page (DD)
  • Handle url imports > should client-render the /ssg page (DD)
  • Handle url imports > should render a static url image import (DD)
  • Handle url imports > should allow url import in css (DD)
  • Handle url imports > should respond on value api (DD)

● Handle url imports › should render the /static page

next build failed with code/signal 1

   97 |             if (code || signal)
   98 |               reject(
>  99 |                 new Error(
      |                 ^
  100 |                   `next build failed with code/signal ${code || signal}`
  101 |                 )
  102 |               )

  at ChildProcess.<anonymous> (lib/next-modes/next-start.ts:99:17)

The remaining eight tests fail with the identical `next build failed with code/signal 1` error at the same location (lib/next-modes/next-start.ts:99:17).

sokra and others added 5 commits April 7, 2026 05:16
…nd wait-for-restore logic

Add `meta_restoring`/`data_restoring` transient flag bits to `TaskFlags` so that
only one thread performs the backing-storage I/O for a given task category.
Threads that observe the restoring bit already set spin on a `Storage::restored`
event until the restoring thread either sets the restored bit (success) or
clears the restoring bit without setting restored (error / panic path).

The batch path in `prepare_tasks_with_callback` is restructured into two phases:
Phase 1 claims unrestored tasks and performs I/O; Phase 2 waits on tasks whose
restore was claimed by another thread.

Also adds a synchronous `EventListener::wait()` method (needed for the blocking
wait inside backend operations which run outside of an async context).

Co-Authored-By: Claude <noreply@anthropic.com>
Extract the repeated "apply restore result + clear restoring bit" pattern into an
`apply_restore_result` free function, and the repeated wait-or-panic call into a
`wait_for_restore_or_panic` method. This removes ~50 lines of duplication across
`task()`, `task_pair()`, and `prepare_tasks_with_callback()`.

Also:
- Track `any_waiting` during Phase 1a to avoid an O(n) scan at the early-exit check
- Remove `has_data_result`/`has_meta_result` temporaries in Phase 1c
- Replace `Arc::new(e)` in batch error distribution with `format!("{e:?}")`
- Remove redundant `EventListener` type annotation (Rust infers it)
- Add `From<SpecificTaskDataCategory> for TaskDataCategory` conversion (used by helper)
- Tighten `Storage::restored` to `pub(crate)` (no external users)

Co-Authored-By: Claude <noreply@anthropic.com>
…rors

In prepare_tasks_with_callback Phase 1c, a task that both self-restores one
category AND waits on another thread for the other category had its callback
called twice: once in Phase 1c and again in Phase 2. Fix this by suppressing
the Phase 1c callback when the task still has pending waits; Phase 2 will call
the callback after all categories are fully restored.

Also collapse the nested `if let Some(x) = ... { if let Some(e) = ... { } }`
patterns into `if let Some(x) = ... && let Some(e) = ...` as required by
clippy::collapsible_if (-D warnings).

Co-Authored-By: Claude <noreply@anthropic.com>
- Drop task lock before notify on success paths so woken threads don't
  immediately contend on the same DashMap shard (task, task_pair, Phase 1c)
- Only notify when we actually performed I/O (do_data/do_meta), skip
  spurious notify when we only waited for another thread
- Change wait_for_restoring_task/wait_for_restore_or_panic to take
  SpecificTaskDataCategory instead of TaskDataCategory, preventing
  silent misbehavior if All were passed
- Fix duplicated error message: bail returns a short cause string,
  the panic in wait_for_restore_or_panic wraps it with full context
- Add debug_assert in apply_restore_result to catch callers that
  invoke it for an already-restored category
- Add comment on was_unrestored capture explaining ordering dependency
- Add missing comment on meta batch Ok(None) arm

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
@sokra force-pushed the sokra/task-restoring branch from b281fb6 to 63cf689 on April 7, 2026 05:17
if !is_restoring {
// The restoring bit was cleared without setting the restored bit.
// This means the restoring thread encountered an error.
bail!("restoring thread failed");

Suggested change
bail!("restoring thread failed");
bail!("restoring failed");


🤖 Fixed: bail!("restoring failed")

tasks_to_restore_for_data_indicies.push(i);
ready = false;

if category.includes_data() && !task.flags.is_restored(TaskDataCategory::Data) {

Suggested change
if category.includes_data() && !task.flags.is_restored(TaskDataCategory::Data) {
if category.includes_data() && !task.flags.data_restored() {


🤖 Fixed — now uses task.flags.data_restored() directly.

tasks_to_restore_for_meta_indicies.push(i);
ready = false;

if category.includes_meta() && !task.flags.is_restored(TaskDataCategory::Meta) {

Suggested change
if category.includes_meta() && !task.flags.is_restored(TaskDataCategory::Meta) {
if category.includes_meta() && !task.flags.meta_restored() {


🤖 Fixed — now uses task.flags.meta_restored() directly.

StorageWriteGuard<'e>,
),
) {
let mut data_count = 0;

Can we add a fast path here for when should_check_backing_storage is false? We could just set "restored" on all tasks and call the callback in a quick loop.


🤖 Done — added a fast path at the top of prepare_tasks_with_callback that short-circuits when !should_check_backing_storage(), marking all tasks as restored and calling callbacks directly without going through the I/O pipeline.

Comment on lines +461 to +462
self.backend.storage.restored.notify(usize::MAX);
panic!("Failed to restore data for task {task_id}: {e:?}");
sokra commented Apr 7, 2026

Do not early exit here. We want all restore results to be applied before emitting the panic; otherwise the restoring bit won't be unset for the remaining tasks and other threads will wait forever. So store the task_ids and errors in a Vec, then notify and panic after the loop.
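The deferred-panic pattern suggested here looks roughly like this (hypothetical shapes; the real code applies results into task storage and calls `notify(usize::MAX)` on the restored event):

```rust
// Hypothetical sketch: apply every restore result first (so every task's
// restoring bit gets cleared), notify waiters, and only then panic if
// anything failed. Types and names are illustrative.
fn apply_all(results: Vec<(u32, Result<String, String>)>) -> Vec<String> {
    let mut errors = Vec::new();
    let mut applied = Vec::new();
    for (task_id, result) in results {
        match result {
            // Both arms unset the per-task restoring state, so waiting
            // threads can make progress even on the error path.
            Ok(data) => applied.push(data),
            Err(e) => errors.push(format!("Failed to restore task {task_id}: {e}")),
        }
    }
    // The single notify(usize::MAX) would go here, after the loop.
    if !errors.is_empty() {
        panic!("{}", errors.join("; "));
    }
    applied
}
```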

if let Some(result) = entry.data_restore_result.take() {
// Capture before apply_restore_result sets is_restored, so we know whether
// this restore was fresh (and we should update the task cache afterwards).
let was_unrestored = !task.flags.is_restored(TaskDataCategory::Data);

was_unrestored is always true, since we only restore when it wasn't restored before. It cannot become restored concurrently since we prevent concurrent restores now.


🤖 Removed the was_unrestored check — task_type is now always populated when we have a data restore result, since we hold the restoring bit.

Comment on lines +474 to +475
self.backend.storage.restored.notify(usize::MAX);
panic!("Failed to restore meta for task {task_id}: {e:?}");

Same


🤖 Same fix applied — removed the redundant is_restored check in apply_restore_result (kept the debug_assert).

drop(task);
self.task_lock_counter.release();
prepared_task_callback(self, task_id, category, task);
self.backend.storage.restored.notify(usize::MAX);

Notify after the loop when all tasks have been restored.


🤖 Done — Phase 1c now only restores (drops lock), then a single notify(usize::MAX) is issued after the full loop completes.

Comment on lines +489 to +494
// Only call the callback if no other category is still being restored by another
// thread. If so, Phase 2 calls the callback after all categories are fully restored.
if !entry.wait_data && !entry.wait_meta {
let task = self.backend.storage.access_mut(task_id);
prepared_task_callback(self, task_id, category, task);
}

Let's move the callback into a separate loop (a new Phase 2, making waiting Phase 3). This will finish restoring earlier, so other threads can be notified sooner and spend less time waiting.


🤖 Done — callbacks are now in a separate Phase 2 loop that runs after Phase 1c has finished all restoring and issued the notification. Phase 3 then handles waiting.

Comment on lines +498 to +512
for entry in &tasks {
if entry.wait_data {
self.wait_for_restore_or_panic(entry.task_id, SpecificTaskDataCategory::Data);
}
if entry.wait_meta {
self.wait_for_restore_or_panic(entry.task_id, SpecificTaskDataCategory::Meta);
}
if entry.wait_data || entry.wait_meta {
// Now that the task is restored, call the callback
self.task_lock_counter.acquire();
let task = self.backend.storage.access_mut(entry.task_id);
self.task_lock_counter.release();
prepared_task_callback(self, entry.task_id, entry.category, task);
}
}

Waiting for each task on its own is a bit inefficient. Let's wait for all tasks together. This will only use a single listener on the event.

Also wrap this in `if any_waiting`.
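A single-listener wait over a set of tasks, as asked for here, can be modeled with a Condvar standing in for the shared restored event (types are illustrative, not the PR's):

```rust
use std::sync::{Arc, Condvar, Mutex};

// Stand-in for "wait for all tasks with one listener": hold one wait slot,
// re-check every still-pending task after each wakeup, and drop tasks from
// the pending set as they finish. Illustrative only.
fn wait_for_all(pending_ids: &[usize], flags: &Mutex<Vec<bool>>, restored: &Condvar) {
    let mut pending: Vec<usize> = pending_ids.to_vec();
    let mut guard = flags.lock().unwrap();
    loop {
        // Keep only tasks whose restored flag is still unset.
        pending.retain(|&i| !guard[i]);
        if pending.is_empty() {
            return;
        }
        // One shared wait covers every remaining task (the real code
        // registers a single EventListener before re-checking the flags,
        // which is what avoids lost wakeups).
        guard = restored.wait(guard).unwrap();
    }
}
```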


🤖 Done — Phase 3 now waits for all tasks together with a single shared listen()/wait() loop, and the whole phase is wrapped in if any_waiting.

- Fast path in prepare_tasks_with_callback when !should_check_backing_storage()
- Use data_restored()/meta_restored() directly instead of is_restored(TaskDataCategory::*)
- Remove was_unrestored guard (always true since we hold the restoring bit)
- Remove redundant is_restored() check inside apply_restore_result (keep debug_assert)
- Fix bail! message: "restoring failed" instead of "restoring thread failed"
- Separate callbacks into Phase 2, after all restoring+notify is done in Phase 1c,
  so other threads are unblocked as early as possible
- Single notify(usize::MAX) after all tasks restored, not per-task
- Wait for all tasks together in Phase 3 with a single shared listener loop,
  wrapped in `if any_waiting`
- Add task_type and self_restored fields to TaskRestoreEntry

Co-Authored-By: Claude <noreply@anthropic.com>