Reduce rescanning on startup #1179


Closed
cdecker opened this issue Mar 6, 2018 · 13 comments

Comments

@cdecker
Member

cdecker commented Mar 6, 2018

Now that we have blockchain tracking with #1117 it would be nice to reduce the need to rescan from a fixed blockheight on every startup. The main issue I see is that onchaind may be running, having been triggered by seeing a spend on-chain, and would not get triggered again after the restart.

A simple proposal would be to have a relative rescan window that is larger than the maximum onchaind lifetime, e.g., 288 blocks. However, that window may need to be rather large, in part because of HTLC timeouts.

The more involved solution would be to remember the state of onchaind across restarts, though I'm not clear on how much we'd need to remember there, or what format we could use. Again, a simple solution would be to just store the messages we send to onchaind in an append-only log in the DB, pruning it once onchaind is happy and closes on its own.

What do you guys think makes the most sense in this situation? I have reports from users that take hours to catch up with the blockchain, and I think it may also be causing a few of the awaiting funding_locked issues that get reported (the funding depth callback not being triggered on the remote end).
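To illustrate the append-only log idea, here is a minimal in-memory sketch of the intended lifecycle (the names `log_append`/`log_replay`/`log_prune` and the string messages are hypothetical, not actual c-lightning code; in lightningd the log would be a DB table):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical append-only log of messages sent to onchaind,
 * keyed by channel id.  Modeled in memory here to show the
 * append / replay-on-restart / prune-on-exit lifecycle. */
struct log_entry {
	unsigned channel_id;
	char msg[64];
	struct log_entry *next;
};

static struct log_entry *log_head;

/* Append: called every time lightningd hands a message to onchaind. */
static void log_append(unsigned channel_id, const char *msg)
{
	struct log_entry *e = malloc(sizeof(*e));
	struct log_entry **tail = &log_head;
	e->channel_id = channel_id;
	snprintf(e->msg, sizeof(e->msg), "%s", msg);
	e->next = NULL;
	while (*tail)
		tail = &(*tail)->next;
	*tail = e;
}

/* Replay: on restart, re-feed the stored messages to a fresh
 * onchaind instead of rescanning the chain to regenerate them.
 * Returns the number of messages replayed. */
static unsigned log_replay(unsigned channel_id,
			   void (*feed)(const char *msg))
{
	unsigned n = 0;
	for (struct log_entry *e = log_head; e; e = e->next)
		if (e->channel_id == channel_id) {
			feed(e->msg);
			n++;
		}
	return n;
}

/* Prune: once onchaind is happy and exits on its own, its
 * portion of the log is no longer needed. */
static void log_prune(unsigned channel_id)
{
	struct log_entry **p = &log_head;
	while (*p) {
		if ((*p)->channel_id == channel_id) {
			struct log_entry *dead = *p;
			*p = dead->next;
			free(dead);
		} else
			p = &(*p)->next;
	}
}
```

The point of the model is that replay makes restart independent of chain depth: the cost is proportional to the number of logged messages, not the number of blocks since the channel opened.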

@cdecker
Member Author

cdecker commented Mar 6, 2018

Ping @rustyrussell and @ZmnSCPxj

@ZmnSCPxj
Contributor

ZmnSCPxj commented Mar 6, 2018

The "best" solution I can think of is indeed to save onchaind state on disk. I think it should be possible to design a DB table for onchaind state and have well-defined changes to that table. However, we would almost need to rewrite onchaind: instead of writing to in-memory structures, it would perform DB updates.

The alternative, easier solution is indeed to log on disk the messages that were sent in the interim. This feels like a hackish solution, though...

The issue is that once we save it on disk, we are practically committing ourselves to supporting that format, or at least to being able to upgrade from it.

We could start with the "message log" solution, though, and add some kind of versioning (similar to db_migrations). Then, if we later switch to a proper on-disk table, we can translate the message log by starting from the initial state, replaying the messages, and deleting the message log, trusting that onchaind updates the on-disk table correctly and thus ends up in the appropriate state given the message log.
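The migration step described above amounts to a fold over the log: start from the initial state, apply each message, and the result is what the new state table should contain. A toy sketch (the `enum state` values and message strings are simplified stand-ins; real onchaind has many more states and wire messages):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified onchaind states. */
enum state { WATCHING, RESOLVING, DONE };

/* Apply one logged message to the state, as onchaind would. */
static enum state apply(enum state s, const char *msg)
{
	if (!strcmp(msg, "funding_spent"))
		return RESOLVING;
	if (!strcmp(msg, "all_irrevocably_resolved"))
		return DONE;
	return s;
}

/* Migration from the "message log" format: start from the
 * initial state and replay the log; the result is the row to
 * write into the new state table.  Afterwards the message log
 * can be deleted. */
static enum state migrate(const char *const *log, int n)
{
	enum state s = WATCHING;
	for (int i = 0; i < n; i++)
		s = apply(s, log[i]);
	return s;
}
```

Because the translation is mechanical, the message-log format doesn't lock the project in: a later db_migrations step can run `migrate` once per channel and drop the log table.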

@cdecker
Member Author

cdecker commented Mar 7, 2018

So I guess one final solution would be to adjust the first_blocknum logic to just start before the first-ever funding_tx spend, which would maintain the current behavior with minimal code changes. It's quick and easy, but it may result in us rescanning up to 2016 blocks with some settings. At 6 blocks processed per second on my machine, that's still 5.6 minutes.
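A sketch of that arithmetic and of the adjusted starting point (the function names `first_blocknum`/`rescan_seconds` are illustrative, not the actual lightningd code): the rescan would begin one block before the earliest funding transaction we still track, and the worst case of 2016 blocks at ~6 blocks/s works out to 336 s, i.e. the 5.6 minutes mentioned above.

```c
#include <assert.h>

/* Hypothetical first_blocknum: instead of a fixed rescan depth,
 * start just before the earliest funding_tx height among the
 * channels we still care about. */
static unsigned first_blocknum(const unsigned *funding_heights, int n,
			       unsigned tip)
{
	unsigned min = tip;
	for (int i = 0; i < n; i++)
		if (funding_heights[i] < min)
			min = funding_heights[i];
	return min ? min - 1 : 0;
}

/* Worst case with some settings is one retarget period:
 * 2016 blocks at ~6 blocks processed per second. */
static double rescan_seconds(unsigned blocks, double blocks_per_sec)
{
	return blocks / blocks_per_sec;
}
```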

@robtex

robtex commented Mar 26, 2018

I would love for this issue to be prioritised. It takes one of my nodes over a week to catch up with all blocks from the oldest channel; throwing more CPU and RAM at the node made little difference.
My fastest hardware running lightning has 24 cores and 128 GB of RAM and still takes "only" 3-4 hours.

@robtex

robtex commented Apr 1, 2018

Still can't get my node in sync because of constant crashes (#1308).
Without the crashes it would now take two weeks to catch up.
Please advise.

@Sjors
Contributor

Sjors commented Apr 3, 2018

@robtex are you sure c-lightning is the bottleneck and not bitcoind?

@ZmnSCPxj
Contributor

ZmnSCPxj commented Apr 6, 2018

Promoting to 0.6, it is affecting people paying SLEEPYARK and slowing down Blockstream world domination.

@cdecker
Member Author

cdecker commented Apr 6, 2018

Limited the rescan on SLEEPYARK, but that's just a stopgap solution.

@robtex

robtex commented Apr 16, 2018

@cdecker is that something I can try? Otherwise I think my node will have caught up in less than a week from now; it has stopped crashing. Touch wood.

2018-04-07T12:39:23.543Z lightningd(27562): Adding block 460000: 000000000000000000ef751bbce8e744ad303c47ece06c8d863e4d417efc258c
2018-04-10T04:42:30.457Z lightningd(27562): Adding block 470000: 0000000000000000006c539c722e280a0769abd510af0073430159d71e6d7589
2018-04-12T09:11:13.414Z lightningd(27562): Adding block 480000: 000000000000000001024c5d7a766b173fc9dbb1be1a4dc7e039e631fd96a8b1
2018-04-14T17:04:43.322Z lightningd(27562): Adding block 490000: 000000000000000000de069137b17b8d5a3dfbd5b145b2dcfb203f15d0c4de90

@robtex

robtex commented Apr 16, 2018

@Sjors
Hard to tell exactly what makes it slow, but bitcoind is using far less CPU than c-lightning.
It is a VPS; I tried upgrading it to a 20 vCPU / 96 GB RAM version, but it didn't help much at all.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
27562 robban    20   0  187208 169432   3168 R  92.7  8.3  14001:27 lightningd
27570 robban    20   0   41632  33644   2136 R  60.8  1.6   7447:17 lightning_gossi
 1743 robban    20   0 1877232 625192  35428 S  15.6 30.5   4395:37 bitcoind

@cdecker
Member Author

cdecker commented Apr 16, 2018

@robtex I'm working on the patch now, but it's tricky: we were replaying some of the state from on-chain to drive onchaind and closingd, so we now need facilities to restore that state from the DB instead.

@cdecker
Member Author

cdecker commented Apr 16, 2018

If you update to the latest commit, the rescan will not go below the blockheight of the first mainnet channels (504500), so that'll at least limit the rescan time considerably. I'll ping you as soon as I have the no-rescan PR ready.

@robtex

robtex commented Apr 16, 2018

Great, thanks! I appreciate your work and am looking forward to the PR!
