CAP: 0076
Title: P23 State Archival bug remediation
Working Group:
Owner: Dmytro Kozhevin <@dmkozh>
Authors: Dmytro Kozhevin <@dmkozh>, Garand Tyson <@SirTyson>
Consulted: Nicolas Barry <@MonsieurNicolas>, affected protocols & Tier 1 validators (TODO: fill this in with more details)
Status: Final
Created: 2025-10-15
Discussion: https://groups.google.com/g/stellar-dev/c/LwXFzRFjW1o/m/_atQKrR9AAAJ
Protocol version: 24
Fix the entries that were archived in a corrupted state due to a bug in protocol 23 and whose corruption has never been observed. Also update the fee pool to reflect the unintentional XLM burns.
As specified in the Preamble.
Protocol 23 introduced a mechanism for evicting persistent contract data and code entries into the Hot Archive (see CAP-62 for details). However, due to an implementation bug, an arbitrary historical state was used instead of the most recent state at the moment of archival (i.e. when the entry was removed from the live bucket list and moved to the Hot Archive). As a result, some entries were archived with a state they had at some earlier point in time rather than their most recent state.
Consider the following example of how the bug manifests. Imagine a contract C that has a balance of 10 XLM. C then spends 9 XLM and is left with a 1 XLM balance. C then doesn't perform any more XLM operations for a while, and its balance entry gets archived. Due to the bug, the state in which the entry had a 10 XLM balance could have been archived. In that case, when someone restored C's balance entry, it would come back with a 10 XLM balance, effectively minting 9 XLM. Note that until the entry has been restored, its state is not observable on-chain, so the 9 XLM mint only becomes meaningful at the moment of restoration.
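The example can be reproduced with a small toy model. The sketch below is purely illustrative and does not describe stellar-core's data structures. ToyBalanceEntry and archive are hypothetical names. The buggy behavior is modeled as picking the oldest historical state, which is only one possible case of the "arbitrary historical state" described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBalanceEntry:
    """Hypothetical toy model of a contract balance entry (not stellar-core code)."""
    # Every balance the entry has had, oldest first, in XLM for readability.
    history: list[int] = field(default_factory=list)

    def update(self, new_balance: int) -> None:
        self.history.append(new_balance)


def archive(entry: ToyBalanceEntry, buggy: bool) -> int:
    """Return the state written to the toy Hot Archive.

    Correct behavior archives the most recent state. The buggy behavior is
    modeled here (an assumption) as archiving the oldest historical state.
    """
    return entry.history[0] if buggy else entry.history[-1]


entry = ToyBalanceEntry()
entry.update(10)  # C holds 10 XLM
entry.update(1)   # C spends 9 XLM

correct = archive(entry, buggy=False)
corrupted = archive(entry, buggy=True)
# Restoring the corrupted state instead of the correct one mints the difference.
minted = corrupted - correct
print(correct, corrupted, minted)  # prints: 1 10 9
```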
The actual impact of the bug on Stellar Mainnet is that 478 ledger entries were archived with an incorrect state before validators disabled state eviction at the protocol level. Fortunately, 394 of these entries have never been restored. This holds at least at the time of writing this CAP. Restoration of these entries has also been disabled at the overlay level, so it is highly unlikely that any more restorations will happen before the protocol 24 upgrade.
The 394 entries that have never been restored are hashed into the ledger as part of the Hot Archive. However, as mentioned above, archived entries have no other observable on-chain impact (i.e. no transaction can read or modify their value). Because of that, the corrupted entries can be amended to their correct values without risk of breaking any on-chain logic or invariants.
Amending the corrupted entries would benefit the network, as it avoids the issues that would occur if the entries were restored in their corrupted state (such as unexpected token mints or burns, or protocols relying on invalid state). It is a rare and unusual precedent for validators to change on-chain state they don't own (i.e. anything beyond the network configuration itself). However, this particular amendment has a very limited and verifiable scope. Any observer can ingest the correct (i.e. pre-archival) state of every amended entry from history and verify that it matches the state of the entry after the protocol 24 upgrade. It is also possible to verify that only the affected entries are amended by replaying history and examining the Hot Archive hash after the upgrade. These factors make it impossible for validators to maliciously update entries that haven't been corrupted, or to set the corrupted entries to incorrect values, without that being visible to any external observer.
Note that this CAP does not amend any state whose corruption has actually been observed through restoration. The only change that concerns restored corrupted entries is an amendment of the fee pool. That amendment is done only to correctly reflect the total XLM balance on the network.
This CAP is aligned with the goal of maintaining the reliability of the Stellar network, as it fixes the data corruption bug for a large number of affected ledger entries and brings them to their expected state.
Two changes will be performed: amendment of the corrupted Hot Archive entries that will bring them back to the correct state they should have had during archival, and amendment of the fee pool.
The entry amendment will only happen for the entries that are still in the Hot Archive and have never been restored or updated. The full list of the affected entries is attached to the CAP.
The fee pool is amended by adding 31879035 stroops in order to account for the XLM 'burns' that happened due to restoration of the corrupted XLM balances.
The full list of the corrupted entries (including those that have been restored) is attached to this repository in the corrupted_hot_archive_entries.csv file.
The file contains 478 rows with 5 columns:
- ledger_key - base64-encoded LedgerKey XDR, the ledger key of the affected entry
- correct_entry - base64-encoded LedgerEntry XDR, the correct value that should have been archived (i.e. the value of the affected entry at the moment it was archived)
- archived_entry - base64-encoded LedgerEntry XDR, the actual corrupted value that got archived
- evicted_ledger_seq - the ledger on which the eviction that caused the corruption occurred (for audit/validation purposes)
- restored_ledger_seq - the ledger on which the corrupted state was restored, or not-restored for entries that have never been restored and are thus in scope of this CAP (for audit/validation purposes)
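The sketch below shows how the file could be loaded and how the in-scope rows could be selected, using only the standard library. It assumes the CSV has a header row with exactly the column names listed above. The function names are hypothetical. The XDR columns are only base64-decoded to raw bytes. Decoding them into LedgerKey/LedgerEntry structures would require an XDR library, which is not shown.

```python
import base64
import csv
import io


def load_corrupted_entries(text: str) -> list[dict]:
    """Parse the contents of corrupted_hot_archive_entries.csv.

    Assumes a header row naming the five columns described above.
    XDR columns are returned as raw bytes (base64-decoded, not XDR-decoded).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        restored = row["restored_ledger_seq"]
        rows.append({
            "ledger_key": base64.b64decode(row["ledger_key"]),
            "correct_entry": base64.b64decode(row["correct_entry"]),
            "archived_entry": base64.b64decode(row["archived_entry"]),
            "evicted_ledger_seq": int(row["evicted_ledger_seq"]),
            # None marks entries that were never restored (in scope of this CAP).
            "restored_ledger_seq": None if restored == "not-restored" else int(restored),
        })
    return rows


def in_scope(rows: list[dict]) -> list[dict]:
    """Rows eligible for amendment: entries that have never been restored."""
    return [r for r in rows if r["restored_ledger_seq"] is None]
```

Given the counts stated above, running in_scope on the full 478-row file would be expected to yield 394 rows.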
Entry amendments may happen only for entries that are still in the Hot Archive at the moment of the protocol upgrade and that are still in the exact state recorded at the moment of the corrupted archival.
Specifically, for every ledger_key in corrupted_hot_archive_entries, a check will be performed to determine whether the entry is amendable. The check requires all of the following conditions to be true:
- There is no entry corresponding to the ledger_key in the current live state
- ledger_key is not being archived in the upgrade ledger
- An entry E corresponding to the ledger_key exists in the Hot Archive
- E == archived_entry (where archived_entry is the corresponding value from the corrupted_hot_archive_entries table)
If the entry is amendable, then a new ledger entry with the value correct_entry (from the table) will be written to the Hot Archive. It will thus become the most recent state of the entry, which will be used for restoration.
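The amendability check and the subsequent write can be sketched as follows. This is a simplified model, not Stellar Core's implementation. Plain dicts stand in for the live state and the Hot Archive, which are actually bucket lists. A set stands in for the keys being archived in the upgrade ledger. CorruptedRow, is_amendable and amend are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorruptedRow:
    """One row of corrupted_hot_archive_entries (XDR kept as opaque bytes)."""
    ledger_key: bytes
    correct_entry: bytes
    archived_entry: bytes


def is_amendable(row: CorruptedRow, live_state: dict[bytes, bytes],
                 archived_in_upgrade_ledger: set[bytes],
                 hot_archive: dict[bytes, bytes]) -> bool:
    """All four conditions listed above must hold."""
    return (
        row.ledger_key not in live_state
        and row.ledger_key not in archived_in_upgrade_ledger
        and row.ledger_key in hot_archive
        and hot_archive[row.ledger_key] == row.archived_entry
    )


def amend(rows: list[CorruptedRow], live_state: dict[bytes, bytes],
          archived_in_upgrade_ledger: set[bytes],
          hot_archive: dict[bytes, bytes]) -> list[bytes]:
    """Overwrite each amendable Hot Archive entry with its correct value.

    Returns the keys that were amended. Because every key appears in the
    table once, a write for one row cannot affect the check for another.
    """
    amended = []
    for row in rows:
        if is_amendable(row, live_state, archived_in_upgrade_ledger, hot_archive):
            hot_archive[row.ledger_key] = row.correct_entry
            amended.append(row.ledger_key)
    return amended
```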
Note that since the amendment only occurs in the Hot Archive, the change will not be reflected in LedgerCloseMeta, which cannot contain Hot Archive changes. The change will only be observable through the bucketListHash change in the upgrade ledger header. After the upgrade, the amended entries will be observable at the moment of restoration, and they will have the state they had prior to archival.
31879035 stroops will be added to the feePool in the LedgerHeader. This change reflects the XLM burn that occurred due to the bug. Specifically, two XLM contract balances were restored with a balance lower than the balance they had before archival. The specific XLM burns are as follows (these can also be verified in corrupted_hot_archive_entries):
- Contract CAS3J7GYLGXMF6TDJBBYYSE3HQ6BBSMLNUQ34T6TZMYMW2EVH34XOWMA had an XLM balance of 291100005 stroops before archival, but had 290100005 stroops at the moment of restoration (1000000 stroops burned)
- Contract CDLMAKG5TSJA6FGP7LLC2FKJRQW6DQYMEPP6FURFVULDEQMP3PRZ4ISI had an XLM balance of 1173140246 stroops before archival, but had 1142261211 stroops at the moment of restoration (30879035 stroops burned)
Thus 1000000 + 30879035 = 31879035 stroops have been burned in total, which is accounted for by the feePool amendment.
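The arithmetic above can be rechecked mechanically. The sketch below hard-codes the two balances listed above. total_burned and amended_fee_pool are hypothetical helper names. The pre-upgrade feePool value is not given in this CAP, so it is left as a parameter.

```python
# (balance before archival, balance at restoration) in stroops, from the list above.
BURNS: dict[str, tuple[int, int]] = {
    "CAS3J7GYLGXMF6TDJBBYYSE3HQ6BBSMLNUQ34T6TZMYMW2EVH34XOWMA": (291100005, 290100005),
    "CDLMAKG5TSJA6FGP7LLC2FKJRQW6DQYMEPP6FURFVULDEQMP3PRZ4ISI": (1173140246, 1142261211),
}


def total_burned(burns: dict[str, tuple[int, int]]) -> int:
    """Sum of (balance before archival - balance at restoration) per contract."""
    return sum(before - restored for before, restored in burns.values())


def amended_fee_pool(fee_pool: int, burns: dict[str, tuple[int, int]]) -> int:
    """feePool after the upgrade, given its pre-upgrade value in stroops."""
    return fee_pool + total_burned(burns)


print(total_burned(BURNS))  # prints 31879035
```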
This CAP does not introduce any backward incompatibilities.
The upgrade ledger will need to do a few hundred additional disk lookups, but the performance impact of that is negligible.
As mentioned in the 'Motivation' section, this CAP modifies ledger state that is not directly owned by the validators. This comes with the inherent risk of malicious modifications, or of non-malicious but erroneous modifications that lead to further state corruption, which attackers could then abuse.
To mitigate these concerns, and to ensure that the amendments only bring the entries back to their valid state, the Stellar Core build that performs the upgrade will contain tools that allow anyone to verify that:
- During the replay of protocol 23, only the entries from the corrupted_hot_archive_entries table are incorrectly archived, and their correct and archived states match those in the table
- Every entry from the table has indeed been incorrectly archived
- During the protocol upgrade, only the entries from the table are updated, and the update brings them back to their correct state
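The three checks can be expressed as set and value comparisons. The sketch below is written in Python for readability and is not the Stellar Core implementation. It assumes two inputs. The first is the replay-observed incorrect archivals, a mapping from ledger key to (correct, archived). The second is Hot Archive snapshots before and after the upgrade, modeled as dicts. AuditError, check_replay and check_upgrade are hypothetical names.

```python
class AuditError(Exception):
    """Raised when one of the (hypothetical) audit checks fails."""


def check_replay(observed: dict[bytes, tuple[bytes, bytes]],
                 table: dict[bytes, tuple[bytes, bytes]]) -> None:
    """Checks 1 and 2: the replay-observed incorrect archivals equal the table.

    Both map ledger key -> (correct_entry, archived_entry).
    """
    if observed.keys() != table.keys():
        extra = sorted(observed.keys() - table.keys())
        missing = sorted(table.keys() - observed.keys())
        raise AuditError(f"not in table: {extra}; never observed: {missing}")
    for key, states in observed.items():
        if states != table[key]:
            raise AuditError(f"correct/archived state mismatch for {key!r}")


def check_upgrade(before: dict[bytes, bytes], after: dict[bytes, bytes],
                  table: dict[bytes, tuple[bytes, bytes]]) -> None:
    """Check 3: only table entries changed, each to its correct_entry."""
    changed = {k for k in before.keys() | after.keys() if before.get(k) != after.get(k)}
    unexpected = changed - table.keys()
    if unexpected:
        raise AuditError(f"entries changed outside the table: {sorted(unexpected)}")
    for key in changed:
        if after.get(key) != table[key][0]:
            raise AuditError(f"{key!r} was not restored to its correct state")
```

Raising explicit exceptions rather than using Python's assert statement keeps the checks active even when the interpreter runs with optimizations enabled.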
The audit process described in the Security Concerns section is implemented in Stellar Core. Stellar Core will have an optional config setting, PATH_TO_PROTOCOL_23_CORRUPTION_FILE, which holds the file path to the corrupted_hot_archive_entries file. When it is set, Stellar Core adds a series of asserts that verify the correctness of the file while replaying protocol 23 ledgers. To independently verify the correctness of the affected archived keys, set this config option and run catchup starting from the first protocol 23 ledger, 58762517.
TBD