Skip to content

fix: XMP sidecar crash resilience, EXIF safety, and timezone-aware fallbacks#1328

Open
teh-hippo wants to merge 13 commits intoicloud-photos-downloader:masterfrom
teh-hippo:upstream-fixes
Open

fix: XMP sidecar crash resilience, EXIF safety, and timezone-aware fallbacks#1328
teh-hippo wants to merge 13 commits intoicloud-photos-downloader:masterfrom
teh-hippo:upstream-fixes

Conversation

@teh-hippo
Copy link
Copy Markdown

@teh-hippo teh-hippo commented Apr 1, 2026

Eight commits covering fixes, performance, infrastructure, and a new feature for icloudpd. Each commit is self-contained. See individual commit messages for detail.

Fixes

handle empty/missing EXIF IFDs in exif_datetime

get_photo_exif() and set_photo_exif() crash when piexif returns None for IFD blocks. Now validates IFDs before access with null-safe UTF-8 decoding.

handle empty/corrupt encoded fields in XMP sidecar generation

build_metadata() crashes on several encoded fields when iCloud returns empty or unexpected data (keywordsEnc, captionEnc, extendedDescEnc, locationEnc). All field decoders now validate before decode and catch errors gracefully.

Fixes #1277. Related to #1056, #1063.

use atomic writes for XMP sidecars and session files

XMP sidecars, session JSON, and cookie files use direct open("wb") writes. If the process is interrupted, files can be left truncated or with stale trailing bytes.

Files are now written to .tmp and moved via os.replace(). Corrupt XMP sidecars previously created by icloudpd are also automatically regenerated rather than permanently skipped.

The photo download path already uses .part + os.rename, so is unaffected.

use timezone-aware fallback datetimes

Three fromtimestamp(0) calls produced naive, timezone-dependent datetimes. All now pass tz=utc. Nine tests with hardcoded timezone strings now compute expected values dynamically.

Performance

cache directory listings to reduce network stat calls

On network-mounted storage (CIFS/NFS/USB), each os.path.exists() and os.path.getsize() call makes a stat syscall over the network. For a 19,000-item library with XMP sidecars, this doubled the per-file I/O.

DirCache scans each directory once and serves subsequent existence/size checks from memory. Integrated into the download and XMP sidecar paths.

Features

SQLite asset manifest for identity-based sync

Replaces filesystem-only dedup (filename + size matching) with a SQLite manifest that tracks assets by iCloud asset ID and zone ID. Stores metadata (GPS, keywords, dates, orientation) per asset.

Supports version-change re-downloads when iCloud updates an asset, live photo companion tracking, and per-library isolation via zone ID. Schema versioning with migration infrastructure.

Infrastructure

migrate to uv

Replace setuptools with uv for dependency management. Adds .python-version (3.13) and uv.lock for reproducible installs. Updates dependency pins to latest compatible versions.

upgrade GitHub Actions, runners, and CodeQL

  • Actions: checkout v4->v6, setup-python v5->v6, cache v4->v5, codeql-action v2->v4, and others
  • Runners: ubuntu-22.04->24.04, windows-2022->2025, macos-13->15
  • Replace codeql-analysis.yml with streamlined codeql.yml (remove unnecessary autobuild, add explicit permissions)

Testing

All 273 tests pass (238 existing + 35 new). No new runtime dependencies.

teh-hippo and others added 5 commits April 1, 2026 15:11
- get_photo_exif: check Exif IFD is None (not falsy) before accessing;
  decode bytes to str with strict UTF-8 (corrupt data caught as
  UnicodeDecodeError and returns None)
- set_photo_exif: validate both '1st' and 'Exif' IFDs exist before
  writing; return early with debug log if missing
- Broaden exception handling to catch AttributeError, TypeError,
  UnicodeDecodeError from None/.get() chains

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes icloud-photos-downloader#1277
Related to icloud-photos-downloader#1059

build_metadata() crashes when iCloud asset records contain empty or
corrupt encoded fields (keywordsEnc, captionEnc, extendedDescEnc,
locationEnc). Specific issues fixed:

- keywordsEnc: len(dict) checked key count (always >0) instead of
  the actual 'value' string length
- captionEnc/extendedDescEnc: no check for empty value before base64
  decode
- locationEnc: plistlib.loads() returns [] instead of dict for empty
  plist arrays, causing AttributeError on .get('alt')

All encoded field decoders now validate value before decode, catch
errors gracefully, and log via a new logger parameter on
build_metadata(). Narrows bare Exception catches to AttributeError.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Prevents file corruption when the process is interrupted (Ctrl+C,
SIGKILL, power loss, WSL2 shutdown) during a write. Files are now
written to a .tmp path and atomically moved into place via
os.replace(), which guarantees either the old or new content survives.

Affected write paths:
- XMP sidecars: write to .xmp.tmp then os.replace to .xmp
- Session JSON: write to .session.tmp then os.replace
- Cookie jar: save to .jar.tmp then os.replace

Additionally, corrupt XMP sidecars created by icloudpd are now
automatically regenerated rather than permanently skipped. The
ParseError handler checks the raw file bytes for the 'icloudpd'
marker before deciding to overwrite. Third-party XMP files with
parse errors are still preserved.

Observed corruption: a FullSizeRender.JPG.xmp file had 16 bytes of
stale data appended from a previous longer write. 22 iCloud assets
mapped to the same filename, causing 22 sequential overwrites per
sync. An interrupted write left the file with mixed content from two
different XMP generations.

Testing performed:
- 10,000 sequential wb overwrites on NTFS/WSL2: 0 corruptions
  (confirms NTFS itself is not the issue under normal operation)
- 500 SIGTERM kill trials with direct wb writes: 241 empty files
  (48% data loss from truncation without completed write)
- 500 SIGTERM kill trials with atomic os.replace: 0 empty files,
  0 corruptions (old file survives if replace hasn't run yet)
- Full pytest suite: 273 passed, 3 skipped

The photo download path (download.py) already uses a .part temp file
with os.rename, so it is not affected. The SQLite manifest uses
WAL-mode transactions and is also safe.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix 3 fromtimestamp(0) calls that produced naive, timezone-dependent
datetimes: photos.py, autodelete.py, base.py. All now pass tz=utc.

Fix 9 tests that hardcoded UTC timezone strings -- now compute expected
date paths and timezone offsets dynamically using the same logic as
production code (fromtimestamp + astimezone(get_localzone())).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace per-file os.path.isfile() and os.stat() calls with a DirCache
that pre-scans directories via os.scandir() (single round trip per dir).
Always regenerate XMP sidecars for icloudpd-created files to detect
metadata changes from iCloud. Update cache after downloads and EXIF
injection so same-run dedup sees correct file sizes.

On a network mount with 27k files, reduces no-op sync from ~18min to ~9min.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@teh-hippo teh-hippo force-pushed the upstream-fixes branch 12 times, most recently from c29c707 to 212ac1a Compare April 6, 2026 18:27
Replace fragile size-based dedup with definitive identity matching using
iCloud's recordName (asset_id). The manifest DB at {download_dir}/.icloudpd.db
tracks every downloaded file with:
- asset_id: iCloud master recordName (stable unique ID)
- zone_id: library zone (personal vs shared, prevents collisions)
- local_path: relative path (portable)
- version_size: iCloud-reported size (pre-EXIF-injection)
- version_checksum: iCloud fileChecksum
- change_tag: recordChangeTag (detects metadata updates)

This eliminates the EXIF re-serialisation false dedup problem entirely:
the manifest stores version_size from the API, which is always compared
against the API's version.size (both pre-injection). Local file size
is never compared.

Both primary files and live photo companions get manifest rows.
Existing files are adopted into the manifest on first run.
Legacy size-based dedup is preserved as fallback when manifest is unavailable.

14 unit tests for ManifestDB, updated integration tests for manifest behaviour.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
teh-hippo and others added 2 commits April 9, 2026 15:26
- Add .python-version (3.13) for uv
- Add uv.lock for reproducible installs
- Update dependency pins to latest compatible versions:
  requests 2.32.3→2.32.5, schema 0.7.7→0.7.8, tqdm 4.67.1→4.67.3,
  typing_extensions 4.14.0→4.15.0, Flask 3.1.1→3.1.3,
  pytz 2025.2→2026.1.post1, certifi 2025.4.26→2026.2.25,
  keyring 25.6.0→25.7.0
- Keep urllib3 at 1.26.20 (intentional 1.x pin)
- Keep piexif, srp, keyrings-alt, waitress, tzlocal (already latest)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Actions:
- actions/checkout v4 -> v6
- actions/cache v4 -> v5
- actions/setup-python v5 -> v6
- actions/setup-node v4 -> v6
- actions/upload-pages-artifact v3 -> v4
- actions/upload-artifact v4 -> v5
- actions/download-artifact v4 -> v5
- docker/setup-buildx-action v3 -> v4
- docker/setup-qemu-action v3 -> v4
- peter-evans/dockerhub-description v3 -> v5

CodeQL:
- github/codeql-action v2 -> v4
- Replace codeql-analysis.yml with streamlined codeql.yml
- Remove unnecessary autobuild step and matrix for Python-only
- Add explicit permissions (security-events: write)

Runners (quality-checks):
- ubuntu-22.04 -> ubuntu-24.04
- windows-2022 -> windows-2025
- macos-13 -> macos-15

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@teh-hippo teh-hippo force-pushed the upstream-fixes branch 2 times, most recently from 13cd7d7 to 47d0757 Compare April 9, 2026 13:21
…ftool

Write iCloud metadata (rating, keywords, title, description, timezone
offset, orientation) into photo/video files using exiftool subprocess.
Patches EXIF in-place without re-encoding images (+0.1% size delta).

Features:
- Configurable categories: --write-metadata all|rating|keywords|title|dates|orientation
- Delta-based writing: reads existing metadata, only writes if values differ
- Video-aware: skips EXIF-only OffsetTime tags for MOV/MP4 containers
- Dry-run support: reports what would change without modifying files
- Errors on startup if exiftool is not installed when flag is used

Requires exiftool (apt install libimage-exiftool-perl / brew install exiftool).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@teh-hippo teh-hippo force-pushed the upstream-fixes branch 4 times, most recently from e0f035f to e8d95d1 Compare April 11, 2026 03:43
…, deprecate dates

- Manifest DB schema v2: add gps_speed, gps_timestamp, timezone_offset,
  asset_subtype, hdr_type, burst_flags, burst_flags_ext, burst_id,
  original_orientation, and raw_fields (JSON blob) columns
- Migration support for v0->v2 and v1->v2 upgrades
- Add 'location' category to --write-metadata for GPS lat/lon/alt
  with float tolerance (~1m) for idempotent delta detection
- Deprecate 'dates' category (accepted with warning, no-op)
- Guard orientation=0 (invalid EXIF, treated as None at extract time)
- Populate all new DB columns from iCloud API in _extract_manifest_metadata
- 93 tests covering all changes (22 manifest, 61 metadata, 10 CLI)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
teh-hippo and others added 2 commits April 11, 2026 13:50
Add --write-metadata-xmp and --write-metadata-exif for independent
control of XMP sidecar generation and EXIF tag writing.

--write-metadata remains as shorthand for both. All three flags
accept category lists: all, rating, keywords, title, datetime,
dates, orientation, location. Omitting value defaults to 'all'.

New categories:
- datetime: DateTimeOriginal + CreateDate
- dates: datetime + ModifyDate (superset)

Deprecations (still functional, with warnings):
- --xmp-sidecar -> --write-metadata-xmp
- --set-exif-datetime -> --write-metadata-exif datetime

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
--accept-apple-changes: re-download files when iCloud reports a
different version_size. Without this flag, version changes are
logged as warnings but files are not re-downloaded. Metadata
updates from iCloud are always applied regardless.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace PK (asset_id, zone_id, local_path) with
(asset_id, zone_id, asset_resource) where asset_resource is the Apple
iCloud resource field prefix (e.g. resOriginal, resOriginalVidCompl).

This makes local_path an updateable column, so config changes like
toggling --keep-unicode-in-filenames update the existing manifest row
instead of creating a duplicate.

- Add asset_resource column (schema v3)
- Add version_to_resource() mapping in version_size.py
- Update all manifest.lookup/upsert/remove calls in base.py
- Migration: v0/v1/v2 DBs get asset_resource via ALTER + table rebuild
- 7 new unit tests for asset_resource PK behaviour

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

InvalidFileException thrown while building XMP sidecar.

1 participant