fix: XMP sidecar crash resilience, EXIF safety, and timezone-aware fallbacks#1328
Open
teh-hippo wants to merge 13 commits intoicloud-photos-downloader:masterfrom
Open
fix: XMP sidecar crash resilience, EXIF safety, and timezone-aware fallbacks#1328teh-hippo wants to merge 13 commits intoicloud-photos-downloader:masterfrom
teh-hippo wants to merge 13 commits intoicloud-photos-downloader:masterfrom
Conversation
- get_photo_exif: check Exif IFD is None (not falsy) before accessing; decode bytes to str with strict UTF-8 (corrupt data caught as UnicodeDecodeError and returns None) - set_photo_exif: validate both '1st' and 'Exif' IFDs exist before writing; return early with debug log if missing - Broaden exception handling to catch AttributeError, TypeError, UnicodeDecodeError from None/.get() chains Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes icloud-photos-downloader#1277 Related to icloud-photos-downloader#1059 build_metadata() crashes when iCloud asset records contain empty or corrupt encoded fields (keywordsEnc, captionEnc, extendedDescEnc, locationEnc). Specific issues fixed: - keywordsEnc: len(dict) checked key count (always >0) instead of the actual 'value' string length - captionEnc/extendedDescEnc: no check for empty value before base64 decode - locationEnc: plistlib.loads() returns [] instead of dict for empty plist arrays, causing AttributeError on .get('alt') All encoded field decoders now validate value before decode, catch errors gracefully, and log via a new logger parameter on build_metadata(). Narrows bare Exception catches to AttributeError. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Prevents file corruption when the process is interrupted (Ctrl+C, SIGKILL, power loss, WSL2 shutdown) during a write. Files are now written to a .tmp path and atomically moved into place via os.replace(), which guarantees either the old or new content survives. Affected write paths: - XMP sidecars: write to .xmp.tmp then os.replace to .xmp - Session JSON: write to .session.tmp then os.replace - Cookie jar: save to .jar.tmp then os.replace Additionally, corrupt XMP sidecars created by icloudpd are now automatically regenerated rather than permanently skipped. The ParseError handler checks the raw file bytes for the 'icloudpd' marker before deciding to overwrite. Third-party XMP files with parse errors are still preserved. Observed corruption: a FullSizeRender.JPG.xmp file had 16 bytes of stale data appended from a previous longer write. 22 iCloud assets mapped to the same filename, causing 22 sequential overwrites per sync. An interrupted write left the file with mixed content from two different XMP generations. Testing performed: - 10,000 sequential wb overwrites on NTFS/WSL2: 0 corruptions (confirms NTFS itself is not the issue under normal operation) - 500 SIGTERM kill trials with direct wb writes: 241 empty files (48% data loss from truncation without completed write) - 500 SIGTERM kill trials with atomic os.replace: 0 empty files, 0 corruptions (old file survives if replace hasn't run yet) - Full pytest suite: 273 passed, 3 skipped The photo download path (download.py) already uses a .part temp file with os.rename, so it is not affected. The SQLite manifest uses WAL-mode transactions and is also safe. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix 3 fromtimestamp(0) calls that produced naive, timezone-dependent datetimes: photos.py, autodelete.py, base.py. All now pass tz=utc. Fix 9 tests that hardcoded UTC timezone strings -- now compute expected date paths and timezone offsets dynamically using the same logic as production code (fromtimestamp + astimezone(get_localzone())). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace per-file os.path.isfile() and os.stat() calls with a DirCache that pre-scans directories via os.scandir() (single round trip per dir). Always regenerate XMP sidecars for icloudpd-created files to detect metadata changes from iCloud. Update cache after downloads and EXIF injection so same-run dedup sees correct file sizes. On a network mount with 27k files, reduces no-op sync from ~18min to ~9min. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
c29c707 to
212ac1a
Compare
Replace fragile size-based dedup with definitive identity matching using
iCloud's recordName (asset_id). The manifest DB at {download_dir}/.icloudpd.db
tracks every downloaded file with:
- asset_id: iCloud master recordName (stable unique ID)
- zone_id: library zone (personal vs shared, prevents collisions)
- local_path: relative path (portable)
- version_size: iCloud-reported size (pre-EXIF-injection)
- version_checksum: iCloud fileChecksum
- change_tag: recordChangeTag (detects metadata updates)
This eliminates the EXIF re-serialisation false dedup problem entirely:
the manifest stores version_size from the API, which is always compared
against the API's version.size (both pre-injection). Local file size
is never compared.
Both primary files and live photo companions get manifest rows.
Existing files are adopted into the manifest on first run.
Legacy size-based dedup is preserved as fallback when manifest is unavailable.
14 unit tests for ManifestDB, updated integration tests for manifest behaviour.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
212ac1a to
bc7fea4
Compare
- Add .python-version (3.13) for uv - Add uv.lock for reproducible installs - Update dependency pins to latest compatible versions: requests 2.32.3→2.32.5, schema 0.7.7→0.7.8, tqdm 4.67.1→4.67.3, typing_extensions 4.14.0→4.15.0, Flask 3.1.1→3.1.3, pytz 2025.2→2026.1.post1, certifi 2025.4.26→2026.2.25, keyring 25.6.0→25.7.0 - Keep urllib3 at 1.26.20 (intentional 1.x pin) - Keep piexif, srp, keyrings-alt, waitress, tzlocal (already latest) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Actions: - actions/checkout v4 -> v6 - actions/cache v4 -> v5 - actions/setup-python v5 -> v6 - actions/setup-node v4 -> v6 - actions/upload-pages-artifact v3 -> v4 - actions/upload-artifact v4 -> v5 - actions/download-artifact v4 -> v5 - docker/setup-buildx-action v3 -> v4 - docker/setup-qemu-action v3 -> v4 - peter-evans/dockerhub-description v3 -> v5 CodeQL: - github/codeql-action v2 -> v4 - Replace codeql-analysis.yml with streamlined codeql.yml - Remove unnecessary autobuild step and matrix for Python-only - Add explicit permissions (security-events: write) Runners (quality-checks): - ubuntu-22.04 -> ubuntu-24.04 - windows-2022 -> windows-2025 - macos-13 -> macos-15 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
13cd7d7 to
47d0757
Compare
…ftool Write iCloud metadata (rating, keywords, title, description, timezone offset, orientation) into photo/video files using exiftool subprocess. Patches EXIF in-place without re-encoding images (+0.1% size delta). Features: - Configurable categories: --write-metadata all|rating|keywords|title|dates|orientation - Delta-based writing: reads existing metadata, only writes if values differ - Video-aware: skips EXIF-only OffsetTime tags for MOV/MP4 containers - Dry-run support: reports what would change without modifying files - Errors on startup if exiftool is not installed when flag is used Requires exiftool (apt install libimage-exiftool-perl / brew install exiftool). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e0f035f to
e8d95d1
Compare
…, deprecate dates - Manifest DB schema v2: add gps_speed, gps_timestamp, timezone_offset, asset_subtype, hdr_type, burst_flags, burst_flags_ext, burst_id, original_orientation, and raw_fields (JSON blob) columns - Migration support for v0->v2 and v1->v2 upgrades - Add 'location' category to --write-metadata for GPS lat/lon/alt with float tolerance (~1m) for idempotent delta detection - Deprecate 'dates' category (accepted with warning, no-op) - Guard orientation=0 (invalid EXIF, treated as None at extract time) - Populate all new DB columns from iCloud API in _extract_manifest_metadata - 93 tests covering all changes (22 manifest, 61 metadata, 10 CLI) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add --write-metadata-xmp and --write-metadata-exif for independent control of XMP sidecar generation and EXIF tag writing. --write-metadata remains as shorthand for both. All three flags accept category lists: all, rating, keywords, title, datetime, dates, orientation, location. Omitting value defaults to 'all'. New categories: - datetime: DateTimeOriginal + CreateDate - dates: datetime + ModifyDate (superset) Deprecations (still functional, with warnings): - --xmp-sidecar -> --write-metadata-xmp - --set-exif-datetime -> --write-metadata-exif datetime Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
--accept-apple-changes: re-download files when iCloud reports a different version_size. Without this flag, version changes are logged as warnings but files are not re-downloaded. Metadata updates from iCloud are always applied regardless. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e8d95d1 to
2e1b94c
Compare
Replace PK (asset_id, zone_id, local_path) with (asset_id, zone_id, asset_resource) where asset_resource is the Apple iCloud resource field prefix (e.g. resOriginal, resOriginalVidCompl). This makes local_path an updateable column, so config changes like toggling --keep-unicode-in-filenames update the existing manifest row instead of creating a duplicate. - Add asset_resource column (schema v3) - Add version_to_resource() mapping in version_size.py - Update all manifest.lookup/upsert/remove calls in base.py - Migration: v0/v1/v2 DBs get asset_resource via ALTER + table rebuild - 7 new unit tests for asset_resource PK behaviour Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2e1b94c to
aaa4565
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Eight commits covering fixes, performance, infrastructure, and a new feature for icloudpd. Each commit is self-contained. See individual commit messages for detail.
Fixes
handle empty/missing EXIF IFDs in exif_datetime
get_photo_exif()andset_photo_exif()crash when piexif returnsNonefor IFD blocks. Now validates IFDs before access with null-safe UTF-8 decoding.handle empty/corrupt encoded fields in XMP sidecar generation
build_metadata()crashes on several encoded fields when iCloud returns empty or unexpected data (keywordsEnc, captionEnc, extendedDescEnc, locationEnc). All field decoders now validate before decode and catch errors gracefully.Fixes #1277. Related to #1056, #1063.
use atomic writes for XMP sidecars and session files
XMP sidecars, session JSON, and cookie files use direct
open("wb")writes. If the process is interrupted, files can be left truncated or with stale trailing bytes.Files are now written to
.tmpand moved viaos.replace(). Corrupt XMP sidecars previously created by icloudpd are also automatically regenerated rather than permanently skipped.The photo download path already uses
.part+os.rename, so is unaffected.use timezone-aware fallback datetimes
Three
fromtimestamp(0)calls produced naive, timezone-dependent datetimes. All now passtz=utc. Nine tests with hardcoded timezone strings now compute expected values dynamically.Performance
cache directory listings to reduce network stat calls
On network-mounted storage (CIFS/NFS/USB), each
os.path.exists()andos.path.getsize()call makes a stat syscall over the network. For a 19,000-item library with XMP sidecars, this doubled the per-file I/O.DirCachescans each directory once and serves subsequent existence/size checks from memory. Integrated into the download and XMP sidecar paths.Features
SQLite asset manifest for identity-based sync
Replaces filesystem-only dedup (filename + size matching) with a SQLite manifest that tracks assets by iCloud asset ID and zone ID. Stores metadata (GPS, keywords, dates, orientation) per asset.
Supports version-change re-downloads when iCloud updates an asset, live photo companion tracking, and per-library isolation via zone ID. Schema versioning with migration infrastructure.
Infrastructure
migrate to uv
Replace setuptools with uv for dependency management. Adds
.python-version(3.13) anduv.lockfor reproducible installs. Updates dependency pins to latest compatible versions.upgrade GitHub Actions, runners, and CodeQL
Testing
All 273 tests pass (238 existing + 35 new). No new runtime dependencies.