Skip to content

Switch Google Cloud Storage file source from fs-gcsfs to gcsfs (fsspec)#21590

Merged
mvdbeek merged 5 commits intogalaxyproject:devfrom
mvdbeek:claude/fix-issue-21565-CI82M
Jan 19, 2026
Merged

Switch Google Cloud Storage file source from fs-gcsfs to gcsfs (fsspec)#21590
mvdbeek merged 5 commits intogalaxyproject:devfrom
mvdbeek:claude/fix-issue-21565-CI82M

Conversation

@mvdbeek
Copy link
Copy Markdown
Member

@mvdbeek mvdbeek commented Jan 15, 2026

Replace the pyfilesystem2-based fs-gcsfs with the fsspec-based gcsfs library. The gcsfs library has native support for anonymous access via token='anon', which properly handles credential refresh without errors.

This fixes the "Anonymous credentials cannot be refreshed" error that was occurring with fs-gcsfs when listing files from public GCS buckets.

Fixes #21565

I don't have a GCP account handy, @dannon or @ksuderman would you be able to test this ?

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

Comment thread packages/files/setup.cfg Outdated
Comment thread packages/files/setup.cfg
and add gcsfs to the test dependency group in pyproject.toml, so that
the unit tests run also with the regular unit tests.

Replace the pyfilesystem2-based fs-gcsfs with the fsspec-based gcsfs library.
The gcsfs library has native support for anonymous access via token='anon',
which properly handles credential refresh without errors.

This fixes the "Anonymous credentials cannot be refreshed" error that was
occurring with fs-gcsfs when listing files from public GCS buckets.

Fixes galaxyproject#21565
@mvdbeek mvdbeek force-pushed the claude/fix-issue-21565-CI82M branch from d5d0959 to d4de755 Compare January 15, 2026 12:22
@mvdbeek
Copy link
Copy Markdown
Member Author

mvdbeek commented Jan 15, 2026

Applied both suggestions, thank you @nsoranzo!

@mvdbeek mvdbeek marked this pull request as ready for review January 15, 2026 13:25
@github-actions github-actions Bot added this to the 26.0 milestone Jan 15, 2026
@mvdbeek mvdbeek requested review from dannon and ksuderman January 15, 2026 13:25
@dannon
Copy link
Copy Markdown
Member

dannon commented Jan 15, 2026

I can try to test later -- looks like this dropped all the virtual directory workarounds, did you see in the docs for fsspec gcsfs that that should work better?

@mvdbeek
Copy link
Copy Markdown
Member Author

mvdbeek commented Jan 15, 2026

No, i haven't checked this. Claude said this:

Problem: The fs-gcsfs library was using AnonymousCredentials from google.auth, which raises InvalidOperation when refresh() is called. Recent changes in the Google Cloud libraries caused credential refresh to be attempted even for anonymous access.

Solution: Switched to gcsfs (fsspec-based) which has native anonymous access support via token='anon'. This properly handles the anonymous credential case without attempting refresh.

Files Changed
lib/galaxy/files/sources/googlecloudstorage.py - Complete rewrite:

Now extends FsspecFilesSource instead of PyFilesystem2FilesSource
Uses GCSFileSystem with token='anon' for anonymous access
Much simpler implementation (146 lines vs 225 lines)
Removed all the workarounds for fs_gcsfs directory listing issues
lib/galaxy/dependencies/conditional-requirements.txt - Changed dependency from fs-gcsfs + google-cloud-storage to just gcsfs

lib/galaxy/dependencies/init.py - Renamed check_fs_gcsfs to check_gcsfs, removed check_google_cloud_storage

packages/files/setup.cfg - Updated test dependency from fs-gcsfs to gcsfs

test/unit/files/test_gcsfs.py - Updated import from fs_gcsfs.GCSFS to gcsfs.GCSFileSystem

test/unit/files/gcsfs_file_sources_conf.yml - Simplified config (removed unused oauth parameters)

The gcsfs library is a well-maintained fsspec implementation that's consistent with the existing s3fs file source pattern already in use in Galaxy.

@nsoranzo
Copy link
Copy Markdown
Member

The gcsfs library is a well-maintained fsspec implementation that's consistent with the existing s3fs file source pattern already in use in Galaxy.

and fs-gcsfs is unmaintained, last release in 2022.

@dannon
Copy link
Copy Markdown
Member

dannon commented Jan 15, 2026

Yeah, sounds fine to swap to something more maintained, we just need to make sure this works with buckets that use virtual directories. I'll give it a shot later today with the motrpac buckets.

@ksuderman
Copy link
Copy Markdown
Contributor

I can test this later tonight.

Some fsspec implementations (notably gcsfs) include the directory
being listed as one of the entries in ls() results. This caused
duplicate entries to appear when navigating into folders.
Reverts field renames that would break existing configurations:
- service_account_credentials → service_account_json
- access_token → token
Allows scoping the file source to a subdirectory within the bucket,
matching the behavior of the previous fs-gcsfs implementation.
@dannon
Copy link
Copy Markdown
Member

dannon commented Jan 16, 2026

Good news is that this doesn't have the same virtual directory listing issue, so that's good to go.

The config name changes were breaking changes -- we need to either document that, allow for an 'old_value' mapping, or just keep using the old config names. I pushed a commit that does the latter because I'm guessing claude did the renames without thinking about existing configs.

The other thing is that this dropped root path support, which I added back.

Feel free to drop any of this if the changes were intentional, we just need to make sure to provide migration notes if so.

@mvdbeek
Copy link
Copy Markdown
Member Author

mvdbeek commented Jan 16, 2026

Awesome, thanks for checking. Looks good to me, feel free to merge.

Copy link
Copy Markdown
Member

@dannon dannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksuderman Want to sanity check my changes and see that this works in your implementation, too? Then it's good to go. I'll go ahead and approve from my end.

@ksuderman
Copy link
Copy Markdown
Contributor

Ok, I can't actually test this due to #21262. In fact it seems to be much worse since Jan 12th. The web handler pod has been running for 90 minutes and it is still loading tools... It looks like #21554 was the culprit, or at least that was the first smoke-test to fail, but I have no idea why that would break anything.

@mvdbeek mvdbeek merged commit 9bc373c into galaxyproject:dev Jan 19, 2026
53 of 60 checks passed
@github-actions
Copy link
Copy Markdown

This PR was merged without a "kind/" label, please correct.

@mvdbeek
Copy link
Copy Markdown
Member Author

mvdbeek commented Jan 19, 2026

It looks like #21554 was the culprit,

I don't think that's the issue. xref #21262 (comment) ?

@mvdbeek mvdbeek deleted the claude/fix-issue-21565-CI82M branch January 19, 2026 09:40
@ksuderman
Copy link
Copy Markdown
Contributor

I don't think that's the issue. xref #21262 (comment) ?

#21554 was just the PR that caused the smoke test to start failing. I haven't had an opportunity to investigate the root cause yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unit test tests/files/test_gcsfs.py::test_file_source started to fail

4 participants