
Add Support for HTTP Headers in URL Fetch Requests with Secure Storage for Landing Requests #20924

Merged
mvdbeek merged 35 commits into galaxyproject:dev from davelopez:explore_url_fetch_with_headers
Jan 27, 2026

@davelopez
Contributor

@davelopez davelopez commented Sep 19, 2025

Currently, we can only fetch data from public URLs without any authentication or custom headers.

This PR introduces support for HTTP headers in URL fetch requests for landing requests. Headers are controlled through pattern-based configuration, and sensitive headers are automatically encrypted using Galaxy's vault system before being stored in the database.

🚀 Features

1. Pattern-Based URL Header Configuration

  • URL Matching: Headers controlled per URL pattern using glob syntax (*, ?, **)
  • Multiple Patterns: One URL can match multiple patterns - union of all allowed headers is permitted
  • Configuration File: config/url_headers_conf.yml
  • Fail-Fast Security: Headers rejected if used without proper configuration

2. Automatic Sensitive Header Encryption

  • Per-Pattern Configuration: Each pattern explicitly declares which headers are sensitive
  • Vault Integration: Sensitive headers encrypted using Galaxy's vault system
  • Transparent Operation: Non-sensitive headers remain in plain text for performance
  • Secure-by-Default: Header is sensitive if ANY matching pattern marks it sensitive

3. Secure Storage Architecture

  • Database Protection: Sensitive header values are never stored in plain text
  • Vault References: Encrypted headers replaced with vault placeholders (e.g., __VAULT_HEADER_AUTHORIZATION__)
  • Automatic Decryption: Headers are automatically decrypted when landing requests are retrieved
  • Key Management: Hierarchical vault keys: headers/{landing_uuid}/{header_name}

🔧 Configuration

Example Configuration File

patterns:
    - url_pattern: "https://github.com/**"
      headers:
          - name: "Authorization"
            sensitive: true
          - name: "Accept"
            sensitive: false
    - url_pattern: "https://api.example.com/v1/**"
      headers:
          - name: "X-API-Key"
            sensitive: true

How Pattern Matching Works

  • All-Matches Logic: If a URL matches multiple patterns, the union of all allowed headers is permitted
  • Glob Syntax: Standard glob patterns (* = any chars, ? = single char, ** = recursive)
  • Order Independent: Pattern order doesn't matter - all matches contribute to allowed headers
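
The all-matches rule above can be sketched in a few lines of Python. This is a minimal illustration, not Galaxy's matcher: it assumes `fnmatch.fnmatchcase` as a stand-in for the glob engine (it lets both `*` and `**` cross `/`), it compares header names case-insensitively, and the `PATTERNS` list and `allowed_headers` helper are hypothetical names.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of config/url_headers_conf.yml.
PATTERNS = [
    {
        "url_pattern": "https://api.example.com/**",
        "headers": [
            {"name": "Accept", "sensitive": False},
            {"name": "X-API-Key", "sensitive": False},
        ],
    },
    {
        "url_pattern": "https://api.example.com/v1/**",
        "headers": [{"name": "X-API-Key", "sensitive": True}],
    },
]


def allowed_headers(url, patterns=PATTERNS):
    """Return {lowercased header name: is_sensitive}, merged over ALL matching patterns."""
    merged = {}
    for pattern in patterns:
        if not fnmatchcase(url, pattern["url_pattern"]):
            continue
        for header in pattern["headers"]:
            name = header["name"].lower()
            # Union of allowed headers; sensitive if ANY matching pattern says so.
            merged[name] = merged.get(name, False) or header["sensitive"]
    return merged
```

With this configuration, a URL under `/v1/` matches both patterns, so `X-API-Key` ends up sensitive even though the broader pattern marks it non-sensitive.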

🔧 How It Works

API Usage Examples

Creating a Data Landing Request with Headers

POST /api/data_landings
Content-Type: application/json

{
  "request_state": {
    "targets": [{
      "destination": {"type": "hdas"},
      "items": [{
        "src": "url",
        "url": "https://api.example.com/data.json",
        "ext": "json",
        "headers": {
          "Authorization": "Bearer secret-token-123",
          "X-API-Key": "api-key-456",
          "User-Agent": "Galaxy",
          "Content-Type": "application/json"
        }
      }]
    }]
  },
  "public": true
}
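
For client authors, the same request can be assembled from Python. The sketch below only builds the request with the standard library and does not send it. The server URL `https://galaxy.example.org` and the `build_data_landing_request` helper are assumptions for illustration. The chosen header and URL are ones the sample configuration above would allow.

```python
import json
from urllib.request import Request


def build_data_landing_request(galaxy_url, file_url, ext, headers):
    """Build (but do not send) a POST /api/data_landings request."""
    payload = {
        "request_state": {
            "targets": [{
                "destination": {"type": "hdas"},
                "items": [{"src": "url", "url": file_url, "ext": ext, "headers": headers}],
            }]
        },
        "public": True,
    }
    return Request(
        url=galaxy_url.rstrip("/") + "/api/data_landings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_data_landing_request(
    "https://galaxy.example.org",  # hypothetical Galaxy server
    "https://api.example.com/v1/data.json",
    "json",
    {"X-API-Key": "api-key-456"},
)
# urllib.request.urlopen(request) would submit it; left out here on purpose.
```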

Creating a Workflow Landing Request with Headers

POST /api/workflow_landings
Content-Type: application/json

{
  "workflow_id": "workflow_123",
  "workflow_target_type": "stored_workflow",
  "request_state": {
    "input_dataset": {
      "src": "url",
      "url": "https://secure-data.example.com/dataset.csv",
      "ext": "csv",
      "headers": {
        "Authorization": "Bearer workflow-token-789",
        "X-Custom-Header": "custom-value"
      }
    }
  },
  "public": true
}

Under the Hood: Encryption Process

  1. Configuration Check: System validates headers against configured URL patterns
  2. Pattern Matching: All matching patterns identified for the URL
  3. Header Validation: Only headers allowed by at least one matching pattern are accepted
  4. Sensitivity Detection: A header is sensitive if ANY matching pattern marks it sensitive
  5. Vault Storage: Sensitive headers encrypted and stored in vault:
    landing_request/headers/{landing_uuid}/authorization
    landing_request/headers/{landing_uuid}/x_api_key
    
  6. Reference Replacement: Sensitive values replaced with vault references:
    {
        "headers": {
            "Authorization": "__VAULT_HEADER_AUTHORIZATION__",
            "X-API-Key": "__VAULT_HEADER_X_API_KEY__"
        }
    }
  7. Transparent Decryption: When a landing request is retrieved, vault references are automatically replaced with actual values
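
Steps 5 to 7 can be sketched as a round trip. This is an illustration under stated assumptions, not the code in this PR: `InMemoryVault` is a plain dictionary standing in for Galaxy's vault (it does no encryption), and `protect_headers` and `restore_headers` are hypothetical helper names. The key layout and placeholder format follow the examples above.

```python
class InMemoryVault:
    """Stand-in for a real vault; stores secrets in a dict WITHOUT encryption."""

    def __init__(self):
        self._secrets = {}

    def write_secret(self, key, value):
        self._secrets[key] = value

    def read_secret(self, key):
        return self._secrets[key]


def _slug(header_name):
    # "X-API-Key" -> "x_api_key"
    return header_name.lower().replace("-", "_")


def vault_key(landing_uuid, header_name):
    return f"landing_request/headers/{landing_uuid}/{_slug(header_name)}"


def placeholder(header_name):
    return f"__VAULT_HEADER_{_slug(header_name).upper()}__"


def protect_headers(headers, sensitive_names, landing_uuid, vault):
    """Move sensitive values into the vault and return the headers safe to persist."""
    stored = {}
    for name, value in headers.items():
        if name.lower() in sensitive_names:
            vault.write_secret(vault_key(landing_uuid, name), value)
            stored[name] = placeholder(name)
        else:
            stored[name] = value
    return stored


def restore_headers(stored, landing_uuid, vault):
    """Replace vault placeholders with the original values."""
    restored = {}
    for name, value in stored.items():
        if value == placeholder(name):
            restored[name] = vault.read_secret(vault_key(landing_uuid, name))
        else:
            restored[name] = value
    return restored
```

Only the output of `protect_headers` would be written to the database in this sketch; the secret values live solely in the vault.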

🔒 Security Features

Pattern-Based Access Control

  • Explicit Allowlist: Only headers explicitly configured for matching URL patterns are allowed
  • Fail-Fast: Requests with unauthorized headers are rejected immediately
  • Secure-by-Default: If any matching pattern marks a header as sensitive, it's treated as sensitive

Vault Configuration Required

This feature requires a configured Galaxy vault. See the vault documentation for setup instructions.

Fallback Behavior

  • No Configuration: Missing config file returns null configuration (no headers allowed)
  • No Vault: Feature requires vault - fails fast if vault not configured and sensitive headers are used
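
The fail-fast and fallback rules can be expressed as a single validation step. The sketch below is a minimal reading of the bullets above, not Galaxy's implementation: the exception classes and `validate_headers` are hypothetical, and `allowed` is assumed to map lowercased header names to their sensitivity flag (or to be `None` when no configuration file exists).

```python
class HeaderNotAllowedError(ValueError):
    """Raised when a request carries a header no matching pattern allows."""


class VaultRequiredError(RuntimeError):
    """Raised when sensitive headers are used but no vault is configured."""


def validate_headers(headers, allowed, vault_configured):
    """Raise instead of silently dropping or storing anything unsafe."""
    if not headers:
        return
    allowed = allowed or {}  # missing configuration: no headers allowed
    rejected = sorted(name for name in headers if name.lower() not in allowed)
    if rejected:
        raise HeaderNotAllowedError(f"Headers not allowed for this URL: {', '.join(rejected)}")
    needs_vault = any(allowed[name.lower()] for name in headers)
    if needs_vault and not vault_configured:
        raise VaultRequiredError("Sensitive headers require a configured vault")
```

Raising in both cases means a request is never persisted with a sensitive value in plain text.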

✅ Testing

  • Included unit and integration tests covering:
    • Pattern matching logic
    • Header validation and sensitivity detection
    • Vault encryption/decryption process
    • API request handling with headers

🎯 Use Cases

This enhancement enables several important use cases:

  1. Private API Access: Fetch data from APIs requiring authentication tokens
  2. Rate Limiting: Include API keys for higher rate limits
  3. Custom Protocols: Support for proprietary authentication schemes
  4. Workflow Integration: Secure data fetching within workflow executions

How to test the changes?

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

Comment thread lib/galaxy/managers/landing.py Outdated
        )
    except Exception:
        log.warning("Failed to encrypt headers in landing request state", exc_info=True)
        pass  # Continue without encryption if vault fails
Member

I would rather this fail outright than risk storing things that should be encrypted in an unencrypted fashion - especially given the rest of the app will assume the encryption has already occurred. Does this make testing harder or something?

Contributor Author

The new version should cover this. Thank you!!

@jmchilton
Member

This approach has made the admin configuration trivial and deployment much easier as a result. The existing upload process as is is already... sort of exploitable... I mean we don't do a great job at rate limiting Galaxy (maybe this has improved?) and we let most URIs be accessed for users on behalf of the Galaxy server - it is scary from a security perspective. Allowing users to set arbitrary headers including (especially?) user-agent makes it even a richer target for hacking I would suspect. If we shipped an allow-list of headers and URI patterns that allow that header and whether the header should be secured then I would be much more comfortable from a security perspective. It would be much harder to configure then but we would be sure exactly what the exploit surface is.

Additionally, I trust the list of headers is relatively complete and well thought through but again I would be more comfortable if we had an explicit allow list because again we would understand the exploit surface exactly.

I'm not a -1 on any of this though - I'm just expressing my concerns and telling you how I would have had it work - which people may think would be too much config. Still though one can imagine blends of the approaches - maybe it is off by default but there is a configuration that allows any requests like this or restricted set of requests and admins can decide on their level of comfort.

Even if people believe I'm being too cautious or appropriately cautious but the admin/deployment burden of addressing it would be too steep - I would still strongly encourage we don't allow the user agent to be overridden - if an API wants Galaxy to access it they shouldn't require a non-Galaxy user agent.

What we pick to allow through makes me anxious - but after that - the actual implementation of allowing those headers and securing them seems really well thought through. It seems to fit with our existing APIs beautifully - that part of the implementation seems perfect to me.

@davelopez
Contributor Author

Thank you @jmchilton! As always, great constructive feedback! I will add some configuration to make this functionality more explicitly controlled 👍

@davelopez davelopez marked this pull request as draft September 22, 2025 07:55
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch 2 times, most recently from c7f7816 to 174dc62 Compare October 15, 2025 08:22
@davelopez davelopez marked this pull request as ready for review October 15, 2025 18:21
@davelopez
Contributor Author

I've added the config option to specify URL patterns and sets of allowed headers for each pattern. This should be more explicit while maintaining flexibility for admins to allow general safe headers. I've also updated the PR description with the updates.
Thanks again for the feedback! Let me know if there is something else worth improving 🙏

Comment thread lib/galaxy/config/sample/url_headers_conf.yml.sample
@davelopez davelopez marked this pull request as draft October 24, 2025 14:22
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch 2 times, most recently from c0c11c7 to d7a5df4 Compare October 28, 2025 12:54
@davelopez davelopez marked this pull request as ready for review October 28, 2025 13:06
@davelopez davelopez modified the milestones: 25.1, 26.0 Oct 29, 2025
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch from d7a5df4 to 00cf447 Compare October 30, 2025 15:01
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch from 00cf447 to c970f53 Compare November 10, 2025 16:37
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch from c970f53 to 8995121 Compare December 18, 2025 17:27
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch from 8995121 to 486aeaa Compare January 8, 2026 09:22
@davelopez
Contributor Author

Test failures are unrelated

@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch 2 times, most recently from e2e6860 to 343a5fa Compare January 14, 2026 10:40
@davelopez davelopez requested a review from a team January 14, 2026 10:42
Introduces the ability to specify optional HTTP headers for URL-based data fetching. These headers are passed to the fetch logic to enhance flexibility in handling authenticated or customized requests.
Introduces functions to identify, encrypt, and decrypt sensitive HTTP headers securely using Galaxy's Vault system.
Introduces a new module to configure and manage allowed HTTP request headers for external URL fetches.
Ensures that when multiple URL patterns match a given URL, header permissions (allowance and sensitivity) are correctly consolidated.
Introduces a new sample configuration to define an allow-list for HTTP headers in external URL fetch requests. This mechanism allows administrators to specify which headers are permitted for different URL patterns, improving security and control over fetch requests.

The configuration also supports marking headers as sensitive, prompting encryption of their values. The sample provides illustrative examples for common services like GitHub, AWS S3, and generic cloud storage.
Adds common authentication-related headers (Authorization, X-Auth-Token, X-API-Key) to the default sensitive list for HTTPS URLs in the sample configuration. This provides a more secure default example for users, preventing accidental exposure of sensitive credentials.

Includes a new comment advising users to only employ the minimum necessary configuration for their specific needs, reinforcing security best practices.
@davelopez davelopez force-pushed the explore_url_fetch_with_headers branch from 343a5fa to 962388c Compare January 21, 2026 09:30
@davelopez
Contributor Author

I've rebased this again, but it should be ready to go

Member

@martenson martenson left a comment

This is a great feature in a nice implementation. Thanks @davelopez

Pretty please add documentation to docs.galaxyproject.org for it. The small "security notes" at the end of the sample configuration file and in the PR description are a good start on what needs to be there so someone other than Nate and you can effectively deploy and run this.

@davelopez
Contributor Author

Thanks for the reminder @martenson! I've added some admin docs in a12a082

Member

@mvdbeek mvdbeek left a comment

Looks great, just a minor doubt I have about storing the request state

class UrlDataElement(BaseDataElement):
    src: Literal["url"]
    url: str = Field(..., description="URL to upload")
    headers: Optional[dict[str, str]] = Field(None, description="Optional headers to include in the URL fetch request")
Member

Have you exercised all the landing request flavors (data + workflow) ? Some of these I think are persisted unmodified, so I'd look at the relevant tables

        sensitive_values = ["Bearer data-test-token-should-be-encrypted", "data-test-api-key-123456"]
        self._verify_headers_encrypted_in_db(str(response.uuid), sensitive_values, ToolLandingRequestModel)

    def test_workflow_landing_with_encrypted_headers(self):
Contributor Author

Looks great, just a minor doubt I have about storing the request state

@mvdbeek do these integration tests cover your concern? Or should I try to add another specific kind of test?

Member

ah, i missed the helper, that's sufficient, thank you!
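
The helper's body is not shown in this thread. A minimal sketch of the kind of check it implies, assuming the persisted request state is available as nested dicts and lists, might look like the following. `iter_strings` and `find_plaintext_leaks` are hypothetical names, not the helper from the test suite.

```python
def iter_strings(node):
    """Yield every string found in a nested dict/list structure, keys included."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(key)
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)


def find_plaintext_leaks(persisted_state, sensitive_values):
    """Return the sensitive values that still appear verbatim in the persisted state."""
    strings = list(iter_strings(persisted_state))
    return [secret for secret in sensitive_values if any(secret in text for text in strings)]
```

A test can then assert that the returned list is empty for every landing request flavor it persists.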

- Headers are only sent to URLs that match defined patterns
- Sensitive headers can be stored securely using Galaxy’s Vault

## Configuration Overview
Member

This seems fine as a global config but I'd love a follow-up where this can be set on a per-file source basis, so you can allow amazon headers for s3 etc

Member

(and we could then allow sensible defaults)

Contributor Author

You mean directly as part of a File Source config parameter or set of parameters?

Member

I'm not sure of the difference ? I was thinking that our file sources should automatically allow accepting and relaying relevant headers for that type of file source, so you don't need to allowlist for instance X-Amz-Security-Token, but we'd block this by default for http/https

Contributor Author

@davelopez davelopez Jan 27, 2026

I meant something like this?

class HeaderEntry(StrictModel):
    name: str
    sensitive: bool = False


class HeaderConfig(StrictModel):
    headers: list[HeaderEntry]


class S3FSFileSourceConfiguration(FsspecBaseFileSourceConfiguration):
    anon: bool = False
    endpoint_url: Optional[str] = None
    bucket: Optional[str] = None
    secret: Optional[str] = None
    key: Optional[str] = None
    allow_headers: Optional[HeaderConfig] = None

BTW, thanks for the merge!

Member

@mvdbeek mvdbeek Jan 27, 2026

I would make it

class S3FSFileSourceConfiguration(FsspecBaseFileSourceConfiguration):
    anon: bool = False
    endpoint_url: Optional[str] = None
    bucket: Optional[str] = None
    secret: Optional[str] = None
    key: Optional[str] = None
    allow_headers: Optional[HeaderConfig] = [AmzSecretHeaderEntry, OtherDefaultHeaderEntryYouMightWantToAdd]
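
The idea of shipping per-file-source defaults can be made concrete with a runnable sketch. It uses standard-library dataclasses rather than Galaxy's pydantic models, and every name in it (`HeaderEntry` as a dataclass, `default_s3_headers`, `S3FileSourceConfig`) is hypothetical. `X-Amz-Security-Token` is the header mentioned earlier in this thread; treating it as sensitive is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class HeaderEntry:
    name: str
    sensitive: bool = False


def default_s3_headers():
    """Headers an S3 file source could accept without admin allowlisting (assumed set)."""
    return [HeaderEntry("X-Amz-Security-Token", sensitive=True)]


@dataclass
class S3FileSourceConfig:
    anon: bool = False
    endpoint_url: Optional[str] = None
    bucket: Optional[str] = None
    # default_factory gives each instance its own list instead of a shared mutable default.
    allow_headers: list = field(default_factory=default_s3_headers)
```

An admin could still override `allow_headers` per file source, while plain http/https fetches would keep relying on the global pattern configuration.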

@mvdbeek mvdbeek merged commit d6e7db8 into galaxyproject:dev Jan 27, 2026
61 of 62 checks passed
@davelopez davelopez deleted the explore_url_fetch_with_headers branch January 27, 2026 13:30
5 participants