Skip to content

fix: LSDV-5348: More robust and uniform SSRF defenses #4483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 7, 2023

Conversation

jombooth
Copy link
Contributor

@jombooth jombooth commented Jul 7, 2023

PR fulfills these requirements

  • Commit message(s) and PR title follows the format [fix|feat|ci|chore|doc]: TICKET-ID: Short description of change made ex. fix: DEV-XXXX: Removed inconsistent code usage causing intermittent errors
  • Tests for the changes have been added/updated (for bug fixes/features)
  • Docs have been added/updated (for bug fixes/features)
  • Best efforts were made to ensure docs/code are concise and coherent (checked for spelling/grammatical errors, commented out code, debug logs etc.)
  • Self-reviewed and ran all changes on a local instance (for bug fixes/features)

Change has impacts in these area(s)

(check all that apply)

  • Product design
  • Backend (Database)
  • Backend (API)
  • Frontend

Describe the reason for change

While reviewing a CodeQL finding (https://github.com/heartexlabs/label-studio/security/code-scanning/586) which turned out to be a false positive, a few shortcomings in the SSRF defense code were identified:

  • The previous url_is_local implementation would throw an exception in the UI when a URL value like asdf was passed. Note that it was only called when SSRF_PROTECTION_ENABLED is True.

image

After this change, the result is

image

  • A separate check, outside of the url_is_local function, tested to see whether the URL began with "file://" in the uploader. This check would have been possible to bypass by using "FILE://" as the scheme, but luckily would have led to the above (unintentional) exception from url_is_local if SSRF protection was enabled via settings.SSRF_PROTECTION_ENABLED. Additionally, requests.get, which we have been using to fetch data since fflag_fix_back_lsdv_4568_import_csv_links_03032023_short was enabled, throws an exception for a file:// scheme by default.
  • The previous URL validation approach was inconsistent between the uploader code and the ML Backend Serializer, which had no separate check for any non-http[s] schemes. Now, we always allowlist only "http" and "https" schemes when accepting URLs from clients.
  • urllib.parse.urlparse was previously used in the implementation of url_is_local; requests by contrast uses urrlib3 to parse URLs. Per https://claroty.com/wp-content/uploads/2022/01/Exploiting-URL-Parsing-Confusion.pdf, to minimize risk of exploits stemming from parser inconsistencies, it's best to minimize the number of URL parsers used in any application.
  • Testing for SSRF defenses was previously anemic; this PR adds extensive unit tests for the validator function.

What does this fix?

See screenshots above; this definitely fixes a low-level exception reaching the UI if SSRF_PROTECTION_ENABLED, and may also help protect LS against other exploits.

What is the new behavior?

(if this is a breaking or feature change)

What is the current behavior?

(if this is a breaking or feature change)

What libraries were added/updated?

urllib3 was exact pinned to 1.26.16, which was the version pip show revealed was already installed in the LS app container.
This is because urllib3.util.parse_url is now imported in core/utils/io.py

Does this change affect performance?

(if so describe the impacts positive or negative)

Does this change affect security?

This change should make our SSRF defenses more robust, and adds testing.

What alternative approaches were there?

(briefly list any if applicable)

What feature flags were used to cover this change?

fflag_fix_back_lsdv_4568_import_csv_links_03032023_short was removed by this change. AFAIK this feature flag is enabled everywhere.

Does this PR introduce a breaking change?

(check only one)

  • Yes, and covered entirely by feature flag(s)
  • Yes, and covered partially by feature flag(s)
  • No
  • Not sure (briefly explain the situation below)

A feature flag has been removed.

What level of testing was included in the change?

(check all that apply)

  • e2e
  • integration
  • unit

Which logical domain(s) does this change affect?

URL data uploader, ML Backend API

@netlify
Copy link

netlify bot commented Jul 7, 2023

Deploy Preview for label-studio-docs-new-theme canceled.

Name Link
🔨 Latest commit 1b649e1
🔍 Latest deploy log https://app.netlify.com/sites/label-studio-docs-new-theme/deploys/64a8995c0488e40008b37181

@netlify
Copy link

netlify bot commented Jul 7, 2023

Deploy Preview for heartex-docs canceled.

Name Link
🔨 Latest commit 1b649e1
🔍 Latest deploy log https://app.netlify.com/sites/heartex-docs/deploys/64a8995c640a76000888a751

@github-actions github-actions bot added the fix label Jul 7, 2023
@jombooth jombooth marked this pull request as ready for review July 7, 2023 01:23
@jombooth jombooth changed the title fix: lsdv-5348: More robust and uniform SSRF defenses fix: LSDV-5348: More robust and uniform SSRF defenses Jul 7, 2023
@jombooth
Copy link
Contributor Author

jombooth commented Jul 7, 2023

I see there's some import-related unit test failures; will fix this tomorrow.

question for reviewers: would changing this default value to True be an option?

https://github.com/heartexlabs/label-studio/blob/1b9963f8a3b39ec4efb08d9b665fb78ccba86737/label_studio/core/settings/base.py#L398

@codecov
Copy link

codecov bot commented Jul 7, 2023

Codecov Report

Patch coverage has no change and project coverage change: +0.02 🎉

Comparison is base (1b9963f) 75.44% compared to head (1b649e1) 75.46%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4483      +/-   ##
===========================================
+ Coverage    75.44%   75.46%   +0.02%     
===========================================
  Files          156      156              
  Lines        12314    12297      -17     
===========================================
- Hits          9290     9280      -10     
+ Misses        3024     3017       -7     

see 12 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@dredivaris dredivaris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM good work!

@jombooth jombooth merged commit b81e53f into develop Jul 7, 2023
@jombooth jombooth deleted the fb-lsdv-5348/improve-ssrf-handling branch July 7, 2023 23:41
bmartel pushed a commit that referenced this pull request Aug 21, 2023
* fix: lsdv-5348: More robust and uniform SSRF defenses

* fix import path

* fix more issues with tests (incl. a flake from telemetry)

* fix broken test by mocking url validator

* regex excaping for codeql
bmartel added a commit that referenced this pull request Aug 21, 2023
* fix: lsdv-5348: More robust and uniform SSRF defenses

* fix import path

* fix more issues with tests (incl. a flake from telemetry)

* fix broken test by mocking url validator

* regex excaping for codeql

Co-authored-by: Jo Booth <[email protected]>
shayantabatabaee pushed a commit to shayantabatabaee/label-studio that referenced this pull request Sep 19, 2023
* fix: lsdv-5348: More robust and uniform SSRF defenses

* fix import path

* fix more issues with tests (incl. a flake from telemetry)

* fix broken test by mocking url validator

* regex excaping for codeql
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants