🐛(back) validate document content in serializer #822

Merged: 1 commit, Mar 29, 2025

lunika (Member) commented Mar 28, 2025

Purpose

We recently started extracting image URLs from the document content. This relies on the assumption that the document content is always base64-encoded. We enforce this assumption by checking in the serializer that the content is valid base64.
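A minimal sketch of this kind of serializer-level check, written as a standalone function so it runs without Django. Assumptions: the ValidationError class is a hypothetical stand-in for rest_framework.serializers.ValidationError, and the error message is illustrative, not the one used in the PR. In a Django REST framework serializer, the same logic would live in a validate_content(self, value) method, which DRF calls for field-level validation of the content field. Passing validate=True to b64decode is a design choice made here, not necessarily what the PR does.

```python
from base64 import b64decode


class ValidationError(Exception):
    """Hypothetical stand-in for rest_framework.serializers.ValidationError."""


def validate_content(value: str) -> str:
    """Return value unchanged if it is valid base64, else raise ValidationError.

    Same contract as a DRF field-level validate_<field_name> hook.
    """
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them before decoding.
        b64decode(value, validate=True)
    except ValueError as exc:
        # binascii.Error (bad characters or padding) is a subclass of
        # ValueError; non-ASCII str input raises ValueError directly.
        raise ValidationError("Invalid base64 content.") from exc
    return value


print(validate_content("aGVsbG8="))  # prints aGVsbG8=

try:
    validate_content("not base64!")
except ValidationError as err:
    print(err)  # prints Invalid base64 content.
```

Returning the value unchanged keeps the stored content exactly as the client sent it; the decode is only used as a check.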

Proposal

  • 🐛(back) validate document content in serializer

lunika added the bug (Something isn't working) label Mar 28, 2025
lunika requested review from sampaccoud and Copilot March 28, 2025 17:18
lunika self-assigned this

Copilot AI left a comment

Pull Request Overview

This PR aims to enforce that document content is always in valid base64 format by adding a new serializer validation and corresponding tests.

  • Added a new test case to validate that non-base64 content is rejected.
  • Implemented validate_content in the serializer to check for valid base64 using b64decode.
  • Updated the CHANGELOG with information about the bug fix.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • src/backend/core/tests/documents/test_api_documents_update.py: Added a test to ensure that documents with invalid base64 content raise a validation error.
  • src/backend/core/api/serializers.py: Added validation logic for the content field to enforce base64 encoding.
  • CHANGELOG.md: Updated the changelog to reflect the bug fix.
Comments suppressed due to low confidence (1)

src/backend/core/api/serializers.py:309

    b64decode(value)

  • Consider using b64decode(value, validate=True) to enforce strict base64 validation; by default, b64decode silently discards characters outside the base64 alphabet instead of rejecting them.
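The difference behind this suggestion, per the Python base64 documentation: with the default validate=False, characters outside the base64 alphabet are discarded before decoding, so some malformed strings still decode; with validate=True, such characters raise binascii.Error. The helper name accepted and the sample string below are illustrative only.

```python
from base64 import b64decode


def accepted(value: str, strict: bool) -> bool:
    """Report whether b64decode accepts value with validate=strict."""
    try:
        b64decode(value, validate=strict)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return True


# "hello" in base64 ("aGVsbG8=") with a stray space inserted.
sample = "aGVs bG8="

print(b64decode(sample))                # prints b'hello' (space discarded)
print(accepted(sample, strict=False))   # prints True
print(accepted(sample, strict=True))    # prints False
```

For a validator whose purpose is to guarantee that stored content is base64, strict mode matches the intent more closely: lenient mode can accept strings that only become valid after characters are dropped.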

lunika requested a review from Copilot March 28, 2025 17:28

Copilot AI left a comment

Pull Request Overview

This PR adds validation for document content to ensure it is correctly encoded in base64 before processing.

  • Adds a new test to verify that a document update with invalid base64 content is rejected
  • Implements a new validator in the serializer to enforce valid base64 encoding
  • Updates the CHANGELOG to reference the fix

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • src/backend/core/tests/documents/test_api_documents_update.py: Adds a test case for invalid base64 document content.
  • src/backend/core/api/serializers.py: Introduces a validate_content method to check base64 content.
  • CHANGELOG.md: Updates the changelog with the bug fix entry.

lunika requested a review from Copilot March 28, 2025 17:33

Copilot AI left a comment

Pull Request Overview

This pull request adds server-side validation for document content by ensuring that input strings are valid base64 encoded data. Key changes include:

  • Adding a new serializer method (validate_content) in the API serializer to enforce base64 validation.
  • Creating a corresponding test (test_api_documents_update_invalid_content) to confirm that invalid content returns a 400 error with the proper message.
  • Updating the changelog to document the fix.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • src/backend/core/tests/documents/test_api_documents_update.py: Added a test to check the behavior for non-base64-encoded content.
  • src/backend/core/api/serializers.py: Introduced a validate_content method to enforce the base64 content format.
  • CHANGELOG.md: Updated the changelog to document the base64 validation fix.

lunika force-pushed the fix/validate_content branch 3 times, most recently from b86fb52 to bbfcaf8 (March 29, 2025 06:30)
lunika requested a review from Copilot March 29, 2025 06:32

Copilot AI left a comment

Pull Request Overview

This PR implements server-side validation for document content to ensure that it is valid base64-encoded data and includes an associated test case and changelog updates.

  • Add a new serializer method (validate_content) to check base64 validity
  • Introduce a test case to trigger the validation error when non-base64 content is provided
  • Update the changelog to reflect the fix

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • src/backend/core/tests/documents/test_api_documents_update.py: Added a test case for invalid base64 document content.
  • src/backend/core/api/serializers.py: Added a validate_content method to enforce base64 validation on document content.
  • CHANGELOG.md: Updated the changelog with the base64 validation fix.

We recently started extracting image URLs from the content. For this, we
assume that the document content is always base64-encoded. We enforce
this assumption by checking that it is valid base64 in the serializer.
lunika force-pushed the fix/validate_content branch from bbfcaf8 to 187dbf8 (March 29, 2025 06:57)
lunika merged commit fbe8a26 into main Mar 29, 2025
19 checks passed
lunika deleted the fix/validate_content branch March 29, 2025 18:08
AntoLC mentioned this pull request Apr 7, 2025
Labels: bug (Something isn't working)
Projects: None yet
3 participants