
Extended ASCII characters in multiline strings cause "SystemError: Negative size passed to PyUnicode_New" when the encoding is not specified #96611

Closed
@polprog

Description


Bug report

In some cases, when a source file that is not UTF-8 encoded contains a multi-line string, Python raises SystemError: Negative size passed to PyUnicode_New and does not execute any code.

Minimal test case:

print("""
ą""")

This only happens when the non-UTF-8 character is on a line of the string after the first one (at any position within that line).

A similar single-line test case behaves correctly:

print("""ą""")

It reports an encoding error, which is the expected behavior:

SyntaxError: Non-UTF-8 code starting with '\xb1' in file C:\Users\xxxxx\test.py on line 2, but no encoding declared; see https://python.org/dev/peps/pep-0263/ for details

Since this is an encoding-related error, both files are attached (as .txt, because GitHub does not allow .py attachments).
test.txt - single line (correct behavior)
test_ml.txt - multi line (bug)
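
Because the behavior depends on the raw bytes in the file, copying the snippets above into a UTF-8 editor will likely not reproduce it. Below is a rough sketch that recreates both cases from bytes and runs them with the current interpreter. It assumes the offending byte is 0xB1 (taken from the error message above; that is "ą" in ISO-8859-2). The helper names `run_source` and `last_error_line` are made up for this illustration and are not part of the attachments.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumption: the byte is 0xB1, as shown in the SyntaxError message.
# On its own it is not valid UTF-8.
SINGLE_LINE = b'print("""\xb1""")\n'    # expected: SyntaxError about Non-UTF-8 code
MULTI_LINE = b'print("""\n\xb1""")\n'   # reported: SystemError on 3.10.6


def run_source(source: bytes) -> subprocess.CompletedProcess:
    """Write raw bytes to a temporary test.py and run it with this interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "test.py"
        path.write_bytes(source)  # no encoding declaration, no re-encoding
        return subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            errors="replace",  # stderr may echo bytes that do not decode cleanly
        )


def last_error_line(result: subprocess.CompletedProcess) -> str:
    """Return the last non-empty stderr line, which names the exception."""
    lines = [line for line in result.stderr.splitlines() if line.strip()]
    return lines[-1] if lines else ""


if __name__ == "__main__":
    for name, source in (("single line", SINGLE_LINE), ("multi line", MULTI_LINE)):
        result = run_source(source)
        print(f"{name}: exit={result.returncode} {last_error_line(result)}")
```

On an affected interpreter the multi-line case should end in the SystemError, while on an unaffected one both cases should end in a SyntaxError; the exact wording may differ between versions.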

My environment

  • CPython versions tested on: Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] on win32
  • Operating system and architecture: Windows 10 Pro 21H2 (19044.1826)

Labels

  • interpreter-core (Objects, Python, Grammar, and Parser dirs)
  • topic-unicode
  • type-bug (An unexpected behavior, bug, or error)
