dtoa: thread safety in --disable-gil builds #111962

Closed
@colesbury

Description

@colesbury
Contributor

Feature or enhancement

Python/dtoa.c formats floating-point numbers (double) as strings and parses strings into numbers. Its shared state is not thread-safe without the GIL:

  1. Balloc and Bfree use a per-interpreter free-list to avoid some allocations of Bigint objects
  2. pow5mult uses a per-interpreter append-only list of Bigint powers of 5

For (1), we can just skip using the freelists in --disable-gil builds. We already have a code path (Py_USING_MEMORY_DEBUGGER) that doesn't use freelists.

#ifndef Py_USING_MEMORY_DEBUGGER

For (2), we can use atomic operations to append to the powers-of-5 linked list in a thread-safe manner. I don't think this needs to be guarded by a Py_NOGIL check, since each power-of-5 is only ever created once.

For context, here is the modification to Python/dtoa.c in nogil-3.12. Note that it uses a PyMutex for (2), which I think we can avoid.
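
The publish-once idea for (2) can be sketched in Python as follows. This only illustrates the logic and is not the dtoa.c code: Python has no user-level compare-and-swap, so the hypothetical helper `compare_exchange_next` emulates one with a lock, and `Pow5Node`, `P5_HEAD`, `next_power`, `run_threads`, and this `pow5mult` are illustrative names that work on plain integers rather than Bigint objects. In C, the same step would presumably be a single atomic compare-exchange on the `next` pointer.

```python
import threading


class Pow5Node:
    """One entry of the append-only list: 5**4, 5**8, 5**16, ..."""

    def __init__(self, value):
        self.value = value
        self.next = None


# Assumption: this lock only emulates an atomic compare-exchange.
_cas_lock = threading.Lock()


def compare_exchange_next(node, expected, new):
    """Set node.next to new only if it is still `expected`."""
    with _cas_lock:
        if node.next is expected:
            node.next = new
            return True
        return False


P5_HEAD = Pow5Node(5 ** 4)


def next_power(node):
    """Return the successor of node, creating and publishing it if absent."""
    if node.next is None:
        candidate = Pow5Node(node.value * node.value)
        # If another thread published first, the candidate is simply dropped.
        compare_exchange_next(node, None, candidate)
    return node.next


def pow5mult(b, k):
    """Return b * 5**k using the shared list of repeated squares."""
    if k & 3:
        b *= (5, 25, 125)[(k & 3) - 1]
    k >>= 2
    node = P5_HEAD
    while k:
        if k & 1:
            b *= node.value
        k >>= 1
        if k:
            node = next_power(node)
    return b


def run_threads(exponents):
    """Compute pow5mult(1, k) for each k, one thread per exponent."""
    results = {}
    threads = [
        threading.Thread(target=lambda k=k: results.__setitem__(k, pow5mult(1, k)))
        for k in exponents
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because a node is never modified or removed once it is published, readers can walk the list without taking any lock. The only contended step is the single attempt to fill in an empty `next`, and a thread that loses that race just uses the winner's node.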

dragonbox, Ryū, Grisu, Schubfach

In the past 5 or so years, there have been a number of faster float-to-string algorithms with the desirable attributes (correctly rounded, no "garbage" digits, etc.). To my knowledge they are also all thread-safe. "dragonbox" looks the most promising, but porting it to C is a bigger undertaking than making the existing dtoa.c thread-safe.

Activity

ericvsmith (Member) commented on Nov 11, 2023

@mdickinson for awareness.

mdickinson (Member) commented on Nov 12, 2023

Thanks, @ericvsmith.

@colesbury

> In the past 5 or so years, there have been a number of faster float-to-string algorithms [...]

I'm not opposed to moving away from dtoa.c (quite the opposite), but it's important to recognise just how much dtoa.c is doing for us right now - not just float-to-string in all its wonderful variants (shortest string, correctly-rounded fixed precision including the case of negative precision, correctly-rounded fixed number of significant digits) but correctly-rounded string-to-float, too. I like the look of dragonbox, but it only covers one subcase (shortest string) of one direction (float-to-string). We'd need to find substitutes for all the various pieces of dtoa.c.
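
To make that list of responsibilities concrete, here is a small Python sketch that exercises each variant through the public API. It assumes a CPython build in which these operations are backed by dtoa.c; the variable names are purely illustrative.

```python
x = 0.1

shortest = repr(x)                    # shortest string that round-trips
fixed = format(2.675, ".2f")          # correctly rounded fixed precision
negative = round(1234.5678, -2)       # two-argument round, negative ndigits
significant = format(x, ".17g")       # fixed number of significant digits
parsed = float("9007199254740993")    # correctly rounded string-to-float

print(shortest)     # 0.1
print(fixed)        # 2.67 (the double nearest 2.675 lies just below it)
print(negative)     # 1200.0
print(significant)  # 0.10000000000000001
print(parsed)       # 9007199254740992.0 (2**53 + 1 is a tie, rounds to even)
```

Any replacement for dtoa.c would need to reproduce all five of these results, not just the first.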

mdickinson (Member) commented on Nov 12, 2023

The suggested mitigations for (1) and (2) sound good to me, FWIW.

mdickinson (Member) commented on Nov 12, 2023

I guess we should probably open a separate issue if we want to explore the idea of moving away from dtoa.c entirely. Dragonbox looks good for shortest-string float-to-str. Daniel Lemire's fast_double_parser might do in the other direction for correctly-rounded str-to-float, but I haven't looked at it in detail and I don't know how battle-tested it is. The main speed priority would be for numeric strings with 17 or fewer significant decimal digits (since that's the max number of digits that the shortest-string algorithm will produce); it seems fine to me if a slower algorithm is used for larger numbers of significant digits. That leaves support for e-style formatting, f-style formatting, and two-argument round (including the case where the second argument to round is negative). Again it would seem fine to have slower fallback options for e and f-style formatting with stupid numbers of digits, since that's a niche use-case, but it would be a shame to lose Python's correct rounding here. E.g., format(0.1, '.55f') should continue to give '0.1000000000000000055511151231257827021181583404541015625' rather than producing something misleading like 0.1000000000000000055500000000000000000000000000000000000 (which is what some other implementations/languages do).
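
A quick way to check both claims from Python, as a sketch: `decimal.Decimal(float)` converts the double exactly, so it gives an independent reference for what the 55 digits should be, and a handful of sample values (chosen arbitrarily here) show that 17 significant digits are enough to round-trip.

```python
from decimal import Decimal

x = 0.1
exact = Decimal(x)             # exact binary value of the double, no rounding
formatted = format(x, ".55f")

print(formatted)
# 0.1000000000000000055511151231257827021181583404541015625
print(formatted == str(exact))  # True: every one of the 55 digits is real

# 17 significant digits are enough to round-trip these doubles.
samples = [0.1, 1 / 3, 2.0 ** -1074, 1.7976931348623157e308]
print(all(float(format(v, ".17g")) == v for v in samples))  # True
```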

mdickinson (Member) commented on Nov 12, 2023

> Daniel Lemire's fast_double_parser might do in the other direction

Scratch that; the README says "We encourage users to adopt fast_float library instead".

colesbury (Contributor, Author) commented on Nov 13, 2023

@mdickinson, thanks for the context. I'll plan on only making minimal changes to dtoa to make it thread-safe.

added 2 commits that reference this issue on Nov 13, 2023

pythongh-111962: Make dtoa thread-safe in `--disable-gil` builds.

pythongh-111962: Make dtoa thread-safe in `--disable-gil` builds.

self-assigned this
on Nov 14, 2023
added a commit that references this issue on Nov 20, 2023

pythongh-111962: Make dtoa thread-safe in `--disable-gil` builds.

added a commit that references this issue on Dec 7, 2023

gh-111962: Make dtoa thread-safe in `--disable-gil` builds. (#112049)

2d76be2
added a commit that references this issue on Feb 11, 2024

pythongh-111962: Make dtoa thread-safe in `--disable-gil` builds. (py…

added a commit that references this issue on Sep 2, 2024

pythongh-111962: Make dtoa thread-safe in `--disable-gil` builds. (py…
