Description
Feature or enhancement
The `Python/dtoa.c` library is responsible for formatting floating point numbers (`double`) as strings and for parsing strings into numbers. The shared state is not thread-safe without the GIL:
1. `Balloc` and `Bfree` use a per-interpreter free-list to avoid some allocations of `Bigint` objects
2. `pow5mult` uses a per-interpreter append-only list of `Bigint` powers of 5
For (1), we can just skip using the freelists in `--disable-gil` builds. We already have a code path (`Py_USING_MEMORY_DEBUGGER`) that doesn't use freelists.
Line 312 in d61313b
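
A minimal sketch of what mitigation (1) could look like, not the actual `dtoa.c` code: the free-list fast path is compiled out behind a macro, so every allocation goes straight to `malloc`/`free`. The macro `SKETCH_NO_FREELIST`, the functions `sketch_balloc`/`sketch_bfree`, and the simplified `Bigint` layout are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical switch standing in for a --disable-gil build;
   set to 1 to compile out the shared free-list entirely. */
#ifndef SKETCH_NO_FREELIST
#define SKETCH_NO_FREELIST 0
#endif

#define SKETCH_KMAX 7  /* assumed largest size class kept on the free-list */

/* Simplified stand-in for dtoa.c's Bigint. */
typedef struct Bigint {
    struct Bigint *next;
    int k;              /* capacity is 1 << k words */
    unsigned int x[1];  /* first word; the rest follow in the same allocation */
} Bigint;

#if !SKETCH_NO_FREELIST
static Bigint *sketch_freelist[SKETCH_KMAX + 1];  /* shared mutable state */
#endif

static Bigint *sketch_balloc(int k)
{
    assert(k >= 0 && k < 31);
#if !SKETCH_NO_FREELIST
    /* Fast path: reuse a cached block. Unsafe without a lock or the GIL. */
    if (k <= SKETCH_KMAX && sketch_freelist[k] != NULL) {
        Bigint *rv = sketch_freelist[k];
        sketch_freelist[k] = rv->next;
        rv->next = NULL;
        return rv;
    }
#endif
    size_t words = (size_t)1 << k;
    Bigint *rv = malloc(sizeof(Bigint) + (words - 1) * sizeof(unsigned int));
    if (rv == NULL) {
        return NULL;
    }
    rv->k = k;
    rv->next = NULL;
    return rv;
}

static void sketch_bfree(Bigint *v)
{
    if (v == NULL) {
        return;
    }
#if !SKETCH_NO_FREELIST
    /* Cache small blocks for reuse instead of freeing them. */
    if (v->k <= SKETCH_KMAX) {
        v->next = sketch_freelist[v->k];
        sketch_freelist[v->k] = v;
        return;
    }
#endif
    free(v);
}
```

With `SKETCH_NO_FREELIST` set to 1, the two functions touch no shared state, so thread safety reduces to that of the underlying allocator.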
For (2), we can use atomic operations to append to the powers-of-5 linked list in a thread-safe manner. I don't think this needs to be guarded by a `Py_NOGIL` check, since each power-of-5 is only ever created once.
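
The append described for (2) can be sketched with C11 atomics as follows. This is an illustration under stated assumptions, not the code that was merged: `P5Node`, `p5_head`, and `p5_next` are hypothetical names, and a `uint64_t` stands in for a `Bigint`, so only powers up to 5**16 are representable here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node in an append-only list where each entry is the
   square of the previous one. uint64_t is a stand-in for Bigint and
   overflows past 5**16, so this sketch must not be walked further. */
typedef struct P5Node {
    uint64_t value;
    _Atomic(struct P5Node *) next;
} P5Node;

/* Head of the list; 5**4 is chosen here for illustration. */
static P5Node p5_head = { 625, NULL };

/* Return the successor of `node`, creating it if needed.
   Returns NULL only if allocation fails. */
static P5Node *p5_next(P5Node *node)
{
    P5Node *next = atomic_load_explicit(&node->next, memory_order_acquire);
    if (next != NULL) {
        return next;  /* already published by this or another thread */
    }

    P5Node *candidate = malloc(sizeof *candidate);
    if (candidate == NULL) {
        return NULL;
    }
    candidate->value = node->value * node->value;
    atomic_init(&candidate->next, NULL);

    /* Publish the candidate only if nobody else has appended yet. */
    P5Node *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(
            &node->next, &expected, candidate,
            memory_order_acq_rel, memory_order_acquire)) {
        return candidate;
    }

    /* Lost the race: discard our copy and use the winner's node,
       which the failed compare-exchange stored in `expected`. */
    free(candidate);
    return expected;
}
```

Two threads may both compute the same power, but only one node is ever linked in, and the loser frees its duplicate. Readers never see a partially initialized node because the node is fully written before the compare-exchange publishes it.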
For context, here is the modification to `Python/dtoa.c` in nogil-3.12. Note that it uses a `PyMutex` for (2), which I think we can avoid.
dragonbox, Ryū, Grisu, Schubfach
In the past 5 or so years, there have been a number of faster float-to-string algorithms with the desirable attributes (correctly rounded, no "garbage" digits, etc.). To my knowledge they are also all thread-safe. "dragonbox" looks the most promising, but porting it to C is a bigger undertaking than making the existing `dtoa.c` thread-safe.
Activity
ericvsmith commented on Nov 11, 2023
@mdickinson for awareness.
mdickinson commented on Nov 12, 2023
Thanks, @ericvsmith.
@colesbury I'm not opposed to moving away from `dtoa.c` (quite the opposite), but it's important to recognise just how much `dtoa.c` is doing for us right now: not just float-to-string in all its wonderful variants (shortest string, correctly-rounded fixed precision including the case of negative precision, correctly-rounded fixed number of significant digits) but correctly-rounded string-to-float, too. I like the look of dragonbox, but it only covers one subcase (shortest string) of one direction (float-to-string). We'd need to find substitutes for all the various pieces of `dtoa.c`.
mdickinson commented on Nov 12, 2023
The suggested mitigations for (1) and (2) sound good to me, FWIW.
mdickinson commented on Nov 12, 2023
I guess we should probably open a separate issue if we want to explore the idea of moving away from `dtoa.c` entirely. Dragonbox looks good for shortest-string float-to-str. Daniel Lemire's fast_double_parser might do in the other direction for correctly-rounded str-to-float, but I haven't looked at it in detail and I don't know how battle-tested it is. The main speed priority would be for numeric strings with 17 or fewer significant decimal digits (since that's the max number of digits that the shortest-string algorithm will produce); it seems fine to me if a slower algorithm is used for larger numbers of significant digits. That leaves support for `e`-style formatting, `f`-style formatting, and two-argument `round` (including the case where the second argument to `round` is negative). Again it would seem fine to have slower fallback options for `e` and `f`-style formatting with stupid numbers of digits, since that's a niche use-case, but it would be a shame to lose Python's correct rounding here. E.g., `format(0.1, '.55f')` should continue to give `'0.1000000000000000055511151231257827021181583404541015625'` rather than producing something misleading like `0.1000000000000000055500000000000000000000000000000000000` (which is what some other implementations/languages do).
mdickinson commented on Nov 12, 2023
Scratch that; the README says "We encourage users to adopt fast_float library instead".
colesbury commented on Nov 13, 2023
@mdickinson, thanks for the context. I'll plan on only making minimal changes to dtoa to make it thread-safe.
pythongh-111962: Make dtoa thread-safe in `--disable-gil` builds.
gh-111962: Make dtoa thread-safe in `--disable-gil` builds. (#112049)