Skip to content

API pattern for “setdefault”-like operations #40

Open
@encukou

Description

@encukou

IMO, we need an API pattern for “setdefault”-like operations, which would be safer under nogil.

I was thinking about these proposed new APIs:

The first case is like dict.get. IMO, it's good to return 0 and set *result to NULL if the item is not found. If the caller needs a default value, they can create it when needed. However, if they then want to insert the default back into the dict, in nogil they run into a race condition: the item might have already been inserted.

The second case is like dict.setdefault, doing some non-trivial work if the item is not found (in this case, creating a new module and putting it in sys.modules). That should be atomic, so it makes sense to me to use a default -- either take it as an argument, or us an implied empty value. Taking a default argument is not free, though: the caller needs to create it even if it's unused.

The trouble with PyImport_ImportOrAddModule (and other “setdefault”-style API with implied defaults) is that the newly created default sometimes needs further customization, and that will not be atomic under nogil: other threads have a chance to see a uninitialized module.

IMO, we need an API pattern for “setdefault”-like operations that allows creating a custom new value only if needed, and either

  • locks the container while that value is being constructed (which sounds like a deadlock magnet), or
  • handles the race by discarding a losing thread's newly made default:
# rough pseudocode
value = get(key)
if value is NULL:
    new = create_a_default()
    value = setdefault(key, new)   # note that `value == new` is not guaranteed
    ddecref(new)

Note that this get+setdefault is different from get+set (which we use in a bunch of places, for example in https://github.com/python/cpython/blob/d2f305dfd183025a95592319b280fcf4b20c8694/Python/pylifecycle.c#L2276-L2285 -- didn't check if that instance is safe).
With get+set, if there's a race, the first thread's value is discarded -- but that value might already have been retrieved by other code.

cc @colesbury

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions