
threading overhead in free-threaded Python #118153

Closed as not planned
@szalpal

Description


Bug report

Bug description:

Hi :)

I'm testing the free-threaded Python build. I'm running a simple test (code below) that uses the threading module to run a computationally heavy function on several threads at once, so that the work is spread across CPU cores. The measured wall-clock times of the script are:

num_threads = 2  -> 0.68 s
num_threads = 8  -> 5.48 s
num_threads = 18 -> 12.1 s
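Because every thread runs the same fixed workload, these timings are a weak-scaling measurement. A small sketch that turns them into a scaling efficiency (2-thread time divided by n-thread time) and compares them with a fully serial estimate. The serial estimate rests on an assumption not stated in the issue: that the 2-thread run was fully parallel (as its ~99% utilisation suggests), so one thread alone takes about 0.68 s.

```python
# Wall-clock times reported above, in seconds, keyed by num_threads.
measured = {2: 0.68, 8: 5.48, 18: 12.1}

# Every thread runs the same fixed workload, so with perfect (weak) scaling the
# elapsed time would stay at the 2-thread value as long as each thread has a core.
baseline = measured[2]

for n, t in measured.items():
    efficiency = baseline / t  # 1.0 (100%) would mean perfect scaling
    # Assumption: the 2-thread run was fully parallel, so one thread alone takes
    # about `baseline` seconds and n threads run back to back take n * baseline.
    serial_estimate = n * baseline
    print(f"num_threads={n:2d}: efficiency {efficiency:6.1%}, "
          f"fully serial estimate {serial_estimate:5.2f} s, measured {t:5.2f} s")
```

Under that assumption the fully serial estimates are 5.44 s for 8 threads and 12.24 s for 18 threads, close to the measured 5.48 s and 12.1 s, with efficiencies of about 12.4% and 5.6%. This would suggest the threads made very little progress in parallel at those thread counts.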

In an ideal world, since every thread does the same amount of work and the machine has 36 logical CPUs, I would expect the three numbers above to be the same (or at least comparable). I've also gathered profiles of the experiment:

num_threads = 2
[profile screenshot]

num_threads = 8
[profile screenshot]

num_threads = 18 (showing only some threads, but the picture illustrates the issue)
[profile screenshot]

As we can see, CPU utilisation decreases as the number of threads grows (almost 99% for num_threads = 2, about 75% for num_threads = 8 and about 40% for num_threads = 18). We also see threads switching between CPU cores more frequently. My guess is that the decreased CPU utilisation is caused by overhead in the threading module.

So my question is: is this correct? I've gone through PEP 703 but haven't found any mention of this. If overhead in threading is the root cause of the lowered utilisation, could this issue be addressed?

@colesbury, tagging you here since I believe you know the most about the free-threaded Python build.
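A rough way to cross-check the per-thread utilisation seen in the profiles without a profiler is to compare each thread's own CPU time (time.thread_time()) with its wall-clock time. This is a sketch, not what the profiler measures: the helper names record_on_cpu_fraction and per_thread_utilisation are hypothetical, and the ratio only approximates the profiler's utilisation figure.

```python
import math
import threading
import time


def computational_heavy(iterations):
    val = 0
    for i in range(1, iterations):
        val += math.sin(i) * math.cos(i)
    return val


def record_on_cpu_fraction(iterations, results):
    """Run the workload and record the share of this thread's wall time spent on a CPU."""
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()  # CPU time consumed by the calling thread only
    computational_heavy(iterations)
    cpu = time.thread_time() - cpu_start
    wall = time.perf_counter() - wall_start
    # Each thread writes its own key, so no two threads update the same entry.
    results[threading.current_thread().name] = cpu / wall if wall > 0 else 0.0


def per_thread_utilisation(num_threads, iterations=1_000_000):
    """Return {thread name: CPU time / wall time} for num_threads concurrent workers."""
    results = {}
    threads = [
        threading.Thread(target=record_on_cpu_fraction, name=f"Thread{i}",
                         args=(iterations, results))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `sorted(per_thread_utilisation(8).values())` would be expected to show values near 1.0 if the threads really run in parallel, and values well below 1.0 if they spend much of their time waiting.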

Testing configuration:

Ubuntu 22.04
Python 3.13 ToT (tip of tree, i.e. the main branch)
CPU scaling_governor: performance

CPU(s):                  36
  On-line CPU(s) list:   0-35
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz

CPython build command:

./configure --disable-gil --enable-optimizations && make -j && make install
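Since the testing script imports the third-party nvtx package, it may be worth confirming after all imports that the GIL is really off. In CPython 3.13 free-threaded builds, importing an extension module that does not declare free-threading support can re-enable the GIL at runtime (overridable with PYTHON_GIL=0 or -X gil=0); whether a given main-branch build already behaves this way depends on its exact revision. A minimal check, assuming it runs after the script's imports:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 when CPython was configured with --disable-gil
# (0 or None on a regular build).
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() is available from CPython 3.13; earlier versions always have a GIL.
gil_enabled = sys._is_gil_enabled() if hasattr(sys, "_is_gil_enabled") else True

print(f"free-threaded build: {free_threaded_build}")
print(f"GIL enabled at runtime: {gil_enabled}")
```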

Testing script:

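import math
import time
import nvtx  # NVIDIA Tools Extension: labels code ranges for the profiler (pip install nvtx)
import threading

def computational_heavy(iterations):
    # Pure-Python, CPU-bound loop; each thread works only on its own local variables
    val = 0
    for i in range(1, iterations):
        val += math.sin(i) * math.cos(i)
    return val


def test(thread_id, iterations=1000000):
    # Every thread runs the same fixed workload inside an NVTX range named "Calculation"
    with nvtx.annotate("Calculation"):
        computational_heavy(iterations)

num_threads = 18  # 2, 8 and 18 were used for the timings above

threads = [
    threading.Thread(target=test, name=f"Thread{i}", args=(i,))
    for i in range(num_threads)
]
start = time.perf_counter_ns()
for t in threads:
    t.start()
for t in threads:
    t.join()
stop = time.perf_counter_ns()
print(f"Elapsed time {stop-start} ns")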
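A stdlib-only variant of the test script may make runs easier to reproduce and compare. It is a sketch under a few assumptions: the nvtx ranges are dropped (they only label regions for the profiler), a threading.Barrier releases all workers together so thread creation and start-up stay outside the timed region, and the result is returned in seconds to match the measurements above. The function name run is hypothetical.

```python
import math
import threading
import time


def computational_heavy(iterations):
    val = 0
    for i in range(1, iterations):
        val += math.sin(i) * math.cos(i)
    return val


def run(num_threads, iterations=1_000_000):
    """Return elapsed wall-clock seconds for num_threads threads each running the workload."""
    # The main thread also waits on the barrier, so the timer starts only after
    # every worker has been created and started.
    barrier = threading.Barrier(num_threads + 1)

    def worker():
        barrier.wait()
        computational_heavy(iterations)

    threads = [threading.Thread(target=worker, name=f"Thread{i}")
               for i in range(num_threads)]
    for t in threads:
        t.start()
    barrier.wait()  # releases all workers (and the main thread) together
    start = time.perf_counter()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

For example, `for n in (2, 8, 18): print(f"num_threads={n:2d}: {run(n):.2f} s")` prints one line per thread count in the same units as the timings reported above.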

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Metadata


Assignees: No one assigned
Labels: type-bug (An unexpected behavior, bug, or error)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
