Closed
It's pretty easy to starve a task that wants to create a HashMap by having a busy task somewhere else. This appears to be caused by HashMap's need for some random numbers.
Slapping #[no_uv] on the crate makes it better, but that's not a great solution at this point because of the other limitations it brings.
Example code w/ some explanation at https://gist.github.com/jfager/8072694.
I'm on OSX 10.8.5, using rustc built from b3cee62.
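Roughly, the reproduction has the shape below. This is a sketch in present-day std syntax purely for illustration, not the gist's actual code; the gist targets the 2013 green-task runtime (libgreen/libuv), which is where the starvation shows up, and the names and loop count here are made up.

```rust
// Rough sketch of the reproduction pattern, not the gist's actual code.
// On the 2013 runtime, creating a HashMap seeded the task rng via I/O,
// which could be starved by a task that never yields.
use std::collections::HashMap;
use std::thread;
use std::time::Instant;

fn main() {
    // A "busy" task that never yields back to the scheduler.
    let spinner = thread::spawn(|| {
        let mut acc: u64 = 0;
        for i in 0..2_000_000_000u64 {
            acc = acc.wrapping_add(i);
        }
        acc
    });

    // A task that only wants to build a HashMap; on the old runtime this
    // is the task that ends up waiting far longer than expected.
    let mapper = thread::spawn(|| {
        let start = Instant::now();
        let mut map: HashMap<u32, u32> = HashMap::new();
        map.insert(1, 2);
        println!("hashmap task finished after {:?}", start.elapsed());
    });

    mapper.join().unwrap();
    println!("spinner result: {}", spinner.join().unwrap());
}
```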
brson commented on Dec 22, 2013
I can reproduce this but don't understand yet why there would be a difference in behavior based on the data structures constructed.
I can imagine scenarios where the spinning would starve other tasks, but can't picture how the rng could be affected by a task spinning.
brson commented on Dec 22, 2013
Running with just one thread produces output indicating that the hashmap tasks get descheduled at some point in favor of the spinning task. That suggests to me that thieves are failing to find the hashmap tasks. I remember thinking recently that the stealing code was a little racy and that that was OK since it doesn't affect actual correctness (the task does run eventually), but maybe we can tighten it up so that thieves always find work when it's available.
I'd like to know why the task rng causes tasks to be descheduled though....
brson commented on Dec 22, 2013
The task rng deschedules because it seeds itself via I/O, of course.
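For anyone following along, "seeds via I/O" means roughly the following. This is a simplified illustration, not the actual std implementation; on the old runtime a read like this went through libuv, so the task had to deschedule and wait for its local event loop to deliver the bytes.

```rust
// Simplified illustration of seeding a task rng from OS entropy; not the
// real implementation, just the shape of the blocking read involved.
use std::fs::File;
use std::io::Read;

fn seed_from_os() -> std::io::Result<[u8; 16]> {
    let mut seed = [0u8; 16];
    // Blocking read of entropy from the OS to seed the rng / hash keys.
    File::open("/dev/urandom")?.read_exact(&mut seed)?;
    Ok(seed)
}

fn main() -> std::io::Result<()> {
    let seed = seed_from_os()?;
    println!("seed bytes: {:?}", seed);
    Ok(())
}
```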
brson commented on Dec 22, 2013
Right now if a scheduler fails to steal it pushes itself to the sleeper list and waits for another scheduler to notify it that there is work available. I'm guessing the window between those two events is where the hashmap tasks are descheduling. Closing that window is a little tricky, but I can imagine we might: steal, push to sleeper list, steal again, sleep.
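Sketching that ordering with made-up types (the `Scheduler` methods below are hypothetical stand-ins, not the real runtime's API), the idea is that advertising on the sleeper list before the final steal attempt closes the window:

```rust
// Hypothetical sketch of the proposed "steal, push to sleeper list, steal
// again, sleep" ordering. All names are invented stand-ins.
struct Task;

struct Scheduler;

impl Scheduler {
    fn steal_task(&mut self) -> Option<Task> {
        // Try to take work from another scheduler's queue.
        None
    }
    fn push_to_sleeper_list(&mut self) {
        // Advertise that this scheduler is about to sleep, so peers
        // enqueuing work know to send a wakeup.
    }
    fn sleep_until_notified(&mut self) {
        // Block until another scheduler notifies us that work is available.
    }
    fn run(&mut self, _task: Task) {}

    fn find_work_or_sleep(&mut self) {
        if let Some(task) = self.steal_task() {
            return self.run(task);
        }
        // Register as a sleeper *before* the final steal attempt: any task
        // enqueued after this point either shows up in the second steal or
        // triggers a wakeup notification, so it can't fall into the window.
        self.push_to_sleeper_list();
        if let Some(task) = self.steal_task() {
            return self.run(task);
        }
        self.sleep_until_notified();
    }
}

fn main() {
    let mut sched = Scheduler;
    sched.find_work_or_sleep();
}
```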
brson commented on Dec 22, 2013
Oh, my previous guess may be wrong.
If these tasks are descheduling to do I/O, there may be nothing we can do for them. They can't run again until the local scheduler becomes available and the I/O event loop resumes.
alexcrichton commented on Dec 22, 2013
I think that @brson's analysis of the problem is correct. What happens is that one I/O loop has lots of "ready events", but the current task on that I/O loop is not yielding control back to the scheduler. I'm unsure if there's much we can do about this: we'd have to interrupt the task to let the I/O loop wake up and do its business, but preemption is not currently possible (and would also break a lot of code today), so this is definitely a tricky problem.
metajack commented on Dec 22, 2013
Is it not possible to transfer tasks sleeping on I/O to a different scheduler? This failure pattern is not going to be super easy to reason about.
alexcrichton commented on Dec 22, 2013
Sadly, no. The problem here is that the libuv event loop needs to run in order to execute the callbacks of the pending I/O handles. Those callbacks are what will reawaken the tasks and allow them to be stolen by other schedulers, but there's no way to have one event loop run another event loop's callbacks.
metajack commented on Dec 22, 2013
I could have sworn there was talk about migrating descriptors across event loops in the distant past. I figured we just hadn't gotten around to it and such a thing would eventually address this problem.
thestinger commented on Sep 19, 2014
#17325 means this is no longer relevant to the standard libraries