
Incremental runs are worse than full runs #1721

Open

Description

krausest (Owner)

There are two kinds of runs:

  • Full runs. For a new Chrome version I'm running all frameworks on my machine. The first night only the keyed implementations run, and when that looks good, the non-keyed implementations run the second night. Before running the keyed implementations I'm rebooting, stopping all the helpers (BetterDisplay, OneDrive, Karabiner Elements), stopping the search index (sudo mdutil -i off /), and making sure that there are no active apps running on the machine (other browser instances, VSCode etc.). A rough sketch of these preparation steps is shown below.
  • Incremental runs. For updated implementations I'm running vanillajs and the updated implementations. I'm stopping all active applications, but neither rebooting nor stopping the helper apps. vanillajs serves as a reference for whether the achievable performance looks right. Sometimes it doesn't: results are often worse than in a full run.

This issue tracks the impact of the measures for the full run. Maybe we can work out what's causing the effect.
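
For reference, a rough sketch of how the full-run preparation could be scripted. The mdutil call is taken from the list above; the osascript loop and the exact app names are only an assumption about how the manual steps might be automated:

    # Stop Spotlight indexing on the root volume (from the full-run checklist above).
    sudo mdutil -i off /

    # Quit the helper apps that would otherwise keep running in the background.
    # The app names are examples; "quit app" is plain AppleScript driven via osascript.
    for app in "BetterDisplay" "OneDrive" "Karabiner-Elements"; do
      osascript -e "quit app \"$app\"" 2>/dev/null || true
    done

    # After the run, indexing can be switched back on:
    #   sudo mdutil -i on /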

Activity

krausest (Owner, Author) commented on Jul 28, 2024

For the first update since chrome 127 I decided to prepare the machine like for a full run. Results are pretty close and show how much noise is in the measurement.
Left: incremental run, right: last official run:
Screenshot 2024-07-28 at 09 05 53

krausest (Owner, Author) commented on Jul 28, 2024

With indexing enabled and on battery, everything else like in a full run. A bit slower!
Screenshot 2024-07-28 at 12 29 23

krausest (Owner, Author) commented on Jul 28, 2024

Once again, with indexing disabled and on AC:
Screenshot 2024-07-28 at 13 10 10
Really hard to tell if that's better than the last one. Nevertheless the full run was slightly better...

krausest (Owner, Author) commented on Aug 2, 2024

Just running create 1k rows for vanillajs, but with 40 iterations for the following cases:

  1. Firefox, Chrome, Mail, Notes, Onedrive, Betterdisplay, VSCode, Preview, Settings
  2. Firefox, Chrome, Mail, Notes, Onedrive, Betterdisplay, Preview, Settings
  3. Only: Onedrive, Betterdisplay
  4. No Apps running, Onedrive stopped, Betterdisplay stopped
  5. After Rebooting: No Apps running, everything stopped except indexing
  6. After Reboot: No Apps running, everything stopped incl. indexing

Screenshot 2024-08-02 at 20 41 13

The results aren't as clear as I hoped.

krausest (Owner, Author) commented on Aug 2, 2024

Huh. There was one case missing: running on AC. Everything above was on battery.

  1. Battery: Firefox, Chrome, Mail, Notes, Onedrive, Betterdisplay, VSCode, Preview, Settings
  2. Battery: Firefox, Chrome, Mail, Notes, Onedrive, Betterdisplay, Preview, Settings
  3. Battery: Only: Onedrive, Betterdisplay
  4. Battery: No Apps running, Onedrive stopped, Betterdisplay stopped
  5. Battery: After Rebooting: No Apps running, everything stopped except indexing
  6. On AC: After Rebooting: No Apps running, everything stopped except indexing
  7. Battery: After Reboot: No Apps running, everything stopped incl. indexing

Screenshot 2024-08-02 at 20 53 06

So, running on AC instead of on battery is the key. I really thought the Mac was immune to that.
Full runs were always on AC, incremental runs were often on battery.
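
If AC vs. battery really is the deciding factor, a small guard before each run might catch the mistake early. This is only a sketch; pmset is a standard macOS tool, but the exact "AC Power" string it prints is an assumption about its output format:

    # Abort the benchmark unless the machine reports that it is drawing AC power.
    if ! pmset -g batt | grep -q "AC Power"; then
      echo "Not running on AC power - aborting benchmark run" >&2
      exit 1
    fi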

robbiespeed (Contributor) commented on Aug 11, 2024

I wonder if there's any variation caused by some incremental runs executing on efficiency cores. I know I've seen it a fair amount on Linux with my Intel CPUs when running benchmarks, so I use something like taskset -c 0-7 program [args] to set the affinity to only the performance cores.

The closest equivalent I found for macOS is taskpolicy -c utility program [args]; from what I've read that should favour P-cores, but it still has a chance to run on E-cores. From a clean boot with no other programs it'll likely run on P-cores anyway, so I'd expect full runs to be safe.

Or maybe no runs are happening on the E-cores anyway and the Mac, being a holistic platform, just has really good defaults for CPU affinity handling out of the box.
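
To make the two approaches concrete, here is a minimal sketch, assuming a hybrid Intel machine on Linux and an Apple Silicon Mac; "npm run bench" is only a stand-in for whatever command actually starts the benchmark driver:

    # Linux: hard-pin the driver (and the browser processes it spawns, which
    # inherit the affinity) to the P-cores. Core IDs 0-7 are an example;
    # check `lscpu --extended` for the real core layout.
    taskset -c 0-7 npm run bench   # "npm run bench" is a placeholder command

    # macOS: there is no hard affinity API; taskpolicy only clamps the QoS class.
    # "utility" should favour the P-cores but may still be scheduled on E-cores.
    taskpolicy -c utility npm run bench   # placeholder command again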

krausest (Owner, Author) commented on Sep 26, 2024

Just a note for myself.
3 reboots with 3 runs each for vanillajs create 1k rows (i.e. 9 runs):
Screenshot 2024-09-26 at 21 30 14
The mean is shown as the label on the x-axis.

Pretty close and well below the 37.7 we observed in the last incremental run (despite also rebooting then).

krausest (Owner, Author) commented on Nov 2, 2024

I'm really annoyed by this issue.
Yesterday I did a pretty clean incremental run (reboot, wait until background jobs stop consuming CPU), but results for sonnet and vanillajs were slower than in the full run.
So I decided to run the updated frameworks again, but with the current setup for full runs (reboot, wait until all those bg jobs stopped using CPU, end all helpers like 1password, bettertouch, betterdisplay, onedrive, stop indexing, disable wifi and bluetooth).
I got pretty much the same results as yesterday: all frameworks performed within 0.1, which is the usual noise (which is good), but still different from the last full run (the code for vanillajs and vanillajs-3 didn't change, chrome didn't change, so I can't find many explanations except OSX updates).
Screenshot 2024-11-02 at 12 49 54
In the last full run vanillajs scored 1.05 and vanillajs-3 1.04, sonnet 1.03, so it's only within a 0.3 range, which is in my opinion too big...

syduki (Contributor) commented on Nov 13, 2024

Actually, OSX + Apple CPU is not the best environment for performance tuning; it is very unlikely that fiddling with the charger or wifi will solve this issue. A Linux + AMD/Intel device would at least have a proper CPU frequency lock...

An increase of the sampling time may help. It seems that this issue only became evident recently, so it may be due to reducing/removing the throttling in the tests.
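
For comparison, a frequency lock on such a Linux box could look roughly like this (a sketch assuming the cpupower tool and an Intel CPU using the intel_pstate driver; nothing equivalent exists on macOS):

    # Pin every core to the "performance" governor so the clock no longer
    # scales with load during the benchmark.
    sudo cpupower frequency-set -g performance

    # Optionally disable turbo boost so results don't depend on thermal headroom.
    echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo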

krausest (Owner, Author) commented on Apr 6, 2025

One new idea. I'm always piping the logs into a file for a full run. For incremental runs I almost always let the logging print to the console. In #1855 the difference between both runs is whether the logging is redirected.
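
A hedged illustration of the difference; the command is only a placeholder for the real driver invocation, and the point is just whether the log output goes to the terminal or to a file:

    # Stand-in for the real benchmark invocation.
    run_benchmark() { npm run bench; }   # placeholder only

    # Incremental runs so far: the log is printed to the terminal.
    run_benchmark

    # Full runs: stdout and stderr are redirected into a file, so the terminal
    # never has to render the output while measurements are taken.
    run_benchmark > run.log 2>&1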
