Memory optimizations #90
```diff
@@ -54,6 +54,9 @@ def new_buffer
     String.new(encoding: Encoding::BINARY, capacity: 127)
   end
+
+  SIZE_TO_STRING = Hash.new { |h, k| h[k] = k.to_s }
```
I am wondering whether this is actually thread-safe? If not, we could just prepopulate an array of, say, 100 sizes and use it instead of this hash.
Well, thread-safe can mean a lot of things, but on MRI it's acceptably safe (it may generate some extra strings in case of a race, but they'll be GC'd). On TruffleRuby or JRuby, however, I believe you may get problems.
But either way, I don't think this is a good idea:
- That hash can grow unbounded and will never be shrunk or reclaimed. That is basically a memory leak, and the hash will have to be marked regularly, slowing down GC pauses.
- These small strings are entirely embedded, hold no references, and are never held onto. So from a GC perspective they take very little time.
Generally speaking, allocations aren't necessarily a problem; they might be if they cause GC to trigger more often and GC is slow for other reasons. But here I really don't think it's worth it.
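As a hedged illustration of that point (not from the PR itself): one cheap way to check whether an allocation-heavy path actually pressures the GC is to compare GC counters around it.

```ruby
# Rough sketch: compare minor GC counts before and after a burst of
# small allocations. Embedded strings are slot-only objects, so even
# many of them trigger relatively few GC cycles.
before = GC.stat(:minor_gc_count)
100_000.times { 42.to_s }
after = GC.stat(:minor_gc_count)
puts "minor GCs triggered: #{after - before}"
```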
We can use something like:

```ruby
SIZE_TO_STRING = (0..100).map(&:to_s)

def size_to_string(size)
  SIZE_TO_STRING[size] || size.to_s
end
```

assuming sizes are small numbers most of the time. This saves 20 MB per 100k Sidekiq jobs, but yes, your points about these micro-optimizations are still valid.
The memory-size metric has to be interpreted carefully. All these strings will be under the embedded string limit, so they'll each use just one object slot without any associated malloc, which means very little impact on GC performance.
Allocations (both memsize and object count) can be an interesting proxy for code performance, but they have to be carefully interpreted. Fewer allocations doesn't always mean faster. Allocating an embedded object is just a pointer bump; it's incredibly cheap. It would be costly if the string were larger and had to call malloc.
I wouldn't be surprised if the `size_to_string` method call, plus the lookup, ended up being slower than the embedded string allocation.
```ruby
require 'benchmark/ips'

SIZE_TO_STRING = (0..100).map(&:to_s)

def size_to_string(size)
  SIZE_TO_STRING[size] || size.to_s
end

puts "== hit =="
Benchmark.ips do |x|
  x.report("Integer#to_s") { 42.to_s }
  x.report("size_to_string") { size_to_string(42) }
  x.compare!(order: :baseline)
end

puts "== miss =="
Benchmark.ips do |x|
  x.report("Integer#to_s") { 420.to_s }
  x.report("size_to_string") { size_to_string(420) }
  x.compare!(order: :baseline)
end

puts "minor_gc: #{GC.stat(:minor_gc_count)}"
```
Results:

```
$ RUBY_GC_HEAP_INIT_SLOTS=1000000 ruby -v /tmp/int_to_str.rb
ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]
RUBY_GC_HEAP_INIT_SLOTS=1000000 (default value: 10000)
== hit ==
Warming up --------------------------------------
        Integer#to_s     1.314M i/100ms
      size_to_string     1.431M i/100ms
Calculating -------------------------------------
        Integer#to_s     13.236M (± 2.4%) i/s -  67.021M in 5.066487s
      size_to_string     14.139M (± 2.0%) i/s -  71.528M in 5.061079s

Comparison:
        Integer#to_s: 13236033.7 i/s
      size_to_string: 14139247.5 i/s - 1.07x  (± 0.00) faster

== miss ==
Warming up --------------------------------------
        Integer#to_s     1.311M i/100ms
      size_to_string   896.525k i/100ms
Calculating -------------------------------------
        Integer#to_s     13.077M (± 2.7%) i/s -  65.554M in 5.016665s
      size_to_string      8.820M (± 4.1%) i/s -  44.826M in 5.091855s

Comparison:
        Integer#to_s: 13076807.3 i/s
      size_to_string:  8820024.3 i/s - 1.48x  (± 0.00) slower

minor_gc: 242
```
That's a very small gain on hit, but a big loss on miss. I really don't think it's worth it.
Reverted this change.
I made a simple micro-benchmark:
```ruby
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  git_source(:github) { |repo| "https://github.com/#{repo}.git" }
  gem "benchmark-ips"
end

arr = (1..100).to_a + (101..120).to_a
SIZE_TO_STRING = (0..100).map(&:to_s)

def size_to_string(size)
  SIZE_TO_STRING[size] || size.to_s
end

Benchmark.ips do |x|
  x.report("to_s") do
    arr.each(&:to_s)
  end

  x.report("cached") do
    arr.each do |e|
      size_to_string(e)
    end
  end

  x.compare!
end
```
```
Warming up --------------------------------------
                to_s     5.879k i/100ms
              cached     7.666k i/100ms
Calculating -------------------------------------
                to_s     58.378k (± 1.5%) i/s - 293.950k in 5.036388s
              cached     76.132k (± 1.5%) i/s - 383.300k in 5.035706s

Comparison:
              cached:  76132.4 i/s
                to_s:  58378.1 i/s - 1.30x  (± 0.00) slower
```
```ruby
@buffer.clear
RESP3.dump(command, @buffer)
```
So `String#clear` here frees the malloc'd region: https://bugs.ruby-lang.org/issues/17790. So this saves allocating a string slot, but it will malloc anyway, which will eventually trigger a GC.
If we want to be smart, we want to re-use that malloc'd region, but we have to be careful not to end up with a giant buffer, so we'd need to clear it if it grows past a certain size. Unfortunately, the only way to get the size of the malloc'd region is via `ObjectSpace.memsize_of(str)`. I'd need to check its performance first.
We also have to be super careful not to leak data, as that happened in the past with a similar optimization: ruby/net-protocol#19
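The `String#clear` behavior described above can be observed directly (a sketch assuming CRuby, where `ObjectSpace.memsize_of` includes the malloc'd region in a string's reported size):

```ruby
require "objspace"

buf = String.new(encoding: Encoding::BINARY, capacity: 127)
buf << ("x" * 4096)                 # force a heap (malloc'd) region
before = ObjectSpace.memsize_of(buf)

buf.clear                           # releases the malloc'd region
after = ObjectSpace.memsize_of(buf)

puts "before clear: #{before}, after clear: #{after}"
```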
> If we want to be smart, we want to re-use that malloc'd region

Can we currently do this from Ruby? I see the proposed patch to Ruby is not merged yet.
> Can we currently do this from Ruby?

Well, if you pass the string to `read` without clearing it, it will re-use that space.
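For example (a sketch using `StringIO` to stand in for the socket): the `read` family of IO methods accepts an explicit output buffer and fills the existing string's storage instead of allocating a fresh one.

```ruby
require "stringio"

io = StringIO.new("hello world")
buf = String.new(encoding: Encoding::BINARY, capacity: 127)

# Passing buf as the second argument makes #read fill the existing
# string (and return the very same object) rather than allocate a new one.
result = io.read(5, buf)
p result.equal?(buf)  # => true
p buf                 # => "hello"
```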
But we are using it just for writes in this case (`write` and `write_multi`), so this is not yet possible?
Ah yeah, sorry, I missed that we were passing it to `RESP3.dump`. Yeah, I see no solution here.
I think if we call `clear` we start over from an empty string, so we lose the pre-allocation benefits.
```diff
@@ -77,6 +77,8 @@ def initialize(config, connect_timeout:, read_timeout:, write_timeout:)
       read_timeout: read_timeout,
       write_timeout: write_timeout,
     )
+
+    @buffer = String.new(encoding: Encoding::BINARY, capacity: 127)
```
That `127` is quite arbitrary; it would be worth putting some thought into it, or making it configurable.
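A hedged sketch of what making it configurable could look like (`buffer_capacity` and `BufferedConnection` are hypothetical names, not part of redis-client's API):

```ruby
# Hypothetical: let callers tune the initial write-buffer capacity
# instead of hard-coding 127.
class BufferedConnection
  DEFAULT_BUFFER_CAPACITY = 127

  def initialize(buffer_capacity: DEFAULT_BUFFER_CAPACITY)
    @buffer = String.new(encoding: Encoding::BINARY, capacity: buffer_capacity)
  end

  attr_reader :buffer
end

conn = BufferedConnection.new(buffer_capacity: 1024)
p conn.buffer.encoding  # => #<Encoding:ASCII-8BIT>
```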
This was taken from the buffer definition it already uses, redis-client/lib/redis_client/ruby_connection/resp3.rb lines 53 to 55 (at aa5a308):

```ruby
def new_buffer
  String.new(encoding: Encoding::BINARY, capacity: 127)
end
```
@fatkodima to summarize: I don't think the `SIZE_TO_STRING` cache is worth it. For the buffer re-use, it might make sense, but I'll have to carefully review it.
Force-pushed from 12d4437 to 0ff376d.
Feel free to close if it is not worth it. And thank you for your time giving a 💪 review, as always!
Yeah, I don't think either of these optimizations is a clear enough win, so I'll close. Also, people for whom this matters can use …. Thanks for trying to improve redis-client!
As part of sidekiq/sidekiq#5768, I noticed that this gem allocates quite a lot of memory, and I was able to reduce it within this PR.
I ran a slightly tweaked sidekiq benchmark on Ruby 3.2.0:
Before
After