Memory optimizations #90

Closed · wants to merge 1 commit

Conversation

fatkodima

As part of sidekiq/sidekiq#5768, I noticed that this gem allocates quite a lot of memory, and I was able to reduce it in this PR.

I ran a slightly tweaked version of Sidekiq's benchmark on Ruby 3.2.0:

diff --git a/bin/sidekiqload b/bin/sidekiqload
index 8159d424..cf52e78c 100755
--- a/bin/sidekiqload
+++ b/bin/sidekiqload
@@ -57,7 +57,7 @@ end

 class Loader
   def initialize
-    @iter = ENV["GC"] ? 10 : 500
+    @iter = ENV["GC"] ? 10 : 100
     @count = Integer(ENV["COUNT"] || 1_000)
     @latency = Integer(ENV["LATENCY"] || 1)
   end
@@ -130,7 +130,7 @@ class Loader

   def monitor
     @monitor = Thread.new do
-      GC.start
+      # GC.start
       loop do
         sleep 0.2
         qsize = Sidekiq.redis do |conn|
$ THREADS=1 LATENCY=0 PROFILE=1 bundle exec ruby-memory-profiler --scale-bytes --out tmp/memory_profiler.txt bin/sidekiqload

Before

Total allocated: 697.19 MB (6431073 objects)
Total retained:  2.57 MB (22652 objects)

allocated memory by gem
-----------------------------------
 231.40 MB  json-2.6.2
 194.04 MB  sidekiq/lib
 175.63 MB  redis-client-0.12.1
  50.74 MB  connection_pool-2.3.0
  22.96 MB  other
   9.69 MB  lib
   3.32 MB  concurrent-ruby-1.1.10
   2.64 MB  ruby-prof-1.4.5
   1.78 MB  activesupport-7.0.4.2
   1.67 MB  activerecord-7.0.4.2
   1.18 MB  rake-13.0.6
 858.80 kB  yard-0.9.28
 398.82 kB  sqlite3-1.6.0-x86_64-darwin
 376.38 kB  activemodel-7.0.4.2
 322.39 kB  i18n-1.12.0
 110.02 kB  toxiproxy-2.0.2
  57.38 kB  after_commit_everywhere-1.3.0
  10.78 kB  bundler-2.3.22
   4.02 kB  rubygems

allocated memory by file
-----------------------------------
 231.28 MB  /Users/fatkodima/.asdf/installs/ruby/3.2.0/lib/ruby/gems/3.2.0/gems/json-2.6.2/lib/json/common.rb
 106.24 MB  /Users/fatkodima/.asdf/installs/ruby/3.2.0/lib/ruby/gems/3.2.0/gems/redis-client-0.12.1/lib/redis_client/ruby_connection/resp3.rb
  60.06 MB  /Users/fatkodima/Desktop/oss/sidekiq/lib/sidekiq/processor.rb
  48.08 MB  /Users/fatkodima/.asdf/installs/ruby/3.2.0/lib/ruby/gems/3.2.0/gems/redis-client-0.12.1/lib/redis_client/ruby_connection/buffered_io.rb
  44.91 MB  /Users/fatkodima/Desktop/oss/sidekiq/lib/sidekiq/client.rb
  42.63 MB  /Users/fatkodima/.asdf/installs/ruby/3.2.0/lib/ruby/gems/3.2.0/gems/connection_pool-2.3.0/lib/connection_pool.rb
...

After

Total allocated: 600.07 MB (5626286 objects)
Total retained:  2.57 MB (22682 objects)

allocated memory by gem
-----------------------------------
 231.46 MB  json-2.6.2
 194.04 MB  sidekiq/lib
  78.64 MB  redis-client/lib
  50.74 MB  connection_pool-2.3.0
  22.96 MB  other
   9.69 MB  lib
   3.32 MB  concurrent-ruby-1.1.10
   2.44 MB  ruby-prof-1.4.5
   1.78 MB  activesupport-7.0.4.2
   1.67 MB  activerecord-7.0.4.2
   1.18 MB  rake-13.0.6
 858.80 kB  yard-0.9.28
 398.82 kB  sqlite3-1.6.0-x86_64-darwin
 376.28 kB  activemodel-7.0.4.2
 322.80 kB  i18n-1.12.0
 110.02 kB  toxiproxy-2.0.2
  57.47 kB  after_commit_everywhere-1.3.0
  10.76 kB  bundler-2.3.22
   4.02 kB  rubygems

allocated memory by file
-----------------------------------
 231.34 MB  /Users/fatkodima/.asdf/installs/ruby/3.2.0/lib/ruby/gems/3.2.0/gems/json-2.6.2/lib/json/common.rb
  60.06 MB  /Users/fatkodima/Desktop/oss/sidekiq/lib/sidekiq/processor.rb
  48.08 MB  /Users/fatkodima/Desktop/oss/redis-client/lib/redis_client/ruby_connection/buffered_io.rb
  44.91 MB  /Users/fatkodima/Desktop/oss/sidekiq/lib/sidekiq/client.rb
  42.63 MB  /Users/fatkodima/.asdf/installs/ruby/3.2.0/lib/ruby/gems/3.2.0/gems/connection_pool-2.3.0/lib/connection_pool.rb
  39.23 MB  /Users/fatkodima/Desktop/oss/sidekiq/lib/sidekiq/fetch.rb
  24.04 MB  /Users/fatkodima/Desktop/oss/sidekiq/lib/sidekiq/middleware/chain.rb
  16.82 MB  /Users/fatkodima/Desktop/oss/sidekiq/lib/sidekiq/job_logger.rb
  12.06 MB  /Users/fatkodima/Desktop/oss/redis-client/lib/redis_client/decorator.rb
   9.27 MB  /Users/fatkodima/Desktop/oss/redis-client/lib/redis_client/ruby_connection/resp3.rb
   8.88 MB  /Users/fatkodima/Desktop/oss/redis-client/lib/redis_client/command_builder.rb
....

@@ -54,6 +54,9 @@ def new_buffer
String.new(encoding: Encoding::BINARY, capacity: 127)
end

SIZE_TO_STRING = Hash.new { |h, k| h[k] = k.to_s }
Author

I am wondering if this is actually thread-safe? If not, we can just prepopulate an array of, for example, 100 sizes and use it instead of this hash.

Collaborator

Well, thread-safe can mean a lot of things, but on MRI it's acceptably safe (it may generate some extra strings in case of a race, but they'll be GC'd). On TruffleRuby or JRuby, however, I believe you may get problems.

But either way I don't think this is a good idea.

  • That hash can grow unbounded and will never be shrunk or reclaimed. That is basically a memory leak, and that hash will have to be marked regularly, slowing down GC pauses.
  • These small strings are entirely embedded, have no references, and are never held onto. So from a GC perspective they take very little time.

Generally speaking, allocations aren't necessarily a problem; they might be if they cause GC to trigger more often and GC is slow for other reasons. But here I really don't think it's worth it.
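
A tiny illustration of the unbounded-growth concern (the loop over 100,000 distinct sizes is made up for the example):

# The memoizing hash from the diff keeps one entry (key plus string) per
# distinct size ever seen, and nothing ever removes them.
SIZE_TO_STRING = Hash.new { |h, k| h[k] = k.to_s }

100_000.times { |i| SIZE_TO_STRING[i] }
puts SIZE_TO_STRING.size # => 100000 entries, all retained (and marked by GC) until exit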

Author

We can use something like

SIZE_TO_STRING = (0..100).to_a.map(&:to_s)
def size_to_string(size)
  SIZE_TO_STRING[size] || size.to_s
end

Assuming sizes should not be large numbers most of the time.

This saves 20 MB per 100k Sidekiq jobs, but yes, your points about these micro-optimizations are still valid.

Member

The memory space metric has to be interpreted carefully. All these strings will be under the embedded string limit, so they'll all use just one object slot without any associated malloc, which means very little impact on GC performance.

Allocations (both memsize and object count) can be an interesting proxy for code performance, but they have to be carefully interpreted. Fewer allocations don't always mean faster. Allocating an embedded object is just a pointer bump; it's incredibly cheap. It would be costly if the string were larger and had to call malloc.
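
A quick way to see the embedded vs. malloc'd distinction with ObjectSpace.memsize_of (exact numbers depend on the Ruby version and build, so treat them as illustrative):

require "objspace"

short = 42.to_s     # tiny payload, fits inside the object slot itself
long  = "x" * 1_000 # well past any embedding limit, needs a malloc'd buffer

puts ObjectSpace.memsize_of(short) # roughly just the slot size, no extra malloc
puts ObjectSpace.memsize_of(long)  # slot plus ~1000 bytes of heap-allocated data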

Member

I wouldn't be surprised if the size_to_string method call plus the hash lookup ended up being slower than the embedded string allocation.

Collaborator

require 'benchmark/ips'

SIZE_TO_STRING = (0..100).to_a.map(&:to_s)
def size_to_string(size)
  SIZE_TO_STRING[size] || size.to_s
end

puts "== hit =="
Benchmark.ips do |x|
  x.report("Integer#to_s") { 42.to_s }
  x.report("size_to_string") { size_to_string(42) }
  x.compare!(order: :baseline)
end

puts "== miss =="
Benchmark.ips do |x|
  x.report("Integer#to_s") { 420.to_s }
  x.report("size_to_string") { size_to_string(420) }
  x.compare!(order: :baseline)
end

puts "minor_gc: #{GC.stat(:minor_gc_count)}"

Results:

$ RUBY_GC_HEAP_INIT_SLOTS=1000000 ruby -v /tmp/int_to_str.rb 
ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]
RUBY_GC_HEAP_INIT_SLOTS=1000000 (default value: 10000)
== hit ==
Warming up --------------------------------------
        Integer#to_s     1.314M i/100ms
      size_to_string     1.431M i/100ms
Calculating -------------------------------------
        Integer#to_s     13.236M (± 2.4%) i/s -     67.021M in   5.066487s
      size_to_string     14.139M (± 2.0%) i/s -     71.528M in   5.061079s

Comparison:
        Integer#to_s: 13236033.7 i/s
      size_to_string: 14139247.5 i/s - 1.07x  (± 0.00) faster

== miss ==
Warming up --------------------------------------
        Integer#to_s     1.311M i/100ms
      size_to_string   896.525k i/100ms
Calculating -------------------------------------
        Integer#to_s     13.077M (± 2.7%) i/s -     65.554M in   5.016665s
      size_to_string      8.820M (± 4.1%) i/s -     44.826M in   5.091855s

Comparison:
        Integer#to_s: 13076807.3 i/s
      size_to_string:  8820024.3 i/s - 1.48x  (± 0.00) slower

minor_gc: 242

That's a very small gain on hit, but a big loss on miss. I really don't think it's worth it.

Author

Reverted this change.

Made a simple micro benchmark:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "benchmark-ips"
end

arr = (1..100).to_a + (101..120).to_a

SIZE_TO_STRING = (0..100).to_a.map(&:to_s)
def size_to_string(size)
  SIZE_TO_STRING[size] || size.to_s
end

Benchmark.ips do |x|
  x.report("to_s") do
    arr.each(&:to_s)
  end

  x.report("cached") do
    arr.each do |e|
      size_to_string(e)
    end
  end

  x.compare!
end
Warming up --------------------------------------
                to_s     5.879k i/100ms
              cached     7.666k i/100ms
Calculating -------------------------------------
                to_s     58.378k (± 1.5%) i/s -    293.950k in   5.036388s
              cached     76.132k (± 1.5%) i/s -    383.300k in   5.035706s

Comparison:
              cached:    76132.4 i/s
                to_s:    58378.1 i/s - 1.30x  (± 0.00) slower

Comment on lines +103 to +104
@buffer.clear
RESP3.dump(command, @buffer)
Collaborator

So String#clear here frees the malloc'd region: https://bugs.ruby-lang.org/issues/17790.

So this saves allocating a string slot, but it will malloc anyway, which will eventually trigger a GC.

If we want to be smart, we want to re-use that malloc'd region, but we have to be careful not to end up with a giant buffer, so we'd need to clear it if it grows past a certain size. Unfortunately, the only way to get the size of the malloc is via ObjectSpace.memsize_of(str). I'd need to check its performance first.

We also have to be super careful not to leak data, as that happened in the past with a similar optimization: ruby/net-protocol#19
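
A minimal sketch of the capping idea (the 16 KB threshold and the recycle_buffer helper are assumptions, not redis-client API; and since String#clear itself releases the region, this only bounds a buffer that is reused without clearing):

require "objspace"

MAX_BUFFER_BYTES = 16 * 1024 # arbitrary cap, would need tuning

def recycle_buffer(buffer)
  # memsize_of reports the slot plus any malloc'd capacity; once the buffer
  # has ballooned past the cap, drop it and start over from a small one.
  if ObjectSpace.memsize_of(buffer) > MAX_BUFFER_BYTES
    String.new(encoding: Encoding::BINARY, capacity: 127)
  else
    buffer
  end
end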

Author

If we want to be smart, we want to re-use that malloc'd region

Can we currently do this from Ruby? I see the proposed patch to Ruby is not merged yet.

Member

Can we currently do this from Ruby?

Well, if you pass the string to #read without clearing it, it will re-use that space.
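
A small illustration of that read-side reuse, with StringIO standing in for the socket and the sizes made up:

require "stringio"

io  = StringIO.new("a" * 10_000)
buf = String.new(encoding: Encoding::BINARY, capacity: 4096)

total = 0
# Passing an explicit output buffer lets each read refill the same string
# (and its malloc'd region) instead of allocating a fresh one per call.
while io.read(4096, buf)
  total += buf.bytesize
end
puts total # => 10000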

Author

But we are using it just for write in this case (write and write_multi), so this is not yet possible?

Collaborator

Ah yeah, sorry, I missed that we were passing it to RESP3.dump. Yeah, I see no solution here.

I think if we call clear we start over from an empty string, so we lose the pre-allocation benefits.

@@ -77,6 +77,8 @@ def initialize(config, connect_timeout:, read_timeout:, write_timeout:)
read_timeout: read_timeout,
write_timeout: write_timeout,
)

@buffer = String.new(encoding: Encoding::BINARY, capacity: 127)
Collaborator

That 127 is quite arbitrary; it would be worth putting some thought into it, or making it configurable.

Author

This was taken from the buffer definition it already uses:

def new_buffer
String.new(encoding: Encoding::BINARY, capacity: 127)
end

@casperisfine
Collaborator

@fatkodima to summarize, I don't think SIZE_TO_STRING is a good idea, so please revert it.

For the buffer re-use, it might make sense but I'll have to carefully review it.

fatkodima force-pushed the memory-optimizations branch from 12d4437 to 0ff376d on February 8, 2023 11:17
@fatkodima
Author

Feel free to close if it is not worth it. And thank you for taking the time to give a 💪 review, as always!

@byroot
Member

byroot commented Feb 8, 2023

Yeah, I don't think either of these opts is clear-cut enough, so I'll close.

Also, people for whom this matters can use hiredis-client, which will save more memory than anything we can do on the Ruby side.

Thanks for trying to improve redis-client!

byroot closed this on Feb 8, 2023
fatkodima deleted the memory-optimizations branch on February 8, 2023 12:44