Background
Ruby's GC running during Rails requests can have negative impacts on currently running requests, causing applications to have high tail-latency.
A technique to mitigate this high tail-latency is Out-of-band GC (OOBGC). With OOBGC the application runs with GC disabled, and GC is explicitly started after each request, or when no requests are in progress.
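As a rough illustration of the classic OOBGC pattern, the sketch below disables automatic GC while a batch of requests is handled and calls `GC.start` after each one. `handle_request` and `serve_with_oobgc` are hypothetical names used only for this example; a real server would hook into its own request lifecycle instead.

```ruby
# Hypothetical stand-in for the application's request handler.
def handle_request(payload)
  Array.new(1_000) { |i| "#{payload}-#{i}" }.size
end

# Minimal sketch of classic out-of-band GC (not a production implementation).
def serve_with_oobgc(payloads)
  was_disabled = GC.disable   # returns true if GC was already disabled
  payloads.map do |payload|
    result = handle_request(payload)
    GC.start                  # GC.start runs even while GC is disabled
    result
  end
ensure
  GC.enable unless was_disabled   # restore the previous state
end
```

Running a full GC after every request is what makes this simple version expensive for throughput.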
This can reduce the tail latency, but also introduces problems of its own. Long GC pauses after each request reduce throughput. This is more pronounced on threaded servers like Puma because all the threads have to finish processing user requests and be "paused" before OOBGC can be triggered.
This throughput decrease happens for a couple of reasons:
This ticket attempts to address these issues by:
- Adding `GC.disable_major` and its antonym `GC.enable_major` to disable and enable only major GC
- Adding `GC.needs_major?` as a basic heuristic allowing users to tell when Ruby should run a major GC.
These ideas were originally proposed by @ko1 and @byroot in this rails issue
Disabling major GCs would still allow minor GCs to run during the request. This avoids the ballooning memory usage caused by not running GC at all. It also reduces the time a major takes when we do run it: the nursery objects have already been cleaned up during the request, so there is less work for a major GC to do.
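A minimal sketch of the request lifecycle this proposal aims for is shown below. `GC.disable_major` and `GC.needs_major?` are the methods proposed in this PR and may not exist on the Ruby you are running, so every call is guarded with `respond_to?`. The fallback on `GC.latest_gc_info[:need_major_by]` and the helper names `major_gc_pending?` and `out_of_band_gc` are assumptions made for this example.

```ruby
# At boot: stop automatic major GCs, if the proposed method is available.
GC.disable_major if GC.respond_to?(:disable_major)

# Hypothetical helper: true when the GC reports that a major GC is pending.
def major_gc_pending?
  if GC.respond_to?(:needs_major?)
    GC.needs_major?                              # method proposed in this PR
  elsif GC.respond_to?(:latest_gc_info)
    !GC.latest_gc_info[:need_major_by].nil?      # existing API, same flag
  else
    false
  end
end

# Hypothetical hook, called by the server when no requests are in flight.
# Returns true if a GC was run, false if it was skipped.
def out_of_band_gc
  return false unless major_gc_pending?

  GC.start   # a full (major) GC by default
  true
end
```

Compared with running `GC.start` after every request, this only pays for a major GC when the collector has asked for one.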
This can be used in combination with `GC.needs_major?` to selectively run an OOBGC only when necessary.
Implementation
This PR adds 3 new methods to the `GC` module.
GC.disable_major
This prevents major GCs from running automatically. It does not restrict minors. When `objspace->rgengc.need_major_gc` is set and a GC is run, instead of running a major, new heap pages will be allocated and a minor run instead. `objspace->rgengc.need_major_gc` will remain set until a major is manually run. If a major is not manually run then the process will eventually run out of memory.
When major GCs are disabled, object promotion is disabled. That is, no objects will increment their ages during a minor GC. This is to attempt to minimise heap growth during the period between major GCs, by restricting the number of old-gen objects that will remain unconsidered by the GC until the next major.
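One way to observe this behaviour is to compare the minor and major GC counters from `GC.stat` before and after a block of work. The sketch below uses only existing APIs. `gc_counts_during` is a hypothetical helper name and the workload is arbitrary. On a Ruby with this patch and `GC.disable_major` in effect, the expectation would be that only the minor count increases, but that is not verified here.

```ruby
# Hypothetical helper: reports how many minor and major GCs ran during a block.
def gc_counts_during
  before = GC.stat
  yield
  after = GC.stat
  {
    minor: after[:minor_gc_count] - before[:minor_gc_count],
    major: after[:major_gc_count] - before[:major_gc_count],
  }
end

# Arbitrary allocation-heavy workload, for illustration only.
counts = gc_counts_during do
  200_000.times.map { |i| "object #{i}" }
end

puts "minor GCs: #{counts[:minor]}, major GCs: #{counts[:major]}"
```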
When `GC.start` is run, then major GCs will be enabled, a GC triggered with the options passed to `GC.start`, and then `disable_major` will be set to the state it was in before `GC.start` was called.
GC.enable_major
This simply unsets the bit preventing major GCs, reverting the GC to its default generational behaviour.
GC.needs_major?
This exposes the value of `objspace->rgengc.need_major_gc` to the user level API. This is already exposed in `GC.latest_gc_info[:need_major_by]` but I felt that a simpler interface would make this easier to use and result in more readable code.
Because object aging is disabled when majors are disabled, it is recommended to use this in conjunction with `Process.warmup`, which will prepare the heap by running a major GC, compacting the heap, and promoting every remaining object to old-gen. This ensures that minor GCs are running over the smallest possible set of young objects when major GCs are disabled with `GC.disable_major`.
Benchmarks
We ran some tests in production on Shopify's core monolith over a weekend and found that:
- Mean time spent in GC, as well as p99.9 and p99.99 GC times are all improved.
- p99 GC time is slightly higher. We're running far fewer OOBGC major GCs now that we have `GC.needs_major?` than we were before, and we believe that this is contributing to a slightly increased number of minor GCs, raising the p99 slightly.
- App response times are all improved.
We see a 9% reduction in average and p99 response times when compared against standard GC (4% p99.9 and p99.99).
This drops slightly to an 8% reduction in average and p99 response times when compared against standard OOBGC (3.59 p99.9 and 4% p99.99)