-
-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Performance fixes in GameProfilingSystem and ChromeTracingProfileWriter #2050
base: master
Are you sure you want to change the base?
Conversation
…gProfileWriter - Fix busy-wait in `GameProfilingSystem` and make surrounding code clearer. - Add manual flushing to the JsonWriter, so the internal buffer doesn't grow indefinitely and put it on the fast path by skipping validation.
I don't think we have tests for the profiler, could you run those changes ? Also pinging @froce for review, let us know if you don't have time, I should have time to take a look in your stead. |
I was running the game (Starbreach) with those changes to validate the allocations are reduced (using allocation tool in VS). Didn't mark the checkbox for the editor as I didn't invoke it, but will give it a try later. |
internal static ProfilingEventChannel Create(UnboundedChannelOptions options) | ||
internal static ProfilingEventChannel Create(bool singleReader = false, bool singleWriter = false) | ||
{ | ||
var channel = Channel.CreateUnbounded<ProfilingEvent>(options); | ||
// bounded channel is supposed to have lower allocation overhead than unbounded | ||
var channel = Channel.CreateBounded<ProfilingEvent>(new BoundedChannelOptions(capacity: short.MaxValue) | ||
{ | ||
SingleReader = singleReader, | ||
SingleWriter = singleWriter, | ||
FullMode = BoundedChannelFullMode.DropWrite, // optimize caller to not block | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this. Allocation behavior for bounded and unbounded channels should be roughly the same. BoundedChannel grows a deque, UnboundedChannel grows a linked list of arrays. And they should both stabilize quickly after enabling the profiler(*). If you saw the channels constantly growing, that means the system couldn't keep up and we'd need to fix that. The tradeoff is that BoundedChannel is safe against that and we can notice dropped events, but UnboundedChannel is more efficient, because SingleConsumerUnboundedChannel exists and has no bounded equivalent. (We'd propbably want SingleProducerSingleConsumerBoundedChannel
... maybe in the future)
If we go with the BoundedChannel, we should expose the itemDropped delegate or have some other notification that events were dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may be right, I ran BenchmarkDotNet and Unbounded channel Single Produce Single Consumer does seem to perform significantly better than bounded. I must have misread the initialization allocations for ones occuring later on.
finally { subscriberChannelLock.Release(); } | ||
SingleReader = true, | ||
SingleWriter = true, | ||
FullMode = BoundedChannelFullMode.DropNewest, // dropping newer events so that we don't mess with events continuity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be DropWrite? I think the timeline will be messed up anyway.
@@ -376,17 +385,24 @@ public static void AppendTime([NotNull] StringBuilder builder, long accumulatedT | |||
|
|||
public static void AppendTime([NotNull] StringBuilder builder, TimeSpan accumulatedTimeSpan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be moved to GameProfilingSystem
or a utility class?
accumulatedTimeSpan.TotalMinutes.TryFormat(buffer, out _, "000.000"); | ||
builder.Append(buffer); | ||
builder.Append("m "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
increase buffer size and combine these to
accumulatedTimeSpan.TotalMinutes.TryFormat(buffer, out _, "000.000"); | |
builder.Append(buffer); | |
builder.Append("m "); | |
accumulatedTimeSpan.TotalMinutes.TryFormat(buffer, out _, "000.000m "); | |
builder.Append(buffer); |
or is it clearer your way?
// depends on the timer/update logic, but it does not actually need the profiling events. | ||
|
||
while (Enabled) | ||
while (await profilerChannel.WaitToReadAsync(cancellationToken)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to support cancellation here. If we're running this task and request cancellation, we've already called Profiler.Unsubscribe(), which will complete the channel, ending this loop. (And we must call Unsubscribe() to let the Profiler know that the channel should be removed). And if we don't support cancellation the call will also no longer allocate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the only scenario where ending this task would be delayed is if we have built up a lot of events in the channel and now must wait to consume them all. In normal conditions that shouldn't happen much and the task will end very soon after completing the channel.
I'm gonna try and create some BenchmarkDotNet samples to ensure my evidence wasn't incorrectly collected. |
Benchmark descriptionI wanted to check there's a difference between bounded and unbounded channel for 1P, 1C scenario and what is the impact of including cancellation token on allocations. Code[MemoryDiagnoser]
[SimpleJob]
public class ProfilerBenchmark
{
private static ProfilingKey ProfilingKey = new ProfilingKey("Test");
private readonly Channel<ProfilingEvent> boundedChannel = Channel.CreateBounded<ProfilingEvent>(new BoundedChannelOptions(short.MaxValue)
{
SingleReader = true,
SingleWriter = true,
FullMode = BoundedChannelFullMode.DropWrite,
});
private readonly Channel<ProfilingEvent> unboundedChannel = Channel.CreateUnbounded<ProfilingEvent>(new UnboundedChannelOptions
{
SingleReader = true,
SingleWriter = true,
});
private readonly CancellationTokenSource CTS = new();
[Params(1000, 10_000)]
public int N { get; set; }
/// <summary>
/// Reads events from the stream, uses in a calculation and returns a result to ensure the compiler doesn't optimize out relevant parts.
/// </summary>
private async Task<TimeSpan> Consume(ChannelReader<ProfilingEvent> eventReader)
{
TimeSpan sum = default;
int i = 0;
// await foreach (var e in eventReader.ReadAllAsync()) // without cancellation token
await foreach (var e in eventReader.ReadAllAsync(CTS.Token))
{
sum += e.ElapsedTime;
if (++i == N) break;
}
return sum;
}
private void Produce(ChannelWriter<ProfilingEvent> writer)
{
for (int i = 0; i < N; i++)
{
writer.TryWrite(new ProfilingEvent(1, ProfilingKey, ProfilingMessageType.Begin, default, default, 1, null, default));
}
}
[Benchmark]
public async Task Bounded()
{
Task consumer = Consume(boundedChannel.Reader);
Produce(boundedChannel.Writer);
await consumer;
}
[Benchmark]
public async Task Unbounded()
{
Task consumer = Consume(unboundedChannel.Reader);
Produce(unboundedChannel.Writer);
await consumer;
}
} Machine
Without cancellation token
With cancellation token
|
Is this ready for merge ? |
No, it needs further work. Might need to convert it to draft for now |
PR Details
Description
This is a continuation of #1953 to fix performance issues with the Profiler.
As part of this work I ended up producing #2049.
From #1953
GameProfilingSystem
and make surrounding code clearer.JsonWriter
, so the internal buffer doesn't grow indefinitely and put it on the fast path by skipping validation.Additionally:
GameProfilingSystem
to extract all string formatting related logic into a private classGameProfilingSystemStrings
GameProfilingSystem
depending on the filtering modeRelated Issue
Closes #1953
Motivation and Context
The Profiler should be very efficient. After all if by measuring performance we decrease it, it's not a great system.
These changes aim to improve mainly GC pressure by minimizing allocations.
There's still further field for improvement that needs some deeper investigation - I was aiming to achieve perf parity between Fps profiler mode and CpuEvents -> it's close but there's still a bit of a difference. Furthermore there's still difference between measuring FPS without Profiling turned on and with.
Types of changes
Checklist