Remotery gets confused by fine-grain multi-threaded apps. #126

Open
stolk opened this issue Oct 27, 2017 · 14 comments

Comments

@stolk commented Oct 27, 2017

I have pitted Remotery and Minitrace against each other.
On single-threaded apps, the results are the same.

For a multi-threaded run, however, I see that Remotery's measurements deviated from Minitrace's.
It looks like Remotery can miss events.

Race condition between worker threads trying to record a Remotery event?

Screenshots of the Remotery/Minitrace runs:
http://thelittleengineerthatcould.blogspot.ca/2017/10/pitting-profilers-against-each-other.html

@dwilliamson (Collaborator)

Have you tried increasing the size of the message queue?

@dwilliamson (Collaborator)

Actually, it's worth mentioning that Minitrace uses a mutex for pushing events, which in cases of high contention won't give you accurate traces. There's every chance that threads queued for access to the mutex will give up their time-slice to the OS.

@stolk (Author) commented Oct 28, 2017

Thanks for the suggestion.

I tried doubling g_Settings.messageQueueSizeInBytes which did not help.

Then I tried doubling g_Settings.maxNbMessagesPerUpdate to 20, which helped a bit.
It still misses measurements, but it seems to miss fewer now?

Anything else I should try?
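(For reference, the settings above can be tuned through Remotery's public rmt_Settings() call before the global instance is created; a minimal sketch, with values that are purely illustrative rather than recommendations:)

```c
/* Sketch: enlarge the message queue and speed up the server's drain rate.
   Field names are from Remotery.h; the numbers are illustrative only. */
#include "Remotery.h"

int main(void)
{
    Remotery* rmt;
    rmtSettings* settings = rmt_Settings();

    settings->messageQueueSizeInBytes     = 1024 * 1024;  /* up from the 64KB default */
    settings->maxNbMessagesPerUpdate      = 100;          /* drain more messages per server update */
    settings->msSleepBetweenServerUpdates = 5;            /* wake the server thread more often */

    if (rmt_CreateGlobalInstance(&rmt) != RMT_ERROR_NONE)
        return 1;

    /* ... profiled, multi-threaded work ... */

    rmt_DestroyGlobalInstance(rmt);
    return 0;
}
```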

@stolk (Author) commented Oct 28, 2017

Quadrupling the queue size, quadrupling the messages-per-update, and halving the sleep-between-updates makes the situation better again, but data is still missing.

I can see this because I know there should be 100 render_scanlines calls for each render_image, and the profile shows fewer than that, with gaps.

How does Remotery handle the case of multiple threads trying to record samples at the same time?

@dwilliamson (Collaborator)

It should handle contention fine. I think what's happening is a combination of two things because you're sending data so fast:

  • The message queue to the main thread is filling up fast and discarding samples.
  • The code to send stuff over the network may be failing.

Looking at the timings, you have a good stress test for Remotery; I only wish I could get access to your code so that I could reproduce the scenario myself.

@dwilliamson (Collaborator)

Bear in mind that the only way to never lose data is to block the thread issuing the sample, and I don't want Remotery to do that, ever. So it'll be a case of finding the weak point and adding more memory or processor time to get those events out fast.

@stolk (Author) commented Oct 28, 2017

I think it's possible to avoid losing data without blocking threads.

If every thread writes to its own queue, instead of a shared queue, there would be no race conditions, and you would not need a mutex either.

A data aggregator could then be the only reader of the queues: a circular buffer per thread, where the thread advances the tail and the server advances the head, so there is no race on writing either index.

I think I'll attempt a proof of concept implementation of this.
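A minimal sketch of that per-thread single-producer/single-consumer ring buffer, using C11 atomics (illustrative code, not Remotery's implementation): only the owning worker thread writes the write position and only the aggregator writes the read position, so neither side needs a mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 65536  /* bytes; must be a power of two */

typedef struct
{
    unsigned char  data[RING_SIZE];
    _Atomic size_t write_pos;  /* advanced only by the owning worker thread */
    _Atomic size_t read_pos;   /* advanced only by the aggregator thread    */
} SpscRing;

/* Producer side: copy an event in, or return false (drop it) if the ring is full. */
static bool ring_push(SpscRing* ring, const void* event, size_t size)
{
    size_t w = atomic_load_explicit(&ring->write_pos, memory_order_relaxed);
    size_t r = atomic_load_explicit(&ring->read_pos, memory_order_acquire);

    if (RING_SIZE - (w - r) < size)
        return false;  /* full: drop rather than block the worker */

    for (size_t i = 0; i < size; i++)
        ring->data[(w + i) & (RING_SIZE - 1)] = ((const unsigned char*)event)[i];

    atomic_store_explicit(&ring->write_pos, w + size, memory_order_release);
    return true;
}

/* Consumer side: copy out up to `max` bytes and advance the read position. */
static size_t ring_pop(SpscRing* ring, void* out, size_t max)
{
    size_t r = atomic_load_explicit(&ring->read_pos, memory_order_relaxed);
    size_t w = atomic_load_explicit(&ring->write_pos, memory_order_acquire);
    size_t n = (w - r) < max ? (w - r) : max;

    for (size_t i = 0; i < n; i++)
        ((unsigned char*)out)[i] = ring->data[(r + i) & (RING_SIZE - 1)];

    atomic_store_explicit(&ring->read_pos, r + n, memory_order_release);
    return n;
}
```

The trade-off remains, though: once the buffer is full the producer must still either drop the event or block, which is the point made in the next reply.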

@dwilliamson (Collaborator)

It still reduces to the same problem: if you are writing to the queue faster than data is being pulled from it, you will have to block. The only alternative is to allocate more memory for the queue to decrease the chance of that happening.

We still don't know the source of the problem. It could be that there's a failure point in sending the data across the network that isn't being reported.

@dwilliamson (Collaborator)

So what I'm saying is that, to be sure, a breakpoint here would show samples being discarded: https://github.com/Celtoys/Remotery/blob/master/lib/Remotery.c#L4039

@dwilliamson (Collaborator)

I think I might add a global Error Object that increments atomic counters based on how many times each error occurred. Using the message queue to report errors in the message queue doesn't sound dependable :)
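Something along these lines, sketched with C11 atomics (the slot names and helpers are hypothetical, not Remotery APIs):

```c
#include <stdatomic.h>

typedef enum
{
    ErrorSlot_MessageQueueFull,
    ErrorSlot_NetworkSendFailed,
    ErrorSlot_Count
} ErrorSlot;

static atomic_uint g_ErrorCounts[ErrorSlot_Count];

/* Called from any thread when an error path is hit; never allocates or blocks. */
static void error_count(ErrorSlot slot)
{
    atomic_fetch_add_explicit(&g_ErrorCounts[slot], 1, memory_order_relaxed);
}

/* Called periodically from the server thread: returns and clears the count. */
static unsigned error_drain(ErrorSlot slot)
{
    return atomic_exchange_explicit(&g_ErrorCounts[slot], 0, memory_order_relaxed);
}
```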

@dwilliamson (Collaborator)

Thinking about it more, this is a real problem for reporting events at a frequency greater than the server code is capable of sending them over the network. It doesn't matter how much memory you allocate; eventually the writer will catch up to the reader.
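To put rough, purely illustrative numbers on that: a workload producing 100,000 samples per second at 32 bytes each generates about 3.2 MB/s; if the server can only push roughly 1 MB/s to the viewer, the backlog grows at about 2.2 MB/s, so even a 0.5 MB queue overflows after roughly 0.5 / 2.2 ≈ 0.23 seconds, regardless of how the queue is structured.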

@stolk (Author) commented Oct 28, 2017

Thanks.
It indeed shows that rmtMessageQueue_AllocMessage() fails, shortly after start.

Sure, you can always overflow the queue.
But I think it's possible to handle this better.
Once this happens, I think you should clear the queue altogether, and drop all events currently in the queue, instead of just dropping the new ones.

I would rather have large gaps in the trace where the server choked on traffic, than have events randomly disappear without warning.

For instance, in game dev you simply scroll to a part of the trace where there is a full frame's worth of data; it's rare to be interested in a timespan longer than a single display frame.

So maybe clear the queue, and then write a single event that signals the overflow of the queue, so you are alerted to it in the visualizer?
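A rough sketch of that flush-and-mark idea (hypothetical pseudocode; none of these functions are real Remotery APIs):

```c
/* When allocating a message from the queue fails, drop the whole backlog and
   record a single marker the viewer can display as a "data dropped here" gap. */
Message* msg = queue_alloc(queue, payload_size);
if (msg == NULL)
{
    queue_clear(queue);  /* drop everything already queued */
    msg = queue_alloc(queue, sizeof(OverflowMarker));
    if (msg != NULL)
        write_overflow_marker(msg, current_time());
}
```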

@dwilliamson (Collaborator)

The viewer definitely needs a visualisation that tells you when data has been dropped, but I'm not happy with dropping entire frames because of what in most cases is an odd occurrence. I think what I'll do is a collection of things:

  • Add the option to block/spin without mutex so no data is lost.
  • Grow the initial message queue size to 0.5MB (64k is slightly optimistic).
  • Look into a spin-loop on the receiver thread that only goes to sleep when necessary, to increase throughput.
  • Fast-forward the emit to log file development so that all data can be quickly captured and replayed.

@stolk (Author) commented Oct 30, 2017

Sounds good!

I did end up writing my own profiler, btw.
It is not a live view like Remotery, nor multi-platform, but it does have a neat feature that shows you how often (and for how long) a thread got pre-empted by the scheduler, or went to sleep voluntarily.
In case you're interested: https://github.com/stolk/ThreadTracer
