Remotery gets confused by fine-grain multi-threaded apps. #126

Open
stolk opened this issue Oct 27, 2017 · 14 comments

Comments

@stolk commented Oct 27, 2017

I have pitted Remotery and Minitrace against each other.
On single-threaded apps, the results are the same.

For a multi-threaded run, however, I see that Remotery's measurements deviated from Minitrace's.
It looks like Remotery can miss events.

Race condition between worker threads trying to record a Remotery event?

Screenshots of the Remotery/Minitrace runs:
http://thelittleengineerthatcould.blogspot.ca/2017/10/pitting-profilers-against-each-other.html

@dwilliamson (Collaborator)

Have you tried increasing the size of the message queue?

@dwilliamson (Collaborator)

Actually, it's worth mentioning that Minitrace uses a mutex for pushing events, which in cases of high contention won't give you accurate traces. There's every chance that threads queued for access to the mutex will give up their time-slice to the OS.

@stolk (Author) commented Oct 28, 2017

Thanks for the suggestion.

I tried doubling g_Settings.messageQueueSizeInBytes which did not help.

Then I tried doubling g_Settings.maxNbMessagesPerUpdate to 20, which helped a bit.
It still misses measurements, but it seems to miss fewer now?

Anything else I should try?
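(For reference, the settings above can be tuned through Remotery's public rmt_Settings() call before the global instance is created; a minimal sketch, with values that are purely illustrative rather than recommendations:)

```c
/* Sketch: enlarge the message queue and speed up the server's drain rate.
   Field names are from Remotery.h; the numbers are illustrative only. */
#include "Remotery.h"

int main(void)
{
    Remotery* rmt;
    rmtSettings* settings = rmt_Settings();

    settings->messageQueueSizeInBytes     = 1024 * 1024;  /* up from the 64KB default */
    settings->maxNbMessagesPerUpdate      = 100;          /* drain more messages per server update */
    settings->msSleepBetweenServerUpdates = 5;            /* wake the server thread more often */

    if (rmt_CreateGlobalInstance(&rmt) != RMT_ERROR_NONE)
        return 1;

    /* ... profiled, multi-threaded work ... */

    rmt_DestroyGlobalInstance(rmt);
    return 0;
}
```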

@stolk (Author) commented Oct 28, 2017

Quadrupling the queue size, quadrupling the messages-per-update, and halving the sleep-between-updates makes the situation better again, but data is still missing.

I can see this because I know there should be 100 render_scanlines calls for each render_image, and the profile shows fewer than that, with gaps.

How does Remotery handle the case of multiple threads trying to record samples at the same time?

@dwilliamson (Collaborator)

It should handle contention fine. I think what's happening is a combination of two things because you're sending data so fast:

  • The message queue to the main thread is filling up fast and discarding samples.
  • The code to send stuff over the network may be failing.

Looking at the timings, you have a good stress test for Remotery; I only wish I could get access to your code so that I could reproduce the scenario myself.

@dwilliamson (Collaborator)

Bear in mind that the only way to never lose data is to block the thread issuing the sample, and I don't want Remotery to do that, ever. So it'll be a case of finding the weak point and adding more memory or processor time to get those events out fast.

@stolk (Author) commented Oct 28, 2017

I think it's possible to avoid losing data without blocking threads.

If every thread writes to its own queue, instead of a shared queue, there would be no race conditions, and you would not need a mutex either.

A data aggregator could then be the only reader of the queues: a circular buffer per thread, where the thread advances the tail and the server advances the head, so there is no race on writing either index.

I think I'll attempt a proof of concept implementation of this.
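A minimal sketch of that per-thread single-producer/single-consumer ring buffer, using C11 atomics (illustrative code, not Remotery's implementation): only the owning worker thread writes the write position and only the aggregator writes the read position, so neither side needs a mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 65536  /* bytes; must be a power of two */

typedef struct
{
    unsigned char  data[RING_SIZE];
    _Atomic size_t write_pos;  /* advanced only by the owning worker thread */
    _Atomic size_t read_pos;   /* advanced only by the aggregator thread    */
} SpscRing;

/* Producer side: copy an event in, or return false (drop it) if the ring is full. */
static bool ring_push(SpscRing* ring, const void* event, size_t size)
{
    size_t w = atomic_load_explicit(&ring->write_pos, memory_order_relaxed);
    size_t r = atomic_load_explicit(&ring->read_pos, memory_order_acquire);

    if (RING_SIZE - (w - r) < size)
        return false;  /* full: drop rather than block the worker */

    for (size_t i = 0; i < size; i++)
        ring->data[(w + i) & (RING_SIZE - 1)] = ((const unsigned char*)event)[i];

    atomic_store_explicit(&ring->write_pos, w + size, memory_order_release);
    return true;
}

/* Consumer side: copy out up to `max` bytes and advance the read position. */
static size_t ring_pop(SpscRing* ring, void* out, size_t max)
{
    size_t r = atomic_load_explicit(&ring->read_pos, memory_order_relaxed);
    size_t w = atomic_load_explicit(&ring->write_pos, memory_order_acquire);
    size_t n = (w - r) < max ? (w - r) : max;

    for (size_t i = 0; i < n; i++)
        ((unsigned char*)out)[i] = ring->data[(r + i) & (RING_SIZE - 1)];

    atomic_store_explicit(&ring->read_pos, r + n, memory_order_release);
    return n;
}
```

The trade-off remains, though: once the buffer is full the producer must still either drop the event or block, which is the point made in the next reply.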

@dwilliamson (Collaborator)

It still reduces to the same problem: if you are writing to the queue faster than data is being pulled from it, you will have to block. The only alternative is to allocate more memory for the queue to decrease the chance of that happening.

We still don't know the source of the problem. It could be that there's a failure point in sending the data across the network that isn't being reported.

@dwilliamson (Collaborator)

So what I'm saying is that, to be sure, a breakpoint here would show samples being discarded: https://github.com/Celtoys/Remotery/blob/master/lib/Remotery.c#L4039

@dwilliamson (Collaborator)

I think I might add a global Error Object that increments atomic counters based on how many times each error occurred. Using the message queue to report errors in the message queue doesn't sound dependable :)
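Something along these lines, sketched with C11 atomics (the slot names and helpers are hypothetical, not Remotery APIs):

```c
#include <stdatomic.h>

typedef enum
{
    ErrorSlot_MessageQueueFull,
    ErrorSlot_NetworkSendFailed,
    ErrorSlot_Count
} ErrorSlot;

static atomic_uint g_ErrorCounts[ErrorSlot_Count];

/* Called from any thread when an error path is hit; never allocates or blocks. */
static void error_count(ErrorSlot slot)
{
    atomic_fetch_add_explicit(&g_ErrorCounts[slot], 1, memory_order_relaxed);
}

/* Called periodically from the server thread: returns and clears the count. */
static unsigned error_drain(ErrorSlot slot)
{
    return atomic_exchange_explicit(&g_ErrorCounts[slot], 0, memory_order_relaxed);
}
```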

@dwilliamson (Collaborator)

Thinking about it more, this is a real problem for reporting events at a frequency greater than the server code is capable of sending them over the network. It doesn't matter how much memory you allocate; eventually the writer will catch up to the reader.
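To put rough, purely illustrative numbers on that: a workload producing 100,000 samples per second at 32 bytes each generates about 3.2 MB/s; if the server can only push roughly 1 MB/s to the viewer, the backlog grows at about 2.2 MB/s, so even a 0.5 MB queue overflows after roughly 0.5 / 2.2 ≈ 0.23 seconds, regardless of how the queue is structured.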

@stolk (Author) commented Oct 28, 2017

Thanks.
It indeed shows that rmtMessageQueue_AllocMessage() fails, shortly after start.

Sure, you can always overflow the queue.
But I think it's possible to handle this better.
Once this happens, I think you should clear the queue altogether, and drop all events currently in the queue, instead of just dropping the new ones.

I would rather have large gaps in the trace where the server choked on traffic, than have events randomly disappear without warning.

For instance, in game dev you simply scroll to a part of the trace where there is a full frame's worth of data; it's rare to be interested in a timespan longer than a single display frame.

So maybe clear the queue, and then write a single event that signals the overflow of the queue, so you are alerted to it in the visualizer?
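A rough sketch of that flush-and-mark idea (hypothetical pseudocode; none of these functions are real Remotery APIs):

```c
/* When allocating a message from the queue fails, drop the whole backlog and
   record a single marker the viewer can display as a "data dropped here" gap. */
Message* msg = queue_alloc(queue, payload_size);
if (msg == NULL)
{
    queue_clear(queue);  /* drop everything already queued */
    msg = queue_alloc(queue, sizeof(OverflowMarker));
    if (msg != NULL)
        write_overflow_marker(msg, current_time());
}
```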

@dwilliamson (Collaborator)

The viewer definitely needs a visualisation that tells you when data has been dropped, but I'm not happy with dropping entire frames because of what in most cases is an odd occurrence. I think what I'll do is a collection of things:

  • Add the option to block/spin without mutex so no data is lost.
  • Grow the initial message queue size to 0.5MB (64k is slightly optimistic).
  • Look into a spin-loop on the receiver thread that only goes to sleep when necessary, to increase throughput.
  • Fast-forward the emit to log file development so that all data can be quickly captured and replayed.

@stolk (Author) commented Oct 30, 2017

Sounds good!

I did end up writing my own profiler, btw.
It is not a live view like Remotery, nor multi-platform, but it does have a neat feature that shows you how often (and for how long) a thread got pre-empted by the scheduler, or went to sleep voluntarily.
In case you're interested: https://github.com/stolk/ThreadTracer
