Allow users to mark blocking/unblocking points around direct system calls #44

Open
toddlipcon opened this issue Feb 26, 2016 · 10 comments · May be fixed by #195
Labels
feature This issue is a feature request

Comments

@toddlipcon

toddlipcon commented Feb 26, 2016

My application uses futex to implement a spinlock. This is fairly common in high-performance server code (e.g. gperftools uses futex for spinlocks, as do libraries like Facebook's Folly).

For better accuracy, libcoz should intercept the futex syscall and treat it similarly to how condition variables are treated.

@toddlipcon
Author

The difficulty here is that system calls are often invoked directly via inline assembly (e.g. 'int 0x80'), so LD_PRELOAD interception isn't sufficient. We would probably need to use ptrace, or otherwise require the program being profiled to annotate these manually implemented synchronization primitives, similar to how the COZ_PROGRESS macro calls into the libcoz runtime.
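
The manual-annotation route might look roughly like the sketch below: a minimal futex-backed lock that calls a hook immediately before and after the raw futex wait. The class name `AnnotatedFutexLock` and the hooks `hypothetical_pre_block` / `hypothetical_post_block` are made up for illustration and are not part of libcoz; here they only bump counters so the sketch stays self-contained. It is Linux-only and assumes `std::atomic<uint32_t>` has the same layout as a plain `uint32_t`.

```cpp
#include <atomic>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical annotation hooks (NOT a real libcoz API). In a real build
// they would call into the profiler runtime; here they only count calls.
static std::atomic<int> g_pre_block_calls{0};
static std::atomic<int> g_post_block_calls{0};
inline void hypothetical_pre_block() { ++g_pre_block_calls; }
inline void hypothetical_post_block() { ++g_post_block_calls; }

// Lock states: 0 = unlocked, 1 = locked, 2 = locked with possible waiters.
class AnnotatedFutexLock {
 public:
  void lock() {
    // Fast path: uncontended acquire, no syscall and no annotation.
    uint32_t expected = 0;
    if (state_.compare_exchange_strong(expected, 1)) return;
    // Slow path: mark the lock contended and sleep until it is released.
    while (state_.exchange(2) != 0) {
      hypothetical_pre_block();   // the thread is about to block
      syscall(SYS_futex, reinterpret_cast<uint32_t*>(&state_),
              FUTEX_WAIT_PRIVATE, 2, nullptr, nullptr, 0);
      hypothetical_post_block();  // the thread has been woken up
    }
  }

  void unlock() {
    // Only pay for a wake-up syscall if someone may be waiting.
    if (state_.exchange(0) == 2) {
      syscall(SYS_futex, reinterpret_cast<uint32_t*>(&state_),
              FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0);
    }
  }

 private:
  std::atomic<uint32_t> state_{0};
};
```

The hooks sit only on the slow path, so an uncontended lock/unlock pair never touches the profiler.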

@toddlipcon
Author

I put some hacky code here: https://gist.github.com/toddlipcon/761fa7f8bd9e91f8a8dd
though I'm not getting very good results on my actual application. Hints would be great.

@ccurtsinger
Member

I'm not excited about paying ptrace's ~8% overhead all the time, since this would distort the program runtime quite a bit more than Coz already does. Extra instrumentation isn't ideal in terms of user effort, but it should work. Could you post your profile.coz file somewhere so I can take a look at the results?

A long-term solution may be to use perf's trace events, which can count context switches and thread blocking events regardless of what caused the thread to block or unblock.
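
As a rough illustration of the kind of counter this refers to (not something Coz does today), the sketch below uses the Linux `perf_event_open` syscall to count context switches of the calling thread while a function runs. The helper names `OpenContextSwitchCounter` and `ContextSwitchesDuring` are hypothetical. On systems where `perf_event_paranoid` or a sandbox forbids the call, the sketch simply reports -1.

```cpp
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Opens a software counter for context switches of the calling thread.
// Returns a file descriptor, or -1 if the kernel refuses the request.
int OpenContextSwitchCounter() {
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.size = sizeof(attr);
  attr.type = PERF_TYPE_SOFTWARE;
  attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
  attr.disabled = 1;  // start stopped; enabled explicitly below
  // pid = 0 and cpu = -1: measure this thread on whichever CPU it runs.
  long fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
  return static_cast<int>(fd);
}

// Runs fn and returns how many context switches the calling thread made
// while it ran, or -1 if the counter could not be opened or read.
long long ContextSwitchesDuring(void (*fn)()) {
  int fd = OpenContextSwitchCounter();
  if (fd < 0) return -1;
  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  fn();
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
  uint64_t count = 0;
  bool ok = read(fd, &count, sizeof(count)) ==
            static_cast<ssize_t>(sizeof(count));
  close(fd);
  return ok ? static_cast<long long>(count) : -1;
}
```

Because the kernel does the counting, it does not matter whether the thread blocked in a pthread call, a raw futex, or epoll.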

@emeryberger
Member

Another option might be for Coz to expose a mechanism that would let programmers indicate that certain functions correspond to pthread_mutex_lock and pthread_mutex_unlock and condvar friends.

@ccurtsinger
Member

@emeryberger: Yeah, that's roughly what the patch above does.

@toddlipcon: your implementation seems like it should be okay. Do you have a simple example where coz seems to do the wrong thing with your extra macros?

@toddlipcon
Author

I got a chance to look at this again today (thanks for pinging this issue).

The profile results suggest that coz is picking the same experiment over and over again, to the point that it basically never explores any interesting parts of the code.

In particular, I'm profiling a benchmark program that looks like the following code:

SetUpRPCServer();
for (int i = 0; i < 8; i++) {
  threads.emplace_back([] {
    while (true) {
      MakeRPCCallToLocalhost();
      COZ_PROGRESS;  // one progress point per completed RPC
    }
  });
}

The implementation of 'MakeRPCCallToLocalhost' essentially delegates work to another thread (a libev event loop) and then blocks on a mutex/condvar until the response comes back.

The issue seems to be that, in the steady state, most of my threads are blocked in this stack trace waiting for a response. So the task-clock based profile collection means that it's extremely likely to pick this line of code for the experiment:

todd@todd-ThinkPad-T540p:~/git/coz$ grep experiment ../kudu/profile.coz  | awk '{print $2}' | sort | uniq -c | sort -nk1 | tail
      1 selected=/home/todd/git/kudu/thirdparty/gflags-2.1.2/include/gflags/gflags.h:154
      1 selected=/home/todd/git/kudu/thirdparty/installed-deps/include/google/protobuf/io/coded_stream.h:1091
      1 selected=/home/todd/git/kudu/thirdparty/libev-4.20/ev.c:3521
      2 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:75
      2 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:76
      2 selected=/home/todd/git/kudu/thirdparty/glog-0.3.4/src/logging.cc:2034
      3 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.pb.cc:721
      4 selected=/home/todd/git/kudu/src/kudu/rpc/rpc-bench.cc:81
     33 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.proxy.cc:23
   1454 selected=/home/todd/git/kudu/build/release/src/kudu/rpc/rtest.proxy.cc:84

(rtest.proxy.cc:84 is the last line within my source code for sending an RPC).
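
For anyone who wants the same tally without the shell pipeline, here is a small sketch that mirrors it. It assumes, as the `awk '{print $2}'` output above implies, that each experiment record is whitespace-separated with `selected=<file>:<line>` as its second field; the function name `CountSelectedLines` is made up for this example.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Counts how often each "selected=<file>:<line>" value appears on
// records that mention "experiment" (the equivalent of the grep step).
std::map<std::string, int> CountSelectedLines(std::istream& in) {
  const std::string kKey = "selected=";
  std::map<std::string, int> counts;
  std::string line;
  while (std::getline(in, line)) {
    if (line.find("experiment") == std::string::npos) continue;
    // Take the second whitespace-separated field, like awk's $2.
    std::istringstream fields(line);
    std::string first, second;
    if (!(fields >> first >> second)) continue;
    if (second.compare(0, kKey.size(), kKey) != 0) continue;
    ++counts[second.substr(kKey.size())];
  }
  return counts;
}
```

Feeding it an `std::ifstream` opened on profile.coz and printing the map sorted by count should give the same picture as the output above.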

It seems almost as if the perf events are only getting collected from the "client" thread and not the "server" threads which are in the same process. Any ideas?

@toddlipcon
Author

I just noticed that if I don't pass '-s %%/src/kudu/%%', I get a much better spread of experiments across other files. Maybe something is wrong with the way that experiment lines are getting filtered.

@ccurtsinger
Member

When you omit the -s flag, do you find lots of samples in source files that don't match that pattern?

If coz gets a sample that's not in the given source scope, it walks back up the stack to find the last callsite that is in scope (if any). I'm guessing your application runtime is dominated by computation that is invoked (indirectly) from the callsite where your hotspot is.
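
To make that attribution rule concrete, here is a simplified sketch; it is not Coz's actual code. `Frame`, `InScope`, and `AttributeSample` are illustrative names, and scope matching is reduced to a plain substring test rather than Coz's real pattern syntax.

```cpp
#include <string>
#include <vector>

// One entry of a sampled call stack.
struct Frame {
  std::string file;
  int line;
};

// Simplifying assumption: a frame is "in scope" if its file path
// contains the scope string.
bool InScope(const Frame& frame, const std::string& scope) {
  return frame.file.find(scope) != std::string::npos;
}

// `stack` is ordered innermost first: stack[0] is where the sample
// landed and later entries are its callers. Returns the innermost
// in-scope frame, or nullptr if no frame is in scope.
const Frame* AttributeSample(const std::vector<Frame>& stack,
                             const std::string& scope) {
  for (const Frame& frame : stack) {
    if (InScope(frame, scope)) return &frame;
  }
  return nullptr;
}
```

Under this rule, a sample that lands in out-of-scope code such as libev but was reached through rtest.proxy.cc:84 is charged to rtest.proxy.cc:84, which would explain one line absorbing nearly all experiments when the scope is narrow.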

@ccurtsinger ccurtsinger added the feature label Aug 10, 2016
@ccurtsinger ccurtsinger added this to the v0.5 milestone Aug 10, 2016
@ccurtsinger
Member

The fix for #57 should resolve your second issue, once it's done.

Could you submit the change in your gist as a pull request?

@ccurtsinger ccurtsinger changed the title from "libcoz should intercept futex syscalls" to "Allow users to mark blocking/unblocking points around direct system calls" Aug 10, 2016
@ccurtsinger ccurtsinger removed this from the v0.5 milestone Dec 9, 2016
@vlovich

vlovich commented Mar 11, 2022

Are the pre-block/post-block annotations needed for epoll too? Or is epoll understood properly, and this only applies to condvars and futexes?

I have a complicated multiprocess system (processes are spawned via forks of the main process, which is always idle and uninteresting), and I'm trying to see whether coz will be a good fit for figuring out why some complicated RPC between some components is slow.

vlovich added a commit to vlovich/coz that referenced this issue Mar 11, 2022
@vlovich vlovich linked a pull request Mar 11, 2022 that will close this issue
vlovich added a commit to vlovich/coz that referenced this issue Oct 21, 2022

4 participants