Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to trace C extension function in numpy #154

Open
chenming6615 opened this issue Sep 16, 2021 · 6 comments
Open

Failed to trace C extension function in numpy #154

chenming6615 opened this issue Sep 16, 2021 · 6 comments
Labels
help wanted Extra attention is needed need further investigation Don't need fix right away, but worth looking at

Comments

@chenming6615
Copy link

Hi, first thank you for making this awesome project !

I am trying to trace the execution of numpy with the following code:

import numpy as np
from viztracer import VizTracer

tracer = VizTracer()
tracer.start()

y=np.add(1, 1)  # np.add is a C extension function
print(y)

tracer.stop()
tracer.save()

The output result.json is

{"traceEvents": [{"ph": "M", "pid": 42431, "tid": 42431, "name": "process_name", "args": {"name": "MainProcess"}}, {"ph": "M", "pid": 42431, "tid": 42431, "name": "thread_name", "args": {"name": "MainThread"}}, {"pid": 42431, "tid": 42431, "ts": 38383897163.672, "dur": 8.496, "name": "builtins.print", "ph": "X", "cat": "FEE"}], "viztracer_metadata": {"version": "0.14.0"}, "file_info": {"files": {}, "functions": {}}}

The expected np.add event is missing.

How to make such C extension functions traceable?

@gaogaotiantian
Copy link
Owner

I believe this is a numpy specific issue. I did not dig too much into it but I believe numpy has some optimization there to prevent the python profile hook from executing. This is not something that can be solved from viztracer side, as sys.setprofile can't catch the numpy.add call. I do remember that some of the numpy functions are traceable.

I'm not a numpy expert, I may spend some time in this in the future, but I'm on vacation now and have limited time working on this. However, anyone is welcome to figure out why numpy won't trigger the profile hook.

@gaogaotiantian gaogaotiantian added help wanted Extra attention is needed need further investigation Don't need fix right away, but worth looking at labels Sep 19, 2021
@chenming6615
Copy link
Author

chenming6615 commented Sep 20, 2021

Hi, I investigated this issue by debugging the source code of CPython with GDB.
Here is the function in ceval.c that CPython decides a function call should whether be traced or not:

static PyObject *
trace_call_function(PyThreadState *tstate,
                    PyObject *func,
                    PyObject **args, Py_ssize_t nargs,
                    PyObject *kwnames)
{
    PyObject *x;
    if (PyCFunction_Check(func)) {
        C_TRACE(x, _PyObject_Vectorcall(func, args, nargs, kwnames));
        return x;
    }
    else if (Py_TYPE(func) == &PyMethodDescr_Type && nargs > 0) {
        /* We need to create a temporary bound method as argument
           for profiling.

           If nargs == 0, then this cannot work because we have no
           "self". In any case, the call itself would raise
           TypeError (foo needs an argument), so we just skip
           profiling. */
        PyObject *self = args[0];
        func = Py_TYPE(func)->tp_descr_get(func, self, (PyObject*)Py_TYPE(self));
        if (func == NULL) {
            return NULL;
        }
        C_TRACE(x, _PyObject_Vectorcall(func,
                                        args+1, nargs-1,
                                        kwnames));
        Py_DECREF(func);
        return x;
    }
    return _PyObject_Vectorcall(func, args, nargs | PY_VECTORCALL_ARGUMENTS_OFFSET, kwnames);
}

The direct cause is that numpy.ufunc, to which the numpy.add gets transformed, meets neither the PyCFunction_Check(func) nor the Py_TYPE(func) == &PyMethodDescr_Type condition. Thus, the C_TRACE macro is not get called.

Here is the stack trace from GDB:

#0  LONG_add_avx2 (args=0x7fffaa3d3580, dimensions=0x7fffaa3d3468, steps=0x7fffaa3d3480, __NPY_UNUSED_TAGGEDfunc=0x0) at numpy/core/src/umath/loops.c.src:637
#1  0x00007f5c025aaf79 in generic_wrapped_legacy_loop (__NPY_UNUSED_TAGGEDcontext=<optimized out>, data=<optimized out>, dimensions=<optimized out>, strides=<optimized out>, auxdata=0x7f5bffddcef0) at numpy/core/src/umath/legacy_array_method.c:87
#2  0x00007f5c025acbee in try_trivial_single_output_loop (context=context@entry=0x7fffaa3d38b0, op=op@entry=0x7fffaa3d3b50, order=order@entry=NPY_KEEPORDER, arr_prep=arr_prep@entry=0x7fffaa3d3e50, full_args=..., errormask=521, extobj=0x0) at numpy/core/src/umath/ufunc_object.c:1324
#3  0x00007f5c025b4434 in PyUFunc_GenericFunctionInternal (wheremask=<optimized out>, full_args=..., output_array_prepare=0x7fffaa3d3e50, order=NPY_KEEPORDER, casting=NPY_SAME_KIND_CASTING, extobj=0x0, op=0x7fffaa3d3b50, operation_descrs=0x7fffaa3d3d50, ufuncimpl=<optimized out>, ufunc=<optimized out>) at numpy/core/src/umath/ufunc_object.c:2623
#4  ufunc_generic_fastcall (ufunc=<optimized out>, args=<optimized out>, len_args=<optimized out>, kwnames=<optimized out>, outer=<optimized out>) at numpy/core/src/umath/ufunc_object.c:4878
#5  0x000056392d227103 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#6  call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x56392f4d97b0) at Python/ceval.c:4963
#7  _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3469
#8  0x000056392d2ed594 in PyEval_EvalFrameEx (throwflag=0, f=0x7f5c02898440) at Python/ceval.c:741
#9  _PyEval_EvalCodeWithName (_co=_co@entry=0x7f5c027f6500, globals=globals@entry=0x7f5c028fb880, locals=locals@entry=0x7f5c028fb880, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4298
#10 0x000056392d2ed8f7 in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=0x7f5c028fb880, globals=0x7f5c028fb880, _co=0x7f5c027f6500) at Python/ceval.c:4327
#11 PyEval_EvalCode (co=co@entry=0x7f5c027f6500, globals=globals@entry=0x7f5c028fb880, locals=locals@entry=0x7f5c028fb880) at Python/ceval.c:718
#12 0x000056392d32c3c3 in run_eval_code_obj (locals=0x7f5c028fb880, globals=0x7f5c028fb880, co=0x7f5c027f6500) at Python/pythonrun.c:1125
#13 run_mod (mod=mod@entry=0x56392f5111d8, filename=filename@entry=0x7f5c02851db0, globals=globals@entry=0x7f5c028fb880, locals=locals@entry=0x7f5c028fb880, flags=flags@entry=0x7fffaa3d55d8, arena=arena@entry=0x7f5c0298a7b0) at Python/pythonrun.c:1147
#14 0x000056392d32e365 in PyRun_FileExFlags (fp=0x56392f4d8010, filename_str=<optimized out>, start=<optimized out>, globals=0x7f5c028fb880, locals=0x7f5c028fb880, closeit=1, flags=0x7fffaa3d55d8) at Python/pythonrun.c:1063
#15 0x000056392d32e4d1 in PyRun_SimpleFileExFlags (fp=fp@entry=0x56392f4d8010, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffaa3d55d8) at Python/pythonrun.c:428
#16 0x000056392d32ea2b in PyRun_AnyFileExFlags (fp=fp@entry=0x56392f4d8010, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffaa3d55d8) at Python/pythonrun.c:86
#17 0x000056392d22bacf in pymain_run_file (cf=0x7fffaa3d55d8, config=0x56392f4d8b50) at Modules/main.c:381
#18 pymain_run_python (exitcode=exitcode@entry=0x7fffaa3d5710) at Modules/main.c:606
#19 0x000056392d22c13f in Py_RunMain () at Modules/main.c:685
#20 pymain_main (args=0x7fffaa3d56d0) at Modules/main.c:715
#21 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:739
#22 0x00007f5c037920b3 in __libc_start_main (main=0x56392d21fd20 <main>, argc=2, argv=0x7fffaa3d5838, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffaa3d5828) at ../csu/libc-start.c:308

What I haven't figured out yet is that:

  1. How does numpy.add get transformed into numpy.ufunc?
  2. Why does numpy.ufunc meet neither the PyCFunction_Check(func) nor the Py_TYPE(func) == &PyMethodDescr_Type condition?

Also, I tried to trace pytorch and get the same kind of issue.

I think this is important since THE most important usage of python nowadays, IMHO, is deep learning. A tool that can trace the implementations of those complex deep learning frameworks is utterly desired.

Hope we can find a solution for this.

Thanks!

@gaogaotiantian
Copy link
Owner

I agree that this is a very important usage. numpy probably has some optimization that made the add function a pure C function, instead of a PyCFunction.

I'm not an expert on CPython implementation, I can understand your investigation but I do not know the root cause of it either.

However, I do know pretty well about viztracer's mechanism. VizTracer relies on the CPython hook function, to be more specific, PyEval_SetProfile. If the function to be traced does not trigger this callback function, there's really not much viztracer can do about it. This would be a numpy design choice, or a CPython issue.

From another point of view, a ML/DL engineer, even with the fact that they are working heavily with the ML framework, may not need to trace every single fundamental call, especially for profiling, maybe for debugging too. They might get enough information from the traceable call stack. For example, I know sklearn is using viztracer for profiling and they get something pretty useful from it.

So for this issue, I have two proposals:

  1. Maybe digging into the root cause of not able to trace numpy functions, and see how could numpy or CPython could solve it(and are they willing to do so)
  2. Try viztracer in a real ML problem and see if the basic functions are critical to trace to get enough useful information.

@ghost
Copy link

ghost commented Oct 6, 2023

It seems C extensions won't be supported in VizTracer and it's probably out of scope due to the way it works.
Any news?

Are there alternatives?
A possible workaround is perf support in Python 3.12
https://docs.python.org/3.12/howto/perf_profiling.html
Or py-spy
https://github.com/benfred/py-spy

@gaogaotiantian
Copy link
Owner

Well, C extensions are supported - C functions are not. Basically if you call into a C extension from Python code, it will be recorded. VizTracer is not able to log all the C calls. In short, anything logged by sys.setprofile will be logged by VizTracer.

py-spy can log C-level functions, yes. perf is designed for C-level functions so it will work too (the Python-level support is not as good).

However, both py-spy and perf are sampling profilers, not tracers. So if you need profiling, they are great tools, but they can do what VizTracer does.

@ghost
Copy link

ghost commented Oct 6, 2023

I appreciate your explanation and your support for this project.

Off topic:
I found https://github.com/namhyung/uftrace which looks like it can trace C-level functions and it supports the chrome output format similar to VizTracer's format. Unfortunately it needs a debug build of CPython https://github.com/namhyung/uftrace/wiki/uftrace-for-cpython

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed need further investigation Don't need fix right away, but worth looking at
Projects
None yet
Development

No branches or pull requests

2 participants