Why classification model training/testing is a few seconds slower per run than PyTorch, with reproducible code #120
Preface
py-spy analysis
Stable reproduction code
Upcoming plans
Preface
While researching how to locate the C++ code behind PyTorch's Python APIs (https://github.com/Oneflow-Inc/OneTeam/issues/147), I tried py-spy, a profiling tool recommended on the PyTorch website. Using it on PR #111, I traced the few-seconds-slower-than-PyTorch gap in the classification model training/testing to this line:

tloss = (tloss * i + loss.item()) / (i + 1)  # update mean losses

Profiling with `py-spy`
Evaluating the performance impact of code changes in PyTorch can be complicated, particularly if code changes happen in compiled code. One simple way to profile both Python and C++ code in PyTorch is to use py-spy, a sampling profiler for Python that has the ability to profile native code and Python code in the same session.

py-spy can be installed via pip (`pip install py-spy`).

To use py-spy, first write a Python test script that exercises the functionality you would like to profile. For example, this script profiles `torch.add`:
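The script itself did not survive the page capture; below is a minimal stand-in along the lines the text describes (the file name `profile_add.py`, tensor sizes, and iteration count are assumptions, not the original script):

```python
# profile_add.py -- hypothetical name; repeatedly exercise torch.add so a
# sampling profiler collects enough samples to be meaningful.
import torch

def main():
    a = torch.rand(100, 100)
    b = torch.rand(100, 100)
    # torch.add on tensors this size takes microseconds, so repeat it
    # many times to get good statistics.
    for _ in range(100_000):
        torch.add(a, b)

if __name__ == "__main__":
    main()
```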
Since the `torch.add` operation happens in microseconds, we repeat it a large number of times to get good statistics. The most straightforward way to use py-spy with such a script is to generate a flamegraph:
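The exact invocation was stripped from the capture; based on py-spy's documented CLI, the recording step looks like this (`profile_add.py` is a hypothetical script name):

```shell
# Sample both Python and native (C++) frames and write an interactive
# flame graph to profile.svg.
py-spy record -o profile.svg --native -- python profile_add.py
```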
This will output a file named `profile.svg` containing a flame graph you can view in a web browser or SVG viewer. Individual stack frame entries in the graph can be selected interactively with your mouse to zoom in on a particular part of the program execution timeline. The `--native` command-line option tells py-spy to record stack frame entries for PyTorch C++ code. To get line numbers for C++ code it may be necessary to compile PyTorch in debug mode by prepending `DEBUG=1` to your `setup.py develop` call. Depending on your operating system it may also be necessary to run py-spy with root privileges.
py-spy can also work in an htop-like "live profiling" mode and can be tweaked to adjust the stack sampling rate; see the py-spy readme for more details.
Original classification training/test results

Original classification training/test method: #111 (comment)
py-spy analysis

In a flame graph, the y-axis shows the call stack and the x-axis shows execution time: the wider a function is along the x-axis, the longer it ran, which marks it as a performance bottleneck.
The two flame graphs (attached in the original issue) show that the line

tloss = (tloss * i + loss.item()) / (i + 1)  # update mean losses

does have a measurable performance impact. With the PyTorch backend, the frame for this line is so narrow you need a magnifying glass to find it; with the OneFlow backend it is clearly visible.
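For context on why this one line can dominate: `loss.item()` copies a scalar from device to host and must wait for queued work to finish, so calling it every iteration inserts a synchronization point. A minimal sketch (assuming PyTorch; the `running_mean_*` function names are mine, not from the issue) contrasting the per-step sync with a single final sync:

```python
import torch

# The hot line calls loss.item() every iteration; each call copies the
# scalar to the host and waits for outstanding device work.
def running_mean_per_step(losses):
    tloss = 0.0
    for i, loss in enumerate(losses):
        tloss = (tloss * i + loss.item()) / (i + 1)  # one sync per step
    return tloss

# Equivalent result accumulated on-device, with one host copy at the end.
def running_mean_single_sync(losses):
    total = torch.zeros(())
    for loss in losses:
        total = total + loss.detach()
    return (total / len(losses)).item()  # one sync total
```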
Stable reproduction code

- Machine used: oneflow27-root
- OneFlow built on 2023-03-09
- `flow.__version__='0.9.1+cu117.git.a4b7145d01'`, elapsed 0.7273483276367188 s
- `torch.__version__='1.13.0+cu117'`, elapsed 0.11882472038269043 s
The code below defines a timing Profile class and two test functions, test_torch and test_oneflow.

Output
Upcoming plans