Add/Implement NUMA awareness #1093
Comments
Seems fine as it is. It isn't returning 0, so nothing breaks.
On my configuration it returns 36, not 72.
I think I get what's going on, and why you're expecting a different value. You have a NUMA-enabled system with multiple CPUs, which you incorrectly assume to act as free additional performance. However, in order to actually take advantage of this NUMA design (even the fake NUMA design on AMD Ryzen), we'd need to be fully aware of it. Why? Because the OS scheduler will pin our threads and resources to the (fake) NUMA node that is closest to the primary resource. In StreamFX's case, this is the CPU closest to the GPU. The other CPU will be deprioritized by the OS scheduler, so more threads won't automatically run faster. If anything, any memory allocated on the wrong CPU will end up significantly slower than normal. The value you see reported here, 36, is the correct value for NUMA-unaware software. Since there is no reason to support NUMA architectures, there's no reason to change this. If you believe that an encoder would work better with more threads, or that the micro-tasks really require more cores (on average there are 1-2 tasks per minute), then you're free to implement the required NUMA awareness code.
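For readers following along, here is a minimal sketch of how a NUMA-unaware application typically picks a default thread count. It assumes the detected value comes from something like `std::thread::hardware_concurrency()`; the helper name `detected_thread_count` is hypothetical and is not StreamFX code. The standard only promises a hint, and the call may return 0, which is why the "it isn't returning 0" check above matters.

```cpp
#include <thread>

// Hypothetical helper (not taken from StreamFX): default worker thread count
// for software that is not NUMA-aware.
unsigned int detected_thread_count()
{
    // hardware_concurrency() is only a hint. It returns 0 when the value
    // cannot be determined, and on multi-socket machines it may cover only
    // part of the system (for example 36 instead of 72, as in this thread).
    unsigned int n = std::thread::hardware_concurrency();

    // Never hand 0 to a thread pool; fall back to a single worker.
    return n > 0 ? n : 1u;
}
```

Whatever this returns is a starting point only; it says nothing about which NUMA node the threads or their memory end up on.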
Also, you can use the Custom Settings field to override what the UI offers for the FFmpeg encoders. Pretty sure there's no encoder out there that can effectively make use of that many threads in real time anyway.
@Xaymar,
You would need to rewrite a lot for NUMA awareness. What is already being detected is correct for NUMA-unaware applications. You can manually override the threads value using Custom Settings, as already stated.
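The override-versus-detection behaviour described here can be sketched roughly as follows. This is an illustration under assumptions, not the StreamFX implementation: `resolve_thread_count` is a hypothetical name, and how the Custom Settings field is actually parsed is not shown in this thread. It requires C++17 for `std::optional`.

```cpp
#include <optional>
#include <thread>

// Hypothetical sketch: prefer an explicit user-supplied thread count,
// otherwise fall back to what the runtime detects.
unsigned int resolve_thread_count(std::optional<unsigned int> user_override)
{
    // A positive override (e.g. typed in by the user) always wins.
    if (user_override && *user_override > 0)
        return *user_override;

    // Otherwise use the detected value, guarding against a 0 result.
    unsigned int n = std::thread::hardware_concurrency();
    return n > 0 ? n : 1u;
}
```

With this shape, a user on a 72-thread machine who passes 72 gets 72 regardless of what detection reports, while everyone else keeps the detected default.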
I understand. I don't know whether FFmpeg itself can use all logical cores; would I then need to open a PR there to fix it?
You would need to make FFmpeg, StreamFX, and OBS Studio fully NUMA-aware on all platforms. This means that you need to replace new/delete, make_shared/make_unique, etc. with NUMA-aware allocations close to where the primary resource related to the task is. Anything else will reduce performance. Again, you can simply override the number of threads to the number of threads in your system. That's what Custom Settings is for. Note that Windows incorrectly reports Core/Thread pairs as two cores, and thus fully relies on the CPU to report full utilization. A 36-Core system with 72 Threads at 50% is actually 100% utilized on Windows, but it can still do some additional work using unused or stalled pipelines. Threads are not more cores, just subdivisions of the Core that run while the primary work on the Core is stalled. AFAIK, only Intel properly reports Thread usage. AMD incorrectly claims Threads are at 0% usage if nothing is running on them or can run on them.
@Xaymar,
Indeed. And for OBS Studio, and by extension StreamFX, to run better, you would need to allocate memory and threads close to the resource in question. This would mean allocating GPU-reliant data near the NUMA node that has the GPU, network-reliant data near the NUMA node that has the NIC, and disk-related data near the NUMA node that has the disk. All for a tiny gain in performance that 99.9% of users will never need, and the remaining 0.1% could simply use Custom Settings and type in Not to mention that it would actually slow down OBS Studio and StreamFX for users that do not have a NUMA system, since additional care has to be taken on every single allocation. With the amount of data StreamFX throws around per frame, and the amount of data OBS Studio throws around per frame, you'd end up slower than if you just used the Custom Settings as described above.
Current and Expected Behavior
obs-StreamFX/source/util/util-threadpool.hpp
Line 119 in bbcce86
obs-StreamFX/components/ffmpeg/source/encoders/encoder-ffmpeg.cpp
Line 207 in bbcce86
obs-StreamFX/components/ffmpeg/source/encoders/encoder-ffmpeg.cpp
Line 1085 in bbcce86
Steps to Reproduce the Problem
For a more detailed explanation of the issue, and of how the changes in my PR fix it, see: notepad-plus-plus/notepad-plus-plus#14627
Log files & Crash Dumps
No response
Any additional Information we need to know?
No response