Use only explicit NVTX3 V1 API in CUB #1751

Draft: 3 commits to be merged into base branch main

bernhardmgruber
Contributor

This PR lets the CUB headers detect which version of the NVTX3 C++ wrapper API is available and provide NVTX range functionality only if V1 is detected. Furthermore, CUB now programs against the explicitly versioned API, which is available regardless of whether the user uses NVTX3 in its explicit or implicit API flavor.
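
The detection described above could look roughly like the following. This is only a sketch under assumptions, not the code from this PR: `SKETCH_HAS_NVTX3_V1`, `sketch::nvtx_domain`, `sketch::range_guard` and `sketch::sum_with_range` are hypothetical names made up for illustration. `NVTX3_CPP_DEFINITIONS_V1_0` and `nvtx3::v1::scoped_range_in` are provided by `nvtx3.hpp`. When the header is not found, the range type falls back to a no-op.

```cpp
#include <cassert>
#include <cstddef>

// Include the NVTX3 C++ wrapper only when the toolchain can find it.
#if defined(__has_include)
#  if __has_include(<nvtx3/nvtx3.hpp>)
#    include <nvtx3/nvtx3.hpp>
#  endif
#endif

// nvtx3.hpp defines NVTX3_CPP_DEFINITIONS_V1_0 once the V1 symbols exist.
// SKETCH_HAS_NVTX3_V1 is a hypothetical feature macro, not one CUB defines.
#ifdef NVTX3_CPP_DEFINITIONS_V1_0
#  define SKETCH_HAS_NVTX3_V1 1
#else
#  define SKETCH_HAS_NVTX3_V1 0
#endif

namespace sketch {

#if SKETCH_HAS_NVTX3_V1
// Domain tag type, as expected by the NVTX3 C++ wrapper.
struct nvtx_domain {
  static constexpr const char* name = "sketch";
};
// Explicitly versioned symbol, so it does not rely on the unversioned aliases.
using range_guard = ::nvtx3::v1::scoped_range_in<nvtx_domain>;
#else
// No V1 API detected: ranges become no-ops.
struct range_guard {
  explicit range_guard(const char*) noexcept {}
};
#endif

// Hypothetical instrumented function: sums n integers inside a range.
inline long sum_with_range(const int* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  range_guard guard{"sketch::sum_with_range"};  // range ends at scope exit
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += data[i];
  }
  return total;
}

}  // namespace sketch
```

Callers use `sketch::sum_with_range` the same way in both configurations; only the presence of the NVTX range differs.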

This PR is an evolution of: #1688

Fixes: #1750

Contributor

🟨 CI Results [ Failed: 55 | Passed: 143 | Total: 198 ]
  • 🟩 Project thrust [ Failed: 0 | Passed: 99 | Total: 99 ]

    🟩 cpu
      🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 ctk
      🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 81  -- Total: 81 
    🟩 cudacxx_full
      🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 79  -- Total: 79 
    🟩 cudacxx_name
      🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 97  -- Total: 97 
    🟩 cxx_full
      🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
      🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 16  -- Total: 16 
      🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
      🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
    🟩 cxx_name
      🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
      🟩 gcc (0% Fail)                Failed:  0  -- Passed: 47  -- Total: 47 
      🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 gpu
      🟩 v100 (0% Fail)               Failed:  0  -- Passed: 99  -- Total: 99 
    🟩 jobs
      🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 os
      🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
      🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 44  -- Total: 44 
      🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 sm
      🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
    🟩 std
      🟩 11 (0% Fail)                 Failed:  0  -- Passed: 26  -- Total: 26 
      🟩 14 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
      🟩 17 (0% Fail)                 Failed:  0  -- Passed: 28  -- Total: 28 
      🟩 20 (0% Fail)                 Failed:  0  -- Passed: 16  -- Total: 16 
    
  • 🟨 Project cub [ Failed: 55 | Passed: 44 | Total: 99 ]

    🔍 cudacxx_name: nvcc 🔍
      🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
      🔍 nvcc (56% Fail)              Failed: 55  -- Passed: 42  -- Total: 97 
    🟨 cpu
      🟨 amd64 (56% Fail)             Failed: 51  -- Passed: 40  -- Total: 91 
      🟨 arm64 (50% Fail)             Failed:  4  -- Passed:  4  -- Total:  8 
    🟨 ctk
      🟨 11.1 (73% Fail)              Failed: 11  -- Passed:  4  -- Total: 15 
      🟨 11.8 (66% Fail)              Failed:  2  -- Passed:  1  -- Total:  3 
      🟨 12.4 (51% Fail)              Failed: 42  -- Passed: 39  -- Total: 81 
    🟨 cudacxx_full
      🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
      🟨 nvcc11.1 (73% Fail)          Failed: 11  -- Passed:  4  -- Total: 15 
      🟨 nvcc11.8 (66% Fail)          Failed:  2  -- Passed:  1  -- Total:  3 
      🟨 nvcc12.4 (53% Fail)          Failed: 42  -- Passed: 37  -- Total: 79 
    🟨 cxx_full
      🟨 clang9 (66% Fail)            Failed:  4  -- Passed:  2  -- Total:  6 
      🟨 clang10 (66% Fail)           Failed:  2  -- Passed:  1  -- Total:  3 
      🟨 clang11 (50% Fail)           Failed:  2  -- Passed:  2  -- Total:  4 
      🟨 clang12 (50% Fail)           Failed:  2  -- Passed:  2  -- Total:  4 
      🟨 clang13 (50% Fail)           Failed:  2  -- Passed:  2  -- Total:  4 
      🟨 clang14 (50% Fail)           Failed:  2  -- Passed:  2  -- Total:  4 
      🟨 clang15 (50% Fail)           Failed:  2  -- Passed:  2  -- Total:  4 
      🟨 clang16 (42% Fail)           Failed:  6  -- Passed:  8  -- Total: 14 
      🟥 gcc6 (100% Fail)             Failed:  2  -- Passed:  0  -- Total:  2 
      🟨 gcc7 (66% Fail)              Failed:  4  -- Passed:  2  -- Total:  6 
      🟨 gcc8 (66% Fail)              Failed:  4  -- Passed:  2  -- Total:  6 
      🟨 gcc9 (66% Fail)              Failed:  4  -- Passed:  2  -- Total:  6 
      🟨 gcc10 (50% Fail)             Failed:  2  -- Passed:  2  -- Total:  4 
      🟨 gcc11 (57% Fail)             Failed:  4  -- Passed:  3  -- Total:  7 
      🟨 gcc12 (50% Fail)             Failed:  8  -- Passed:  8  -- Total: 16 
      🟨 Intel2023.2.0 (66% Fail)     Failed:  2  -- Passed:  1  -- Total:  3 
      🟥 MSVC14.16 (100% Fail)        Failed:  1  -- Passed:  0  -- Total:  1 
      🟨 MSVC14.29 (50% Fail)         Failed:  1  -- Passed:  1  -- Total:  2 
      🟨 MSVC14.39 (33% Fail)         Failed:  1  -- Passed:  2  -- Total:  3 
    🟨 cxx_name
      🟨 clang (51% Fail)             Failed: 22  -- Passed: 21  -- Total: 43 
      🟨 gcc (59% Fail)               Failed: 28  -- Passed: 19  -- Total: 47 
      🟨 Intel (66% Fail)             Failed:  2  -- Passed:  1  -- Total:  3 
      🟨 MSVC (50% Fail)              Failed:  3  -- Passed:  3  -- Total:  6 
    🟨 jobs
      🟨 build (56% Fail)             Failed: 51  -- Passed: 40  -- Total: 91 
      🟨 test (50% Fail)              Failed:  4  -- Passed:  4  -- Total:  8 
    🟨 os
      🟨 ubuntu18.04 (71% Fail)       Failed: 10  -- Passed:  4  -- Total: 14 
      🟨 ubuntu20.04 (57% Fail)       Failed: 20  -- Passed: 15  -- Total: 35 
      🟨 ubuntu22.04 (50% Fail)       Failed: 22  -- Passed: 22  -- Total: 44 
      🟨 windows2022 (50% Fail)       Failed:  3  -- Passed:  3  -- Total:  6 
    🟨 sm
      🟨 60;70;80;90 (66% Fail)       Failed:  2  -- Passed:  1  -- Total:  3 
      🟨 90a (50% Fail)               Failed:  2  -- Passed:  2  -- Total:  4 
    🟨 std
      🟥 11 (100% Fail)               Failed: 26  -- Passed:  0  -- Total: 26 
      🟥 14 (100% Fail)               Failed: 29  -- Passed:  0  -- Total: 29 
      🟩 17 (0% Fail)                 Failed:  0  -- Passed: 28  -- Total: 28 
      🟩 20 (0% Fail)                 Failed:  0  -- Passed: 16  -- Total: 16 
    🟨 gpu
      🟨 v100 (55% Fail)              Failed: 55  -- Passed: 44  -- Total: 99 
    

🏃‍ Runner counts (total jobs: 198)

# Runner
154 linux-amd64-cpu16
16 linux-arm64-cpu16
16 linux-amd64-gpu-v100-latest-1
12 windows-amd64-cpu16

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

// Include our NVTX3 C++ wrapper if not available from the CTK
# if __has_include(<nvtx3/nvtx3.hpp>) // TODO(bgruber): replace by a check for the first CTK version shipping the header
# include <nvtx3/nvtx3.hpp>
# else // __has_include(<nvtx3/nvtx3.hpp>)
# include "nvtx3.hpp"
# endif // __has_include(<nvtx3/nvtx3.hpp>)

# include <cuda/std/optional>
// Furthermore, we only support the NVTX3 C++ API V1
# ifdef NVTX3_CPP_DEFINITIONS_V1_0
Collaborator

question: do you think it's less likely for NVTX version to change than for user to require explicit ABI version?

We discussed the idea that if NVTX3_CPP_REQUIRE_EXPLICIT_VERSION is defined, we'd disable NVTX support on CUB end. This approach supposedly works when the version is changed on the user side.

This PR takes a different path by binding CUB to a concrete version of NVTX. To me, it seems unlikely that users define NVTX3_CPP_REQUIRE_EXPLICIT_VERSION, so the initial approach seems more compelling: it means we would not disable NVTX support on every NVTX version change. Disabling NVTX when an explicit version is required also seems easier to maintain.
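
For comparison, the alternative discussed in this comment (switching NVTX support off whenever explicit versioning is requested) could be sketched as below. The names `SKETCH_ALT_NVTX_ENABLED` and `sketch_alt::nvtx_ranges_enabled` are hypothetical and only illustrate the idea; this is not code from CUB.

```cpp
// Hypothetical opt-out: if the includer requested explicit versioning before
// this header is seen, NVTX ranges are disabled altogether.
#if defined(NVTX3_CPP_REQUIRE_EXPLICIT_VERSION)
#  define SKETCH_ALT_NVTX_ENABLED 0
#else
#  define SKETCH_ALT_NVTX_ENABLED 1
#endif

namespace sketch_alt {

// Reports the decision made by the preprocessor check above.
constexpr bool nvtx_ranges_enabled() {
  return SKETCH_ALT_NVTX_ENABLED != 0;
}

}  // namespace sketch_alt
```

The decision depends only on whether `NVTX3_CPP_REQUIRE_EXPLICIT_VERSION` is defined at the point of inclusion, not on which NVTX version is installed.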

Contributor Author

> question: do you think it's less likely for NVTX version to change than for user to require explicit ABI version?

I think so, but it's hard to say. Using the explicit version is the recommended practice by NVTX3 for header-based libraries. See here: https://github.com/NVIDIA/NVTX/blob/release-v3/c/include/nvtx3/nvtx3.hpp#L32-L38.

 * Since NVTX3_CPP_REQUIRE_EXPLICIT_VERSION allows all combinations of versions
 * to coexist without problems within a translation unit, the recommended best
 * practice for instrumenting header-based libraries with NVTX C++ Wrappers is
 * is to #define NVTX3_CPP_REQUIRE_EXPLICIT_VERSION before including nvtx3.hpp,
 * #undef it afterward, and only use explicit-version symbols.  This is not
 * necessary in common cases, such as instrumenting a standalone application, or
 * static/shared libraries in .cpp files or headers private to those projects.
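
A minimal sketch of that recommendation for a header-only library might look like this. It assumes the header is reachable as `<nvtx3/nvtx3.hpp>`; `headerlib`, `HEADERLIB_SET_EXPLICIT_VERSION`, `nvtx_domain` and `scale` are hypothetical names. Preserving an already existing definition of `NVTX3_CPP_REQUIRE_EXPLICIT_VERSION` is an addition of this sketch and not part of the quoted guidance.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__has_include)
#  if __has_include(<nvtx3/nvtx3.hpp>)
// Request explicit-version symbols only, unless the includer already did.
#    ifndef NVTX3_CPP_REQUIRE_EXPLICIT_VERSION
#      define NVTX3_CPP_REQUIRE_EXPLICIT_VERSION
#      define HEADERLIB_SET_EXPLICIT_VERSION  // hypothetical bookkeeping macro
#    endif
#    include <nvtx3/nvtx3.hpp>
// Undo our definition so the includer's own choice of flavor is unaffected.
#    ifdef HEADERLIB_SET_EXPLICIT_VERSION
#      undef NVTX3_CPP_REQUIRE_EXPLICIT_VERSION
#      undef HEADERLIB_SET_EXPLICIT_VERSION
#    endif
#  endif
#endif

namespace headerlib {

#ifdef NVTX3_CPP_DEFINITIONS_V1_0
// Domain tag type for the ranges emitted by this hypothetical library.
struct nvtx_domain {
  static constexpr const char* name = "headerlib";
};
#endif

// Hypothetical instrumented function: multiplies n floats by factor in place.
inline void scale(float* data, std::size_t n, float factor) {
  assert(data != nullptr || n == 0);
#ifdef NVTX3_CPP_DEFINITIONS_V1_0
  // Only the explicitly versioned name nvtx3::v1::... is used here.
  ::nvtx3::v1::scoped_range_in<nvtx_domain> range{"headerlib::scale"};
#endif
  for (std::size_t i = 0; i < n; ++i) {
    data[i] *= factor;
  }
}

}  // namespace headerlib
```

Because the library code never refers to the unversioned `nvtx3::` aliases, it should not matter whether the surrounding translation unit uses the implicit or the explicit flavor.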

And it's not only about the user. CCCL could also be mixed with any other library using NVTX3 with explicit versioning. And we ship a fair amount of libraries with and around the CTK.

If the NVTX major/minor version changes, users would get a warning, so we can revisit this issue when we have more information. We may need to discuss this aspect further, though, since we would not want this warning to trigger forever if a CCCL release and a newer version of NVTX3 end up shipped together in the same CTK.

> We discussed the idea that if NVTX3_CPP_REQUIRE_EXPLICIT_VERSION is defined, we'd disable NVTX support on CUB end.

I know, and I don't like that approach. It just feels like a usability bug to me. Imagine a user using CCCL and enjoying NVTX ranges in CUB. Then either they add an unrelated third-party library that defines NVTX3_CPP_REQUIRE_EXPLICIT_VERSION, or they decide themselves to switch to the explicit API to avoid conflicts, and suddenly all NVTX ranges in CUB are gone. If I were that user, I would file a bug report.

I just strongly believe there is a better solution here.

@bernhardmgruber
Contributor Author

I may have found the solution. It seems to me that newer (non-V1) versions of NVTX3 will continue to ship the V1 symbols regardless. So if CUB uses the explicit V1 API, we are covered in all situations. I opened the following issue to confirm this: NVIDIA/NVTX#96. Let's wait for the response.

Contributor

🟩 CI Results [ Failed: 0 | Passed: 198 | Total: 198 ]
  • 🟩 Project thrust [ Failed: 0 | Passed: 99 | Total: 99 ]

    🟩 cpu
      🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 ctk
      🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 81  -- Total: 81 
    🟩 cudacxx_full
      🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 79  -- Total: 79 
    🟩 cudacxx_name
      🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 97  -- Total: 97 
    🟩 cxx_full
      🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
      🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 16  -- Total: 16 
      🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
      🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
    🟩 cxx_name
      🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
      🟩 gcc (0% Fail)                Failed:  0  -- Passed: 47  -- Total: 47 
      🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 gpu
      🟩 v100 (0% Fail)               Failed:  0  -- Passed: 99  -- Total: 99 
    🟩 jobs
      🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 os
      🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
      🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 44  -- Total: 44 
      🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 sm
      🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
    🟩 std
      🟩 11 (0% Fail)                 Failed:  0  -- Passed: 26  -- Total: 26 
      🟩 14 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
      🟩 17 (0% Fail)                 Failed:  0  -- Passed: 28  -- Total: 28 
      🟩 20 (0% Fail)                 Failed:  0  -- Passed: 16  -- Total: 16 
    
  • 🟩 Project cub [ Failed: 0 | Passed: 99 | Total: 99 ]

    🟩 cpu
      🟩 amd64 (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 arm64 (0% Fail)              Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 ctk
      🟩 11.1 (0% Fail)               Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 11.8 (0% Fail)               Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 12.4 (0% Fail)               Failed:  0  -- Passed: 81  -- Total: 81 
    🟩 cudacxx_full
      🟩 clang-cuda16 (0% Fail)       Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc11.1 (0% Fail)           Failed:  0  -- Passed: 15  -- Total: 15 
      🟩 nvcc11.8 (0% Fail)           Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 nvcc12.4 (0% Fail)           Failed:  0  -- Passed: 79  -- Total: 79 
    🟩 cudacxx_name
      🟩 clang-cuda (0% Fail)         Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 nvcc (0% Fail)               Failed:  0  -- Passed: 97  -- Total: 97 
    🟩 cxx_full
      🟩 clang9 (0% Fail)             Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 clang10 (0% Fail)            Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 clang11 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang12 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang13 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang14 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang15 (0% Fail)            Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 clang16 (0% Fail)            Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 gcc6 (0% Fail)               Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 gcc7 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc8 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc9 (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
      🟩 gcc10 (0% Fail)              Failed:  0  -- Passed:  4  -- Total:  4 
      🟩 gcc11 (0% Fail)              Failed:  0  -- Passed:  7  -- Total:  7 
      🟩 gcc12 (0% Fail)              Failed:  0  -- Passed: 16  -- Total: 16 
      🟩 Intel2023.2.0 (0% Fail)      Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC14.16 (0% Fail)          Failed:  0  -- Passed:  1  -- Total:  1 
      🟩 MSVC14.29 (0% Fail)          Failed:  0  -- Passed:  2  -- Total:  2 
      🟩 MSVC14.39 (0% Fail)          Failed:  0  -- Passed:  3  -- Total:  3 
    🟩 cxx_name
      🟩 clang (0% Fail)              Failed:  0  -- Passed: 43  -- Total: 43 
      🟩 gcc (0% Fail)                Failed:  0  -- Passed: 47  -- Total: 47 
      🟩 Intel (0% Fail)              Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 MSVC (0% Fail)               Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 gpu
      🟩 v100 (0% Fail)               Failed:  0  -- Passed: 99  -- Total: 99 
    🟩 jobs
      🟩 build (0% Fail)              Failed:  0  -- Passed: 91  -- Total: 91 
      🟩 test (0% Fail)               Failed:  0  -- Passed:  8  -- Total:  8 
    🟩 os
      🟩 ubuntu18.04 (0% Fail)        Failed:  0  -- Passed: 14  -- Total: 14 
      🟩 ubuntu20.04 (0% Fail)        Failed:  0  -- Passed: 35  -- Total: 35 
      🟩 ubuntu22.04 (0% Fail)        Failed:  0  -- Passed: 44  -- Total: 44 
      🟩 windows2022 (0% Fail)        Failed:  0  -- Passed:  6  -- Total:  6 
    🟩 sm
      🟩 60;70;80;90 (0% Fail)        Failed:  0  -- Passed:  3  -- Total:  3 
      🟩 90a (0% Fail)                Failed:  0  -- Passed:  4  -- Total:  4 
    🟩 std
      🟩 11 (0% Fail)                 Failed:  0  -- Passed: 26  -- Total: 26 
      🟩 14 (0% Fail)                 Failed:  0  -- Passed: 29  -- Total: 29 
      🟩 17 (0% Fail)                 Failed:  0  -- Passed: 28  -- Total: 28 
      🟩 20 (0% Fail)                 Failed:  0  -- Passed: 16  -- Total: 16 
    

🏃‍ Runner counts (total jobs: 198)

# Runner
154 linux-amd64-cpu16
16 linux-arm64-cpu16
16 linux-amd64-gpu-v100-latest-1
12 windows-amd64-cpu16

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

bernhardmgruber added the cub (For all items related to CUB) and bug (Something isn't working right.) labels on May 20, 2024
Contributor

github-actions bot commented Jun 4, 2024

🟩 CI finished in 2h 54m: Pass: 100%/249 | Total: 3d 22h | Avg: 22m 44s | Max: 55m 21s | Hits: 75%/248441
  • 🟩 cub: Pass: 100%/131 | Total: 2d 05h | Avg: 24m 41s | Max: 45m 34s | Hits: 74%/109175

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  2d 02h | Avg: 24m 31s | Max: 45m 34s | Hits:  74%/102359
      🟩 arm64              Pass: 100%/8   | Total:  3h 38m | Avg: 27m 18s | Max: 34m 57s | Hits:  64%/6816  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  4h 50m | Avg: 19m 20s | Max: 45m 17s | Hits:  69%/11569 
      🟩 11.8               Pass: 100%/3   | Total:  1h 35m | Avg: 31m 52s | Max: 43m 52s | Hits:  67%/2556  
      🟩 12.4               Pass: 100%/113 | Total:  1d 23h | Avg: 25m 12s | Max: 45m 34s | Hits:  74%/95050 
    🟩 cudacxx_full
      🟩 clang-cuda17       Pass: 100%/2   | Total: 39m 50s | Avg: 19m 55s | Max: 20m 37s | Hits:  58%/1410  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  4h 50m | Avg: 19m 20s | Max: 45m 17s | Hits:  69%/11569 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 35m | Avg: 31m 52s | Max: 43m 52s | Hits:  67%/2556  
      🟩 nvcc12.4           Pass: 100%/111 | Total:  1d 22h | Avg: 25m 18s | Max: 45m 34s | Hits:  75%/93640 
    🟩 cudacxx_name
      🟩 clang-cuda         Pass: 100%/2   | Total: 39m 50s | Avg: 19m 55s | Max: 20m 37s | Hits:  58%/1410  
      🟩 nvcc               Pass: 100%/129 | Total:  2d 05h | Avg: 24m 46s | Max: 45m 34s | Hits:  74%/107765
    🟩 cxx_full
      🟩 clang9             Pass: 100%/6   | Total:  2h 07m | Avg: 21m 15s | Max: 33m 49s | Hits:  68%/4890  
      🟩 clang10            Pass: 100%/3   | Total:  1h 14m | Avg: 24m 56s | Max: 33m 34s | Hits:  68%/2562  
      🟩 clang11            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 37s | Max: 34m 59s | Hits:  64%/3416  
      🟩 clang12            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 01s | Max: 34m 32s | Hits:  64%/3416  
      🟩 clang13            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 55s | Max: 34m 22s | Hits:  64%/3416  
      🟩 clang14            Pass: 100%/4   | Total:  1h 49m | Avg: 27m 17s | Max: 35m 14s | Hits:  64%/3416  
      🟩 clang15            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 12s | Max: 35m 08s | Hits:  64%/3408  
      🟩 clang16            Pass: 100%/4   | Total:  1h 51m | Avg: 27m 52s | Max: 35m 41s | Hits:  64%/3408  
      🟩 clang17            Pass: 100%/26  | Total: 10h 25m | Avg: 24m 02s | Max: 41m 20s | Hits:  86%/21858 
      🟩 gcc6               Pass: 100%/2   | Total: 28m 29s | Avg: 14m 14s | Max: 25m 08s | Hits:  76%/1552  
      🟩 gcc7               Pass: 100%/6   | Total:  2h 06m | Avg: 21m 05s | Max: 32m 18s | Hits:  68%/4893  
      🟩 gcc8               Pass: 100%/6   | Total:  2h 10m | Avg: 21m 40s | Max: 36m 09s | Hits:  68%/4893  
      🟩 gcc9               Pass: 100%/6   | Total:  2h 13m | Avg: 22m 13s | Max: 35m 12s | Hits:  68%/4893  
      🟩 gcc10              Pass: 100%/4   | Total:  1h 48m | Avg: 27m 09s | Max: 34m 36s | Hits:  64%/3416  
      🟩 gcc11              Pass: 100%/7   | Total:  3h 26m | Avg: 29m 27s | Max: 43m 52s | Hits:  65%/5964  
      🟩 gcc12              Pass: 100%/4   | Total:  1h 51m | Avg: 27m 54s | Max: 34m 20s | Hits:  64%/3408  
      🟩 gcc13              Pass: 100%/28  | Total:  9h 29m | Avg: 20m 20s | Max: 37m 14s | Hits:  84%/23856 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 19m | Avg: 26m 27s | Max: 37m 41s | Hits:  68%/2334  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 45m 17s | Avg: 45m 17s | Max: 45m 17s | Hits:  56%/696   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 23m | Avg: 41m 57s | Max: 43m 35s | Hits:  56%/1392  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 12m | Avg: 44m 09s | Max: 45m 34s | Hits:  56%/2088  
    🟩 cxx_name
      🟩 clang              Pass: 100%/59  | Total:  1d 00h | Avg: 25m 04s | Max: 41m 20s | Hits:  74%/49790 
      🟩 gcc                Pass: 100%/63  | Total: 23h 34m | Avg: 22m 27s | Max: 43m 52s | Hits:  75%/52875 
      🟩 Intel              Pass: 100%/3   | Total:  1h 19m | Avg: 26m 27s | Max: 37m 41s | Hits:  68%/2334  
      🟩 MSVC               Pass: 100%/6   | Total:  4h 21m | Avg: 43m 36s | Max: 45m 34s | Hits:  56%/4176  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  2d 05h | Avg: 24m 41s | Max: 45m 34s | Hits:  74%/109175
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 18h | Avg: 26m 01s | Max: 45m 34s | Hits:  65%/81911 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 41m | Avg: 20m 07s | Max: 34m 50s | Hits:  99%/6816  
      🟩 GraphCapture       Pass: 100%/8   | Total:  1h 56m | Avg: 14m 34s | Max: 18m 33s | Hits:  99%/6816  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 38m | Avg: 19m 45s | Max: 29m 53s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 42m | Avg: 27m 50s | Max: 41m 20s | Hits:  99%/6816  
    🟩 os
      🟩 ubuntu18.04        Pass: 100%/14  | Total:  4h 04m | Avg: 17m 29s | Max: 27m 08s | Hits:  69%/10873 
      🟩 ubuntu20.04        Pass: 100%/35  | Total: 15h 15m | Avg: 26m 10s | Max: 36m 09s | Hits:  66%/29890 
      🟩 ubuntu22.04        Pass: 100%/76  | Total:  1d 06h | Avg: 23m 50s | Max: 43m 52s | Hits:  79%/64236 
      🟩 windows2022        Pass: 100%/6   | Total:  4h 21m | Avg: 43m 36s | Max: 45m 34s | Hits:  56%/4176  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 35m | Avg: 31m 52s | Max: 43m 52s | Hits:  67%/2556  
      🟩 90a                Pass: 100%/4   | Total: 59m 21s | Avg: 14m 50s | Max: 18m 44s | Hits:  64%/3408  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  5h 29m | Avg:  9m 41s | Max: 24m 40s | Hits:  97%/28537 
      🟩 14                 Pass: 100%/37  | Total: 19h 03m | Avg: 30m 54s | Max: 45m 17s | Hits:  64%/30625 
      🟩 17                 Pass: 100%/36  | Total: 17h 48m | Avg: 29m 40s | Max: 43m 14s | Hits:  64%/29858 
      🟩 20                 Pass: 100%/24  | Total: 11h 33m | Avg: 28m 54s | Max: 45m 34s | Hits:  69%/20155 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 1d 16h | Avg: 20m 34s | Max: 55m 21s | Hits: 75%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 13h | Avg: 20m 37s | Max: 55m 21s | Hits:  76%/129822
      🟩 arm64              Pass: 100%/8   | Total:  2h 39m | Avg: 19m 54s | Max: 26m 59s | Hits:  71%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  4h 39m | Avg: 18m 37s | Max: 45m 33s | Hits:  74%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 12m | Avg: 24m 00s | Max: 35m 01s | Hits:  74%/3543  
      🟩 12.4               Pass: 100%/100 | Total:  1d 10h | Avg: 20m 46s | Max: 55m 21s | Hits:  76%/118018
    🟩 cudacxx_full
      🟩 clang-cuda17       Pass: 100%/2   | Total: 46m 58s | Avg: 23m 29s | Max: 24m 20s | Hits:  62%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  4h 39m | Avg: 18m 37s | Max: 45m 33s | Hits:  74%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 12m | Avg: 24m 00s | Max: 35m 01s | Hits:  74%/3543  
      🟩 nvcc12.4           Pass: 100%/98  | Total:  1d 09h | Avg: 20m 43s | Max: 55m 21s | Hits:  76%/115658
    🟩 cudacxx_name
      🟩 clang-cuda         Pass: 100%/2   | Total: 46m 58s | Avg: 23m 29s | Max: 24m 20s | Hits:  62%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 15h | Avg: 20m 31s | Max: 55m 21s | Hits:  76%/136906
    🟩 cxx_full
      🟩 clang9             Pass: 100%/6   | Total:  1h 52m | Avg: 18m 40s | Max: 28m 58s | Hits:  75%/7080  
      🟩 clang10            Pass: 100%/3   | Total:  1h 03m | Avg: 21m 09s | Max: 29m 49s | Hits:  75%/3540  
      🟩 clang11            Pass: 100%/4   | Total:  1h 24m | Avg: 21m 02s | Max: 27m 36s | Hits:  72%/4720  
      🟩 clang12            Pass: 100%/4   | Total:  1h 27m | Avg: 21m 48s | Max: 29m 59s | Hits:  72%/4720  
      🟩 clang13            Pass: 100%/4   | Total:  1h 24m | Avg: 21m 00s | Max: 29m 49s | Hits:  72%/4720  
      🟩 clang14            Pass: 100%/4   | Total:  1h 22m | Avg: 20m 30s | Max: 27m 01s | Hits:  72%/4720  
      🟩 clang15            Pass: 100%/4   | Total:  1h 20m | Avg: 20m 11s | Max: 26m 12s | Hits:  72%/4720  
      🟩 clang16            Pass: 100%/4   | Total:  1h 22m | Avg: 20m 30s | Max: 26m 27s | Hits:  72%/4720  
      🟩 clang17            Pass: 100%/18  | Total:  4h 44m | Avg: 15m 49s | Max: 29m 39s | Hits:  83%/21240 
      🟩 gcc6               Pass: 100%/2   | Total: 26m 47s | Avg: 13m 23s | Max: 24m 09s | Hits:  81%/2360  
      🟩 gcc7               Pass: 100%/6   | Total:  1h 49m | Avg: 18m 17s | Max: 28m 26s | Hits:  74%/7086  
      🟩 gcc8               Pass: 100%/6   | Total:  1h 50m | Avg: 18m 20s | Max: 28m 12s | Hits:  74%/7086  
      🟩 gcc9               Pass: 100%/6   | Total:  2h 17m | Avg: 22m 52s | Max: 28m 47s | Hits:  62%/7086  
      🟩 gcc10              Pass: 100%/4   | Total:  1h 32m | Avg: 23m 03s | Max: 31m 28s | Hits:  66%/4724  
      🟩 gcc11              Pass: 100%/7   | Total:  2h 41m | Avg: 23m 04s | Max: 35m 01s | Hits:  73%/8267  
      🟩 gcc12              Pass: 100%/4   | Total:  1h 31m | Avg: 22m 57s | Max: 31m 47s | Hits:  71%/4724  
      🟩 gcc13              Pass: 100%/20  | Total:  5h 08m | Avg: 15m 24s | Max: 28m 59s | Hits:  83%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 12m | Avg: 24m 08s | Max: 34m 46s | Hits:  75%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 45m 33s | Avg: 45m 33s | Max: 45m 33s | Hits:  61%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 37m | Avg: 48m 48s | Max: 49m 32s | Hits:  61%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 34m | Avg: 35m 49s | Max: 55m 21s | Hits:  80%/7056  
    🟩 cxx_name
      🟩 clang              Pass: 100%/51  | Total: 16h 00m | Avg: 18m 50s | Max: 29m 59s | Hits:  76%/60180 
      🟩 gcc                Pass: 100%/55  | Total: 17h 17m | Avg: 18m 51s | Max: 35m 01s | Hits:  75%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 12m | Avg: 24m 08s | Max: 34m 46s | Hits:  75%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 58m | Avg: 39m 47s | Max: 55m 21s | Hits:  74%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 16h | Avg: 20m 34s | Max: 55m 21s | Hits:  75%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 12h | Avg: 22m 21s | Max: 55m 21s | Hits:  71%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 41m | Avg:  9m 13s | Max: 20m 23s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 53m | Avg: 14m 12s | Max: 22m 21s | Hits:  99%/9444  
    🟩 os
      🟩 ubuntu18.04        Pass: 100%/14  | Total:  3h 53m | Avg: 16m 41s | Max: 26m 10s | Hits:  75%/16529 
      🟩 ubuntu20.04        Pass: 100%/35  | Total: 12h 35m | Avg: 21m 34s | Max: 31m 28s | Hits:  70%/41313 
      🟩 ubuntu22.04        Pass: 100%/60  | Total: 18h 01m | Avg: 18m 01s | Max: 35m 01s | Hits:  79%/70840 
      🟩 windows2022        Pass: 100%/9   | Total:  5h 58m | Avg: 39m 47s | Max: 55m 21s | Hits:  74%/10584 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 12m | Avg: 24m 00s | Max: 35m 01s | Hits:  74%/3543  
      🟩 90a                Pass: 100%/4   | Total: 51m 32s | Avg: 12m 53s | Max: 17m 17s | Hits:  71%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 22m | Avg:  4m 45s | Max: 27m 19s | Hits:  97%/35418 
      🟩 14                 Pass: 100%/34  | Total: 14h 56m | Avg: 26m 21s | Max: 51m 56s | Hits:  67%/40122 
      🟩 17                 Pass: 100%/33  | Total: 14h 39m | Avg: 26m 39s | Max: 55m 21s | Hits:  68%/38946 
      🟩 20                 Pass: 100%/21  | Total:  8h 29m | Avg: 24m 16s | Max: 51m 57s | Hits:  70%/24780 
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Labels
bug Something isn't working right. cub For all items related to CUB
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

CUB's NVTX ranges fail to compile when usercode uses explicitly versioned NVTX API
2 participants