Failing Kokkos::SYCL Unit Tests #7007

Open
pvelesko opened this issue May 13, 2024 · 8 comments
Labels
Backend - SYCL Question For Kokkos internal and external contributors and users

Comments

@pvelesko

I see lots of timeouts and failures to find kernels when running the unit tests:

29% tests passed, 37 tests failed out of 52

Total Test time (real) = 1800.46 sec

The following tests FAILED:
	  1 - Kokkos_CoreUnitTest_Serial1 (Failed)
	  4 - Kokkos_CoreUnitTest_SYCL1A (Timeout)
	  5 - Kokkos_CoreUnitTest_SYCL1B (Timeout)
	  6 - Kokkos_CoreUnitTest_SYCL2A (Timeout)
	  7 - Kokkos_CoreUnitTest_SYCL2B (Timeout)
	  8 - Kokkos_CoreUnitTest_SYCL2C (Failed)
	  9 - Kokkos_CoreUnitTest_SYCL2D (Failed)
	 10 - Kokkos_CoreUnitTest_SYCL3 (Timeout)
	 11 - Kokkos_CoreUnitTest_SYCLInterOpInit (Failed)
	 12 - Kokkos_CoreUnitTest_SYCLInterOpInit_Context (Failed)
	 13 - Kokkos_CoreUnitTest_SYCLInterOpStreams (Failed)
	 14 - Kokkos_CoreUnitTest_Default (Timeout)
	 15 - Kokkos_CoreUnitTest_LegionInitialization (Failed)
	 18 - Kokkos_CoreUnitTest_KokkosP (Failed)
	 26 - Kokkos_IncrementalTest_SYCL (Timeout)
	 31 - Kokkos_ContainersUnitTest_SYCL (Timeout)
	 32 - Kokkos_UnitTest_Sort (Timeout)
	 33 - Kokkos_UnitTest_Random (Failed)
	 34 - Kokkos_AlgorithmsUnitTest_StdSet_A (Timeout)
	 35 - Kokkos_AlgorithmsUnitTest_StdSet_B (Timeout)
	 36 - Kokkos_AlgorithmsUnitTest_StdSet_C (Timeout)
	 37 - Kokkos_AlgorithmsUnitTest_StdSet_D (Timeout)
	 38 - Kokkos_AlgorithmsUnitTest_StdSet_E (Timeout)
	 39 - Kokkos_AlgorithmsUnitTest_StdSet_Team_A (Timeout)
	 40 - Kokkos_AlgorithmsUnitTest_StdSet_Team_B (Timeout)
	 41 - Kokkos_AlgorithmsUnitTest_StdSet_Team_C (Timeout)
	 42 - Kokkos_AlgorithmsUnitTest_StdSet_Team_D (Failed)
	 43 - Kokkos_AlgorithmsUnitTest_StdSet_Team_E (Timeout)
	 44 - Kokkos_AlgorithmsUnitTest_StdSet_Team_F (Failed)
	 45 - Kokkos_AlgorithmsUnitTest_StdSet_Team_G (Failed)
	 46 - Kokkos_AlgorithmsUnitTest_StdSet_Team_H (Timeout)
	 47 - Kokkos_AlgorithmsUnitTest_StdSet_Team_I (Failed)
	 48 - Kokkos_AlgorithmsUnitTest_StdSet_Team_L (Timeout)
	 49 - Kokkos_AlgorithmsUnitTest_StdSet_Team_M (Timeout)
	 50 - Kokkos_AlgorithmsUnitTest_StdSet_Team_P (Failed)
	 51 - Kokkos_AlgorithmsUnitTest_StdSet_Team_Q (Failed)
	 52 - Kokkos_UnitTest_SIMD (Failed)
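
To see how the failures split between timeouts and hard failures, the summary above can be tallied by status. This is only a minimal sketch: the function name `tally_ctest_failures` is hypothetical, and it assumes the ctest summary lines look exactly like the ones pasted in this issue.

```shell
# Tally ctest failures by status (e.g. "Timeout" vs "Failed").
# Reads the "The following tests FAILED:" lines on stdin.
# The function name is hypothetical, not part of Kokkos or CTest.
tally_ctest_failures() {
  awk '
    # Match lines such as "  4 - Kokkos_CoreUnitTest_SYCL1A (Timeout)".
    /^[ \t]*[0-9]+ - .*\(.*\)$/ {
      status = $0
      sub(/^.*\(/, "", status)   # drop everything up to the last "("
      sub(/\)$/, "", status)     # drop the closing ")"
      count[status]++
    }
    END { for (s in count) printf "%s: %d\n", s, count[s] }
  ' | sort
}
```

Piping the ctest output into it (`ctest 2>&1 | tally_ctest_failures`) prints one line per status. For the list above that should come out as 16 `Failed` and 21 `Timeout`, which adds up to the 37 reported failures.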

Please include the following for a minimal reproducer

  1. Compilers (with versions)
    OneAPI 2024.1 icpx

  2. Kokkos release or commit used (i.e., the sha1 number)
    tag 4.3.00

  3. Platform, architecture and backend
    Intel A770 Discrete GPU

  4. CMake configure command

export KOKKOS_DIR=~/kokkos-build/kokkos
export KOKKOS_KERNELS_DIR=~/kokkos-build/kokkos-kernels
export KOKKOS_VER=4.3.00
export ONEAPI_VER=2024.1.0
export PREFIX=/space/pvelesko/install/kokkos/${KOKKOS_VER}/oneapi/$ONEAPI_VER
module purge
module load oneapi/$ONEAPI_VER

rm -rf ${KOKKOS_DIR}/build && mkdir -p ${KOKKOS_DIR}/build && cd ${KOKKOS_DIR}/build && rm -f CMakeCache.txt
git checkout HEAD -f && git checkout ${KOKKOS_VER}
cmake -DKokkos_ENABLE_SYCL=ON \
-DCMAKE_CXX_COMPILER=icpx \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DKokkos_ENABLE_TESTS=ON \
-DCMAKE_INSTALL_PREFIX=${PREFIX} ..
ninja install

  5. Output from CMake configure command
─pvelesko@cupcake ~/kokkos-build/kokkos/build ‹4.3.00●›
╰─$ export KOKKOS_DIR=~/kokkos-build/kokkos
export KOKKOS_KERNELS_DIR=~/kokkos-build/kokkos-kernels
export KOKKOS_VER=4.3.00
export ONEAPI_VER=2024.1.0
export PREFIX=/space/pvelesko/install/kokkos/${KOKKOS_VER}/oneapi/$ONEAPI_VER
module purge
module load oneapi/$ONEAPI_VER

rm -rf ${KOKKOS_DIR}/build && mkdir -p ${KOKKOS_DIR}/build && cd ${KOKKOS_DIR}/build && rm -f CMakeCache.txt
git checkout HEAD -f && git checkout ${KOKKOS_VER}
cmake -DKokkos_ENABLE_SYCL=ON \
-DCMAKE_CXX_COMPILER=icpx \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DKokkos_ENABLE_TESTS=ON \
-DCMAKE_INSTALL_PREFIX=${PREFIX} ..
Loading oneapi/2024.1.0
  Loading requirement: opencl/ocl-icd-loader
HEAD is now at 486cc745c Merge pull request #6908 from ndellingwood/master-release-4.3.00
-- Setting default Kokkos CXX standard to 17
-- The CXX compiler identification is IntelLLVM 2024.1.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/pvelesko/miniconda3/envs/oneapi-2024.1.0/bin/icpx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Kokkos version: 4.3.0
-- The project name is: Kokkos
-- Using gtest found in /usr/lib/x86_64-linux-gnu/cmake/GTest
-- Configured git information in /home/pvelesko/kokkos-build/kokkos/build/generated/Kokkos_Version_Info.cpp
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=gnu++17 for C++17 extensions as feature
-- Looking for SYCL_EXT_ONEAPI_DEVICE_GLOBAL
-- Looking for SYCL_EXT_ONEAPI_DEVICE_GLOBAL - found
-- Built-in Execution Spaces:
--     Device Parallel: Kokkos::Experimental::SYCL
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
--
-- Architectures:
-- Found TPLLIBDL: /usr/include
-- Looking for C++ include oneapi/dpl/execution
-- Looking for C++ include oneapi/dpl/execution - found
-- Looking for C++ include oneapi/dpl/algorithm
-- Looking for C++ include oneapi/dpl/algorithm - found
-- Performing Test KOKKOS_NO_TBB_CONFLICT
-- Performing Test KOKKOS_NO_TBB_CONFLICT - Success
-- Using internal desul_atomics copy
-- Found Python3: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter
-- Kokkos Backends: SERIAL;SYCL
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pvelesko/kokkos-build/kokkos/build
  6. Minimum, complete code needed to reproduce the bug

  7. Command line needed to reproduce the bug

  8. KokkosCore_config.h header file (generated during the build)
    KokkosCore_config.txt

  9. Please provide any additional relevant error logs
    LastTest.txt

@ajpowelsnl
Contributor

ajpowelsnl commented May 13, 2024

@pvelesko - what is the full output when you run a single, failing test (e.g., Kokkos_CoreUnitTest_Serial1) ?
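
For anyone following along, one way to get that output is to run just the one test through ctest from the build directory. This is a minimal sketch: the wrapper name `run_single_test` and the `CTEST` override variable are hypothetical; `-R` and `--output-on-failure` are standard ctest options.

```shell
# Run a single ctest test by exact name and print its full output on failure.
# Run this from the Kokkos build directory.
# The wrapper name and the CTEST override are hypothetical conveniences.
run_single_test() {
  # -R selects tests by regular expression; anchoring the pattern avoids
  # also matching longer test names that merely contain the requested one.
  "${CTEST:-ctest}" -R "^$1\$" --output-on-failure
}
```

For example, `run_single_test Kokkos_CoreUnitTest_Serial1` runs only that test and prints the full GoogleTest output if it fails.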

@ajpowelsnl ajpowelsnl added Question For Kokkos internal and external contributors and users Backend - SYCL labels May 13, 2024
@masterleinad
Contributor

What do the results look like if you explicitly provide the target architecture?

@pvelesko
Author

> @pvelesko - what is the full output when you run a single, failing test (e.g., Kokkos_CoreUnitTest_Serial1) ?

I provided the full log output in the original post.

> What do the results look like if you explicitly provide the target architecture?

With -DKokkos_ARCH_INTEL_GEN=ON - This flag seems to help a lot but I would have assumed that JIT is default. Not sure what Kokkos tries to do when this flag is not explicitly specified?

83% tests passed, 9 tests failed out of 52

Total Test time (real) = 628.25 sec

The following tests FAILED:
	  1 - Kokkos_CoreUnitTest_Serial1 (Failed)
	  4 - Kokkos_CoreUnitTest_SYCL1A (Subprocess aborted)
	  5 - Kokkos_CoreUnitTest_SYCL1B (Failed)
	  6 - Kokkos_CoreUnitTest_SYCL2A (Failed)
	 10 - Kokkos_CoreUnitTest_SYCL3 (Failed)
	 14 - Kokkos_CoreUnitTest_Default (Timeout)
	 29 - Kokkos_CoreUnitTest_DeviceAndThreads (Failed)
	 31 - Kokkos_ContainersUnitTest_SYCL (Subprocess killed)
	 47 - Kokkos_AlgorithmsUnitTest_StdSet_Team_I (Failed)

LastTest-DDKokkos_ARCH_INTEL_GEN.txt
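
For reference, the configuration behind these results is the original configure command plus one architecture option. The sketch below only assembles the argument list so that different `Kokkos_ARCH_*` values can be swapped in. The helper name `kokkos_sycl_cmake_args` is hypothetical; the options themselves are the ones already used in this issue.

```shell
# Print the cmake options used in this issue, one per line, plus an
# optional Kokkos_ARCH_* option. The helper name is hypothetical.
kokkos_sycl_cmake_args() {
  arch="${1:-}"   # e.g. INTEL_GEN; empty means no explicit architecture
  printf '%s\n' \
    -DKokkos_ENABLE_SYCL=ON \
    -DCMAKE_CXX_COMPILER=icpx \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DKokkos_ENABLE_TESTS=ON
  if [ -n "$arch" ]; then
    printf '%s\n' "-DKokkos_ARCH_${arch}=ON"
  fi
}

# Usage, from the build directory (not executed here):
#   cmake $(kokkos_sycl_cmake_args INTEL_GEN) -DCMAKE_INSTALL_PREFIX="${PREFIX}" ..
```

Calling the helper without an argument reproduces the option list from the original post, where no architecture is specified.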

Tried Kokkos_ARCH_INTEL_XEHP but it failed to compile

Could not determine device target: 12.50.4.
Error: Cannot get HW Info for device 12.50.4.

The A770 is Xe HPG, so I am not sure Kokkos_ARCH_INTEL_XEHP was the correct choice.
https://www.intel.com/content/www/us/en/products/sku/229151/intel-arc-a770-graphics-16gb/specifications.html

With Kokkos_ARCH_INTEL_DG1 the build was compiling, but after multiple hours it seems to have stopped making progress, or is taking forever:

Build succeeded.
[778/779] Linking CXX executable core/unit_test/Kokkos_CoreUnitTest_SYCL1A
Compilation from IR - skipping loading of FCL
Build succeeded.
[... the two lines above repeat 45 more times ...]





@masterleinad
Contributor

> With -DKokkos_ARCH_INTEL_GEN=ON - This flag seems to help a lot but I would have assumed that JIT is default. Not sure what Kokkos tries to do when this flag is not explicitly specified?

Without an architecture option, Kokkos does not pass any target flag to the compiler, which should be equivalent to requesting JIT compilation (the JIT option is just explicit about compiling to SPIR-V).

> Tried Kokkos_ARCH_INTEL_XEHP but it failed to compile

We might need to update the flags then. The ones we are using were recommended on the testbeds some time ago, but I guess the AOT compiler can now handle named options. For reference, the A750 seems to be a dg2(-g12?) in the xe-hpg family in ocloc terms; see https://github.com/intel/compute-runtime/blob/014720fc29c59432188a49bebe1aec5aecb5d4f0/shared/source/dll/devices/devices_base.inl#L56.

@masterleinad
Contributor

LastTest-DDKokkos_ARCH_INTEL_GEN.txt

The list of failing tests is:

[ FAILED ] serial.atomic_operations_double (0 ms)
[ FAILED ] serial.atomic_operations_float (0 ms)
[ FAILED ] 2 tests, listed below:
[ FAILED ] serial.atomic_operations_double
[ FAILED ] serial.atomic_operations_float
2 FAILED TESTS
Error regular expression found in output. Regex=[ FAILED ]
[ FAILED ] sycl.atomic_operations_complexdouble (2481 ms)
[ FAILED ] sycl.atomic_operations_double (2344 ms)
[ FAILED ] sycl.atomic_operations_float (3 ms)
Error regular expression found in output. Regex=[ FAILED ]
[ FAILED ] sycl_host_usm.view_allocation_large_rank (0 ms)
[ FAILED ] 1 test, listed below:
[ FAILED ] sycl_host_usm.view_allocation_large_rank
1 FAILED TEST
Error regular expression found in output. Regex=[ FAILED ]
FAILED (errors=1, skipped=1)
[ FAILED ] sycl.reducers_int8_t (9 ms)
[ FAILED ] sycl.reducers_point_t (6 ms)
[ FAILED ] sycl.reducers_bool (1052 ms)
[ FAILED ] sycl.int_combined_reduce (1924 ms)
[ FAILED ] sycl.mdrange_combined_reduce (0 ms)
[ FAILED ] sycl.int_combined_reduce_mixed (0 ms)
[ FAILED ] sycl.reduction_deduction (0 ms)
[ FAILED ] sycl.reduce_device_view_range_policy (7283 ms)
[ FAILED ] sycl.reduce_device_view_mdrange_policy (2361 ms)
[ FAILED ] sycl.reduce_device_view_team_policy (2268 ms)
[ FAILED ] 10 tests, listed below:
[ FAILED ] sycl.reducers_int8_t
[ FAILED ] sycl.reducers_point_t
[ FAILED ] sycl.reducers_bool
[ FAILED ] sycl.int_combined_reduce
[ FAILED ] sycl.mdrange_combined_reduce
[ FAILED ] sycl.int_combined_reduce_mixed
[ FAILED ] sycl.reduction_deduction
[ FAILED ] sycl.reduce_device_view_range_policy
[ FAILED ] sycl.reduce_device_view_mdrange_policy
[ FAILED ] sycl.reduce_device_view_team_policy
10 FAILED TESTS
Error regular expression found in output. Regex=[ FAILED ]
[ FAILED ] sycl.TeamThreadMDRangeParallelReduce (21 ms)
[ FAILED ] sycl.ThreadVectorMDRangeParallelReduce (13 ms)
[ FAILED ] sycl.TeamVectorMDRangeParallelReduce (13 ms)
[ FAILED ] sycl.multi_level_scratch (3504 ms)
FAILED teamvector_parallel_reduce 0 0 54103.000000 0.000000 24
FAILED teamvector_parallel_reduce with shared result 0 0 54103.000000 0.000000 24
[ FAILED ] sycl.team_teamvector_range (3175 ms)
[ FAILED ] sycl.view_allocation_large_rank (0 ms)
[ FAILED ] 6 tests, listed below:
[ FAILED ] sycl.TeamThreadMDRangeParallelReduce
[ FAILED ] sycl.ThreadVectorMDRangeParallelReduce
[ FAILED ] sycl.TeamVectorMDRangeParallelReduce
[ FAILED ] sycl.multi_level_scratch
[ FAILED ] sycl.team_teamvector_range
[ FAILED ] sycl.view_allocation_large_rank
6 FAILED TESTS
Error regular expression found in output. Regex=[ FAILED ]
[ FAILED ] std_algorithms_reduce_team_test.test (5732 ms)
[ FAILED ] std_algorithms_transform_reduce_team_test.test (5333 ms)
[ FAILED ] 2 tests, listed below:
[ FAILED ] std_algorithms_reduce_team_test.test
[ FAILED ] std_algorithms_transform_reduce_team_test.test
2 FAILED TESTS
Error regular expression found in output. Regex=[ FAILED ]

So there seem to be some problems with shuffles and device_global variables. It's hard to look into it without access to that architecture, though.
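
In case it helps with reproducing this, a list like the one above can be pulled out of a ctest log with a small filter. This is a rough sketch: the function name `list_failed_gtests` is hypothetical, and it assumes the per-test GoogleTest result lines end in `(<N> ms)` as they do in the attached log.

```shell
# Print the unique names of failed GoogleTest cases found on stdin.
# The function name is hypothetical, not part of Kokkos or GoogleTest.
list_failed_gtests() {
  # Keep only per-test result lines, which end in "(<N> ms)", and strip
  # the "[ FAILED ]" tag and the timing. The summary lines that repeat
  # the names without a timing are ignored, and sort -u drops duplicates.
  sed -n 's/^\[ *FAILED *] \([^ ]*\) ([0-9]* ms)$/\1/p' | sort -u
}
```

For example, `list_failed_gtests < LastTest.txt` would print one failing test name per line (the file name here is the attachment name from this issue; a local ctest log may be named differently).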

@pvelesko
Author

> So there seem to be some problems with shuffles and device_global variables. It's hard to look into it without access to that architecture, though.

It's my personal server, I can set you up with ssh if you'd like.

Also, I ran these tests on an iGPU which is available on the same system:

85% tests passed, 8 tests failed out of 52

Total Test time (real) = 731.19 sec

The following tests FAILED:
	  1 - Kokkos_CoreUnitTest_Serial1 (Failed)
	  4 - Kokkos_CoreUnitTest_SYCL1A (Subprocess aborted)
	  6 - Kokkos_CoreUnitTest_SYCL2A (Failed)
	 10 - Kokkos_CoreUnitTest_SYCL3 (Failed)
	 14 - Kokkos_CoreUnitTest_Default (Timeout)
	 29 - Kokkos_CoreUnitTest_DeviceAndThreads (Failed)
	 38 - Kokkos_AlgorithmsUnitTest_StdSet_E (Subprocess aborted)
	 47 - Kokkos_AlgorithmsUnitTest_StdSet_Team_I (Subprocess aborted)

@pvelesko
Author

@ajpowelsnl

Kokkos 4.1.00 + oneapi/2023.2.4 on Intel(R) UHD Graphics 770

75% tests passed, 10 tests failed out of 40

Total Test time (real) = 619.60 sec

The following tests FAILED:
          4 - Kokkos_CoreUnitTest_SYCL1A (Failed)
          6 - Kokkos_CoreUnitTest_SYCL2A (Timeout)
         10 - Kokkos_CoreUnitTest_SYCL3 (Failed)
         11 - Kokkos_CoreUnitTest_SYCLInterOpInit (Failed)
         12 - Kokkos_CoreUnitTest_SYCLInterOpInit_Context (Failed)
         13 - Kokkos_CoreUnitTest_SYCLInterOpStreams (Failed)
         14 - Kokkos_CoreUnitTest_Default (Timeout)
         32 - Kokkos_ContainersUnitTest_SYCL (Timeout)
         33 - Kokkos_UnitTest_Sort (Timeout)
         39 - Kokkos_AlgorithmsUnitTest_StdSet_E (Subprocess aborted)

@pvelesko
Author

Does anyone need access to the machine for debugging?

3 participants