{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":15671179,"defaultBranch":"dev","name":"mallocMC","ownerLogin":"alpaka-group","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-01-06T10:21:36.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/62020949?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1613651741.402764","currentOid":""},"activityList":{"items":[{"before":"c82b95f46c4f1bf6375f760bd5e46d22a0d94585","after":"bffe2aa2da5e83d356ff8d32f392935b8f7a59fa","ref":"refs/heads/dev","pushedAt":"2023-09-26T00:43:57.000Z","pushType":"pr_merge","commitsCount":2,"pusher":{"login":"ax3l","name":"Axel Huebl","path":"/ax3l","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1353258?s=80&v=4"},"commit":{"message":"Merge pull request #243 from ax3l/doc-fix-scatteralloc-paper\n\nREADME: Fix ScatterAlloc Paper Link","shortMessageHtmlLink":"Merge pull request #243 from ax3l/doc-fix-scatteralloc-paper"}},{"before":"6ac89ac5029be6a4d335bd392e95a6e85d3a5a62","after":"c82b95f46c4f1bf6375f760bd5e46d22a0d94585","ref":"refs/heads/dev","pushedAt":"2023-08-22T08:46:43.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"psychocoderHPC","name":"René Widera","path":"/psychocoderHPC","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4037125?s=80&v=4"},"commit":{"message":"update to latest alpaka 1.0.0-dev (#242)\n\n* apply compatibility with alpakas interface change\r\n\r\nalpaka renamed `Pltf` into `Platform` and make platform an object.\r\n\r\n- Additionally this commit fixes the examples which are broken because\r\n changes from #225 was not taken into account.\r\n\r\n* Squashed 'alpaka/' changes from 8b17dfe492..76c6bba28c\r\n\r\n76c6bba28c Reduce example work sizes (#2084)\r\n516e9f9b2e Add alpaka_RELOCATABLE_DEVICE_CODE option (#1467)\r\n79c3113a98 fix CUDA CMake support\r\nbb74c9129e Disable recursive macro expansion warning for icpx\r\na6f9e6e053 Reduce console output\r\n93da137545 Simplify alpaka_add_{executable,library}\r\nb539d47d8d remove atomics from MemFence\r\n23edf577e5 Move Gitlab clang jobs to Ubuntu 22.04\r\neeb7bdce9f Move ASan CI job to clang-16\r\n44d1109168 Update README.md and temporarily disable SYCL runtime checks\r\n78d436c228 Move m_extentWidthBytes outside of debug guards\r\ne7ebee94a4 Add oneAPI CI jobs\r\n7aa8b043ca Ensure CallbackThread/ThreadPool propagate exceptions\r\n575f64dfaf Test value categories for enqueued tasks\r\n12980a9865 Add missing \r\n7aeafa5269 Remove remainders of Accessor\r\nb23e3cf4e0 change formatting for clang-format\r\n2edf839a23 change fixed size_t to auto\r\nca69fc644b include feedback\r\n73e46fcb3c change some const variables to constexpr\r\neffde28370 Remove ALPAKA_SYCL_BACKEND_ONEAPI\r\n0222a7aecd apply reviewer comemnts\r\ne0bc8cbb62 event test missing checks\r\n813c970bcb add new event tests\r\ne4ee1e0e21 fix host thread event implementation and evenet tests\r\n6c442c71cb fix accessors for the SYCL backend\r\n74c320e7c6 Add a CI run with UBSan\r\nede19d7b9e add mdspan tests to the CI\r\ne793c4ef3a Add a test that a task is destroyed after execution\r\n388483ce6b Modernize CMake\r\n062a9feda4 Fix missing include\r\nac7b41daf2 Mark mdspan includes as SYSTEM includes\r\nfcec7c2fc7 Fix compilation of MdSpan tests\r\nb6eb4b62ee Drop Accessor\r\n727f55b71b Update special CUDA jobs\r\n53e17b8aa9 Rewrite counterBasedRng example using mdspan\r\nd62dd59bfa Add math::copysign\r\n558d2698cd add CUDA 12.2\r\n9388d8f249 Fix compilation of bufferCopy example\r\nd8a41f26c2 ci job generator: print warning if parameter value is not supported by the alpaka-job-coverage library\r\na0d731d43a Remove unused variable\r\nb524591014 Add CUDA/HIP headers\r\ne2a994ebae Forward declare AccGpuUniformCudaHipRt to avoid a dependency loop\r\n49e90324aa Add the alpaka_DISABLE_VENDOR_RNG option\r\n2b265c01fa Make the vendor-specific random number generators optional\r\nf32efc2664 Add missing include\r\n2ed16fbf58 GitLab CI: job generator checks if container images exists\r\n7ebf53fab7 Enable release builds for gcc-9 + CUDA\r\n8e5ae6e749 remove alpaka_SYCL_ENABLE_IOSTREAM from cmake\r\n67ef8a736f Fix a dangling pointer in the SYCL memory buffer deleter\r\n16e32caad3 Remove the dependency on Intel MKL\r\n4f787e1ca3 Rename FenceTest to fenceTest\r\n99e5b04fad Refactor OpenMP2 collective queue\r\nb322395974 Add clang-16 test runner (#1971)\r\nfa0af94515 modified allocMappedBuf in the tests\r\nb046d0d0c0 add the platform as an argument to allocMappedBuf\r\nf117eb1a0c Add missing ALPAKA_UNIFORM_CUDA_HIP_RT_CHECK calls in debug mode (#2034)\r\nee1b7c4b3a update SYCL README\r\n6002d3ebfb Complete renaming Pltf to Platform\r\neba6db5d8e Add one more digit to the logs\r\nea0120f07e Implement math tests for ternary ops and fma\r\n308be5dce9 Add the fused multiply-add functions\r\n255c5d1fe6 documentation: QueueCpuOmp2Collective\r\n459d32612b add missing ADL tests for math hyperbolic functions\r\nab4eb3fee6 fix callback thread task lifetime\r\n164edf14f6 Trivial clean up of some SYCL-related headers\r\ndfdca84d33 add math::log2 and math::log10\r\n8cf861bd6a Rename Pltf to Platform\r\n5251061e13 CI: enable Clang 15 as CUDA compiler for release builds\r\nb0fbddf9da fix CallBack thread data race\r\nb5d541b4d3 Update the main SYCL include file name\r\n56cd5cdc78 Various fixes related to the SYCL back-end\r\nac0143dbdd Support compile-time warp size in SYCL kernels\r\ncffed4c8dc First draft adding the warp size as a kernel trait\r\nd9fbf7bf36 Rewrite the SYCL memcpy and memset operations\r\nc1fe0763f1 Generalise the SYCL CpuSelector to non-Intel CPUs\r\n24ca7fdfd7 Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations (part 4)\r\n09e65a28c9 Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations (part 3)\r\n76a13a774f Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations (part 2)\r\n251482ede2 Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations (part 1)\r\n467384a5ca queue test: fix catch2 usages within threads\r\n087009956a Fix the delagating constructor in KernelExecutionFixture\r\nfa7ce64499 Fix compilation errors in PltfGenericSycl\r\n41e99568f1 Make alpaka platforms full objects\r\n1eadba4f27 Fix typos in the comments in alpaka/vec/Vec (#2019)\r\nbfc35edcdb Remove conflicting entry from .clang-format (#2018)\r\na35e49ee27 Update the Any test to work with a sub-group size of 4\r\n6281b5f106 update alpaka-job-matrix-library to 1.3.5\r\n007f1ee632 Update the separable compilation test\r\n8490bb7a38 Mark mysqrt as ALPAKA_FN_EXTERN\r\nabb32ddca3 Remove the requirement that the native handle is an int\r\neab49eefbe Update the bufSlicingTest\r\n6abab49170 Update the type expected by View::operator[]\r\n334e26586f Implement logical operations on alpaka::Vec\r\n9cbf890ae4 Update .readthedocs.yml to version 2\r\n2f989fe7ca Do not run tests on 0-dimensional accelerators\r\n64f3f35091 Move NonZeroTestDims to TestDims.hpp\r\nff04bf3d99 Silence clang-16 warnings\r\n909613a05e Enable two phase lookup with MSVC\r\n29f6ed2b83 Simplify ConcurrentExecPool and rename to ThreadPool\r\n5cbd95bb27 fix host callback unit test\r\n4d378772c5 Refactor TaskKernelCpuThreads\r\n3923c0828a Refactor ndloop\r\n49e6f2ce83 Use a nested namespace specifier and struct\r\nc41e56b38b Remove cleanup actions from CPU device\r\nda34256c83 Remove detaching logic from ConcurrentExecPool\r\ne0577b4087 Replace ConcurrentExecPool by CallbackThread in QueueGenericThreadsNonBlockingImpl\r\n3158f18ee2 Add a benchmark for enqueue of a host func\r\n375f4f0e16 Fix compilation with TSAN and serial backend\r\n8fe1d1c8dc Use nested namespace specifier\r\n37add41746 Avoid unnecessary copy\r\naba05a272c Include missing header\r\nc1e0d3b6dc Demangle the kernel names in the integration tests\r\n78e984d463 SYCL: update to the SYCL 2020 standard\r\nf92989616f SYCL: revert spurious changes\r\ndd9e30c67a Fix wrong CallbackThread termination\r\n620ba96104 Fix a typo\r\n5957371fd6 Make ALPAKA_UNIFORM_CUDA_HIP_RT_CHECK_IGNORE multiline safe\r\n9ece8221a4 Improve debug output in QueueTest\r\n625c7acb79 Allow CallbackThread to take any callable type\r\n50cc03fb64 Update copyright information\r\n6e521624c3 SYCL: change the default stream size to 8 KiB\r\n0ce95ec66c SYCL: update to the SYCL 2020 standard\r\nd7ce88dbc7 Make unused arguments anonymous\r\n12392a4d7d Fix CMake build instructions for SYCL back-end.\r\n2caf919872 Disable CATCH_CONFIG_FAST_COMPILE\r\n08e1037f79 Restrict atomic tests to the supported types\r\nedd1eb7883 Do not run tests on 0-dimensional accelerators\r\ncb470230d4 Add a meta type to select non-zero integral constants\r\n8de720070e fix callback-thread tasks lifetime\r\nef23ccd184 Add Xcode 14.3.1 test runner\r\n3fe070aba0 Update SYCL CMake and remove Xilinx support\r\nd7d873e96c Try to fix amalgamation CI\r\n2346940a13 Update CMake / Boost minor versions and copyright info (#1969)\r\nd6eb7146d4 Add gcc-13 test runner\r\n3838fbcd16 Fix ill-formed spelling of ctor in C++20\r\n20cd62e9f5 Fix amalgamation CI job\r\n46cdf9bfe2 Refactoring\r\nb1b5fc9956 Add CI job to create amalgamated alpaka.hpp\r\n2af4dcc210 Use quotes for including local alpaka headers\r\n9a128adb5a CI: test ROCm 5.5\r\n56c12da983 Update Catch2 version requirement\r\n99e131d121 Update to Catch2 v3.3.2\r\n59cb5ebca1 Update Catch2 version requirement\r\n0c27664de8 Update to Catch2 v3.3.1\r\n0b151d0378 Disable MSVC + CUDA jobs\r\n786ce2c3b1 Add CUDA 12.1 support (#1957)\r\n0b25fc9e7e GitLab CI: enable nvcc 12.x c++20 test\r\n461e1017e4 GitLab CI: support alpaka-job-coverage 1.3.0\r\n88860c99cc fixed alpine image for Gitlab CI job generator\r\n89a411fea6 Remove OpenMP 5 back-end\r\n1983489609 Remove OpenACC back-end\r\n07a8458ed9 Update documentation for kernel arguments\r\n11a6ac1342 Deactivate icpx omp5 job\r\na9f5b59da0 Removed ext::oneapi namespace\r\nbd515b89d8 Removed experimental namespace from SYCL\r\n6c3ab90687 CI: fix compile issue if cpuonly job is executed on a GPU runner (#1939)\r\n90dc85db96 Avoid repeated writing to shared mem in BabelStream dot\r\n10c7c4c77f Reserve the devices vector memory before filling it\r\n193f2c4be3 CI: fix job generator\r\naf21e943d3 HIP <=5.3 avoid compiler error\r\n8ea325d31e Restore macOS OpenMP jobs (#1922)\r\n2346ca6a51 Consider CUDA 11.7 stream memory operations\r\na2a9695e96 Add ROCm 5.4 support (#1915)\r\n1d8772cc14 Update CI container to version 3.1\r\n759d754577 Use SPDX License identifiers\r\nb849ce43db Replace BOOST_LANG_HIP with HIP_VERSION\r\nb5b2d00475 Fix compilation warnings\r\necc06294a0 Always inline with ALPAKA_FN_INLINE\r\n162773e2a7 Remove old clang-cuda workaround\r\n74ee8c8c11 Raise required CMake version to 3.22\r\n5573fc351c Disable tests if used by add_subdirectory\r\na68c866cc6 GitLab CI: split compile only and runtime test in two child pipelines\r\n40140fa153 add CUDA to job generator\r\na36139a041 fix Clang installation in CI\r\n100a047e7e make GitLab CI jobs interruptible\r\n9716c5673d Make SYCL runtime objects static\r\n60ede546fe Update CMake and Boost point releases (#1903)\r\n2e99977245 Implement trait constants\r\n8ef9ccb3a1 fix agc-manager detection in the CI\r\nda31798980 Enable more C++20 jobs\r\n9946b859af Manually install sanitizer libraries\r\n7a75cb3eaf Update to Xcode 14.2\r\n28338849bd Use hipMallocAsync/hipFreeAsync with HIP 5.2.0 and later\r\n921a6bf8bb Use hipLaunchHostFunc with HIP 5.4.0 and later\r\nd44d2ab572 Add clang-15 to CI\r\n4e6fecb3d7 CUDA CI update\r\n6674ce6bab add HIP to the CI generator parameter\r\ne28a9f34c3 Make use of mdspan configurable in CI\r\n242ea84aad Rewrite bufferCopy example using mdspan\r\n0aadfedd73 Add customizable function getMdSpan/getTransposedMdSpan(View)\r\ne5107da242 Add mdspan to cmake\r\n4363bbb912 Drop MSVC 2019\r\n97ea65a63d remove support for HIP 4.x\r\n24d8a3ee11 use agc-image for GitLab CI\r\nf79f08a235 add support for agc-manager\r\ne02b42624f fix icpx error implicit conversion\r\n89c93d1075 Drop legacy compilers and CUDA versions\r\n7582d6c123 Fix undefined constant with nvcc 12.0\r\n76fb556517 Select serial accelerator for tests/examples (#1843)\r\n1c18024e33 gitlab CI run more jobs as compile only\r\n1c6ea20f46 Port babelstream from cupla to alpaka\r\n291cff54cd Run clang-format\r\n908ef12064 Add cupla version of BabelStream\r\ne31eed92ac remove boost 1.73 warning\r\n4a7c9db41e Mark CATCH cmake variables as advanced\r\na8460487c3 Add gcc-12 + OpenACC CI job\r\n8434b9ea79 Update to Catch2 v3.2.1\r\n7b77a28461 fix CUDA memory allocation mapped/async\r\na2f8d778a7 Collapse compiler matrix (#1860)\r\n4b50b39267 Add clang-14 to CI\r\n3308d8bbbd Avoid use-after-free of m_cvWakeup\r\nccb8683d7c Fix use after move in QueueGenericThreadsNonBlocking\r\n2c2588989d Refactor QueueGenericThreadsNonBlocking\r\na690c3d206 Refactor ConcurrentExecPool\r\n5f499f8c2f Refactor ITaskPkg and TaskPkg\r\n2bf0149dd4 Refactor ThreadSafeQueue\r\n19bed293a4 Merge ConcurrentExecPool primary template with specialization\r\n3c33af6542 Drop CUDA 9.2\r\n85abc80984 Drop Boost.fiber back-end\r\nb3be00fc07 fix CUDA CI\r\nef234bc98e Create a patch if clang-format CI fails (#1823)\r\nbde1dc6a6c Fix missing `final` keyword for acceleriator inheritance\r\nc95c9d0891 Test calling getValidWorkDiv with Idx type directly\r\n8fa8648389 Refactor subDivideGridElems\r\n4494a2c9b9 Fix createView for containers without a size argument\r\nd0d7c14253 Add a new example demonstrating parallel loop patterns\r\nfbecfb5e8e add math hyperbolic functions (#1828)\r\ndb2457997c CI: add HIP/ROCm 5.3\r\n6e7e50a1df Make BlockSyncTestKernel::gridThreadExtentPerDim constepr function\r\n162c330cfa Update CI NVHPC versions to 22.3\r\nbc3b863846 GetDevProps: report m_multiprocessorCount = 1\r\nc47bf10dd7 CI: change ROCm CI node (#1844)\r\n058785a838 Drop alpaka/time\r\ndf795ddc16 fix warning calling `__host__` from `__host__ __device__` function\r\nc9377d33fc CI: remove OMP2 backend tests for MACOSX\r\n12e0b302ef Run clang-format\r\neccab29627 Add some tests for subDivideGridElems\r\nb8ddf35c39 Run clang-format\r\ndcbf43aa9d Enable new formatting options\r\n96c3920cff Update to clang-format-14\r\nd3064f036f Upgrade to GH checkout action v3\r\n1111fd083c delete copy, assign, move and move assign operator for accelerators\r\nffb8307194 Update to Catch2 v3.1.1\r\nb004375c3d Implement accelerator tags\r\n08724b5f40 CI: test ROCm 5.2.3 (#1812)\r\nc73f8b7605 Apply suggestions from code review\r\nc235fd67f0 Add example counterBasedRng\r\n4695951762 CudaVectorArrayWrapper: Add convertability to/from std::array\r\n3fbfb08076 Add PhiloxStatelessVector\r\nf8ee7c9bf1 Add PhiloxStateless\r\nd98d7707a6 Make mangled CUDA kernel name as short as possible (#1795)\r\n6feb271d80 Rename result and reference values\r\ne824302444 Add tests for elementwise_min and max functions\r\n201f53f26d Add elementwise_min and max functions\r\n3742e88648 Workaround nvcc 11.3\r\n116a36712e Add deduction guide for Vec\r\nc1d6ace30c Upgrade clang/CUDA headercheck CI to clang 13 and CUDA 11.2\r\n0e112bd0ff add job generator to gitlab ci\r\nbca3bfbf60 Remove the functions to pin/unpin an existing buffer\r\n77b060355a Add a comment and a unit test that default engine type is trivially copyable\r\n28cc847ce4 Make Philox random engines trivially copyable\r\nb518e8c943 Document alpaka::allocAsyncBuf\r\nf4c0b639d4 Document alpaka::allocMappedBuf\r\n1f3babfec3 Improve error handling for memory de/allocation\r\n2f8c6b0423 Move allocAsyncBufIfSupported to mem/buf/Traits.hpp\r\n47e3278fb3 Update some tests to use allocMappedBufIfSupported\r\n9d7de18e2e Add a trait for pinned/mapped memory allocation capability\r\n31a847236d Change the interface of allocMappedBuf\r\nc4424f2a9d PhiloxBaseCommon,PhiloxConstants: constexpr workaround for GCC\r\n540397c429 Use CUDART_VERSION instead of BOOST_LANG_CUDA\r\n0d2cec0bc3 Add missing template parameters\r\n30d205f46d Update copyright notice\r\n6c990fe660 Apply code style and formtting\r\na72874556e Use a nested namespace definition\r\na3380a364b Query te OS for free pages instead of reading /proc/meminfo\r\n5afdda9869 Move includes to the global namespace\r\nd8c6e5f94c Drop support for icc/icpc\r\nd92f22850e fix clang CUDA atomics\r\nf27d78c23c HIP: use emulated `atomicAdd(float*,float)`\r\n31993fcbb0 HIP: workaround atomicMax and atomicMin\r\n071417f50b HIP: usa atomic load within atomicCas emulation\r\na96bc8c733 OpenACC: test only 32bit and 64bit atomics\r\ncab648ec0e disable OpenACC float comparisn warning\r\n543310c0ab workaround for clang 9 with cuda 9.2\r\n62db17a094 refactore atomic unit tests\r\nd1c34cde30 `alpaka::AtomicCas` add floating point support\r\n3d76d95222 refactor HIP/CUDA atomic implementations\r\n0b96515b1c HIP: use build-in `atomicAdd(double)`\r\n8acfbe42d6 Add gcc-12 to CI\r\nf5118e82e2 Simplify clang installation\r\n5a4691c826 Add ViewConst\r\n4b6ead16de Fix misleading parameter name in DefaultQueue\r\na310c437b0 Remove redudant check\r\nfbd1ac0c32 CI Update for ROCm\r\ne2958beb22 Remove ALPAKA_FN_HOST_ACC on defaulted functions\r\na82be7374e CI Update for macOS\r\n01a80e42bf OpenMP fixes for clang 13\r\n6754e5bf7c OpenACC fixes for gcc12\r\ndd3352be8f Set policy CMP0091 to NEW\r\ne76b69b16b Remove support for clang-5\r\nafb49a0c47 Accumulate memcpy/memset static_asserts\r\n815490192f Upgrade to Catch2 v3\r\n7ff5fdd478 Allow temporary destination views in memset/memcpy\r\n38c24f6c4e Refactor TaskCopyOacc\r\n5da464ff40 Diagnose CallbackThread joining itself\r\n\r\ngit-subtree-dir: alpaka\r\ngit-subtree-split: 76c6bba28c7a94a58b420e91ba135705f59cde44\r\n\r\n---------\r\n\r\nCo-authored-by: Third Party ","shortMessageHtmlLink":"update to latest alpaka 1.0.0-dev (#242)"}},{"before":"b16fa654c08c3f7e9a153e2102d54cb848e5c11b","after":"6ac89ac5029be6a4d335bd392e95a6e85d3a5a62","ref":"refs/heads/dev","pushedAt":"2023-07-17T08:25:56.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"psychocoderHPC","name":"René Widera","path":"/psychocoderHPC","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4037125?s=80&v=4"},"commit":{"message":"fix creation policy Scatter (#241)\n\nIt is possible that mallocMC is giving the same memory slot to different concurrent allocation calls.\r\nThe error shows up in very seldom cases in PIConGPU where many threads on CUDA Ampere GPUs allocated memory of different sizes at the same time.\r\n\r\nhttps://github.com/ComputationalRadiationPhysics/picongpu/issues/4576#issuecomment-1632575759\r\n\r\nThe data race happens because the last thread is emptying the PTE with `_page[page].init();`. Most likely the data race happened because we alter the bitmasks with atomics but reset the bitmask in init without. A second potential reason could be that we read the chunk size in one place without using atomic which is resulting into data races too.\r\n\r\nThis PR\r\n- removes reading chunk size without atomic\r\n- is changing the way how a PTE is freed\r\n- reduce branch divergence by refactoring the calls to `tryUsePage()`","shortMessageHtmlLink":"fix creation policy Scatter (#241)"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADiLBlKQA","startCursor":null,"endCursor":null}},"title":"Activity · alpaka-group/mallocMC"}