feat(profiling): keep string cache data alive longer #2668

morrisonlevi · 2024-05-20T22:22:36Z

Description

At a high level, there are two wins here:

The strings are no longer allocated with the system allocator. Instead they use libdatadog's new allocators, which were demonstrated in libdatadog to be slightly more efficient for both memory and CPU.
The string set is no longer thrown away at the start of every request. Instead, it lives on to the next request as long as its total memory use stays under a threshold.

Details

This uses the new allocators in libdatadog to build a StringSet and drops the old string table from the code. Like the new StringTable in libdatadog v9, the StringSet holds its string data in an arena. However, it neither tracks insertion order nor hands out ids. Instead, like a regular set, it returns references to the interned data.
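A minimal Rust sketch of such a set, assuming a simplified design: `StringSet`, `intern`, and `len` are illustrative names, not libdatadog's actual API, and fixed-capacity `String` chunks stand in for libdatadog's arena/chain allocator. Because a chunk never grows past its capacity, its buffer never moves, so references handed out earlier stay valid as more strings are interned.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

/// Hypothetical arena-backed string set (sketch, not libdatadog's API).
pub struct StringSet {
    // Each chunk is a String that is never pushed past its capacity,
    // so its heap buffer never reallocates and slices into it stay valid.
    chunks: RefCell<Vec<String>>,
    // 'static is a white lie: entries point into `chunks` and live only as
    // long as `self`; `intern` never hands them out with a longer lifetime.
    set: RefCell<HashSet<&'static str>>,
    chunk_size: usize,
}

impl StringSet {
    pub fn new(chunk_size: usize) -> Self {
        StringSet {
            chunks: RefCell::new(Vec::new()),
            set: RefCell::new(HashSet::new()),
            chunk_size,
        }
    }

    /// Returns a reference to the interned copy of `s`, copying it into
    /// the arena only if it is not already present. No ids, no order.
    pub fn intern(&self, s: &str) -> &str {
        if let Some(&existing) = self.set.borrow().get(s) {
            return existing;
        }
        let mut chunks = self.chunks.borrow_mut();
        let needs_chunk = match chunks.last() {
            Some(c) => c.capacity() - c.len() < s.len(),
            None => true,
        };
        if needs_chunk {
            chunks.push(String::with_capacity(self.chunk_size.max(s.len())));
        }
        let chunk = chunks.last_mut().unwrap();
        let start = chunk.len();
        chunk.push_str(s); // fits in the spare capacity: no reallocation
        let ptr: *const str = &chunk[start..];
        // SAFETY: the chunk's buffer never moves and is dropped only with
        // `self`, so this slice stays valid for as long as `self` does.
        let interned: &'static str = unsafe { &*ptr };
        self.set.borrow_mut().insert(interned);
        interned
    }

    pub fn len(&self) -> usize {
        self.set.borrow().len()
    }
}
```

Since `intern` takes `&self`, earlier references remain usable while new strings are added, which is what lets a set hand out references instead of ids.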

We store ThinStr<'a> values in the run time cache as usizes. This part is unsafe because the cache's lifetime doesn't fit the borrow checker's semantics. However, the implementation guarantees that the data lives long enough.
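A hypothetical sketch of why a ThinStr fits in a usize cache slot: it is a single pointer to a length-prefixed string, so converting to and from usize is a plain cast. `Encoded`, `into_usize`, and `from_usize` are illustrative names, and a boxed buffer stands in for arena storage; the PR's own ThinStr module may differ. The unsafe part is the one described above: nothing in the type system ties the usize back to the storage's lifetime.

```rust
use std::marker::PhantomData;

const HEADER: usize = std::mem::size_of::<usize>();

/// Owned `[len: usize][UTF-8 bytes]` buffer (stand-in for arena storage).
pub struct Encoded(Box<[u8]>);

impl Encoded {
    pub fn new(s: &str) -> Self {
        let mut buf = Vec::with_capacity(HEADER + s.len());
        buf.extend_from_slice(&s.len().to_ne_bytes());
        buf.extend_from_slice(s.as_bytes());
        Encoded(buf.into_boxed_slice())
    }

    pub fn thin(&self) -> ThinStr<'_> {
        ThinStr { ptr: self.0.as_ptr(), _marker: PhantomData }
    }
}

/// One word wide: the length lives in the pointed-to header, not here.
#[derive(Clone, Copy)]
pub struct ThinStr<'a> {
    ptr: *const u8,
    _marker: PhantomData<&'a str>,
}

impl<'a> ThinStr<'a> {
    pub fn as_str(self) -> &'a str {
        // SAFETY: `ptr` points at a header written by `Encoded::new`, and
        // `'a` ties it to storage that is still alive.
        unsafe {
            let len = std::ptr::read_unaligned(self.ptr as *const usize);
            let bytes = std::slice::from_raw_parts(self.ptr.add(HEADER), len);
            std::str::from_utf8_unchecked(bytes)
        }
    }

    /// What would be written into a pointer-sized run time cache slot.
    pub fn into_usize(self) -> usize {
        self.ptr as usize
    }

    /// # Safety
    /// `raw` must come from `into_usize`, and its storage must outlive `'a`.
    pub unsafe fn from_usize(raw: usize) -> Self {
        ThinStr { ptr: raw as *const u8, _marker: PhantomData }
    }
}
```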

The run time cache is empty at the start of each request, but the string set's data is still around. So we re-establish the links from the cache to the data, but we don't have to rebuild as much of the set structure as the previous implementation did.
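A simplified sketch of that per-request flow, assuming `Rc<str>` handles in place of the arena references stored as usizes (to keep it free of unsafe). `PersistentSet` and `handle_request` are hypothetical names. The second request starts with an empty cache, but every lookup hits strings the set already holds, so nothing new is copied.

```rust
use std::collections::{HashMap, HashSet};
use std::rc::Rc;

/// Lives across requests (stand-in for the arena-backed StringSet).
#[derive(Default)]
pub struct PersistentSet {
    strings: HashSet<Rc<str>>,
    /// How many strings were actually copied into the set.
    pub allocations: usize,
}

impl PersistentSet {
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.strings.get(s) {
            return Rc::clone(existing); // already present: just re-link
        }
        self.allocations += 1;
        let rc: Rc<str> = Rc::from(s);
        self.strings.insert(Rc::clone(&rc));
        rc
    }
}

/// One request: the per-request cache (stand-in for the run time cache)
/// starts empty and is re-linked to strings already held by `set`.
pub fn handle_request(set: &mut PersistentSet, frames: &[(u32, &str)]) -> HashMap<u32, Rc<str>> {
    let mut cache = HashMap::new();
    for &(slot, name) in frames {
        cache.entry(slot).or_insert_with(|| set.intern(name));
    }
    cache
}
```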

If the allocators powering the string set report more than 2 MiB of memory at the end of a request, the string set starts over. This number was chosen because it matches the chunk size of the chain allocator powering the set: if we have more than one chunk, maybe we're too big. We also don't want to look like a memory leak, and sizes over 2 MiB may start to look like one as usage slowly creeps upward.
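A toy model of the end-of-request check, assuming the allocator reports reserved memory in whole chunks. `StringSet`, `intern_bytes`, `reserved_bytes`, and `end_of_request` are hypothetical names. With both the chunk size and the threshold at 2 MiB, a set that fits in one chunk is kept, and one that spills into a second chunk is reset.

```rust
/// Chain allocator chunk size, per the description above.
const CHUNK_SIZE: usize = 2 * 1024 * 1024;
/// Start over once more than one chunk's worth of memory is reserved.
const RESET_THRESHOLD: usize = 2 * 1024 * 1024;

/// Toy model: only tracks how many chunks have been reserved.
pub struct StringSet {
    chunks: usize,
    used_in_last: usize,
}

impl StringSet {
    pub fn new() -> Self {
        StringSet { chunks: 0, used_in_last: 0 }
    }

    /// Records interning `len` bytes (assumes `len <= CHUNK_SIZE`).
    pub fn intern_bytes(&mut self, len: usize) {
        if self.chunks == 0 || self.used_in_last + len > CHUNK_SIZE {
            self.chunks += 1; // the chain allocator grabs another chunk
            self.used_in_last = 0;
        }
        self.used_in_last += len;
    }

    pub fn reserved_bytes(&self) -> usize {
        self.chunks * CHUNK_SIZE
    }
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Kept,
    Reset,
}

/// Runs at the end of a request: keep the set, or throw it away.
pub fn end_of_request(set: &mut StringSet) -> Outcome {
    if set.reserved_bytes() > RESET_THRESHOLD {
        *set = StringSet::new();
        Outcome::Reset
    } else {
        Outcome::Kept
    }
}
```

The commits below also add a debug log on reset and a trace log otherwise; that logging is left out of this sketch.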

Reviewer checklist

Test coverage seems ok.
Appropriate labels assigned.

pr-commenter · 2024-05-20T22:58:05Z

Benchmarks

Benchmark execution time: 2024-06-12 14:33:43

Comparing candidate commit c72a081 in PR branch levi/cache-longer with baseline commit 4bbc100 in branch master.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 26 metrics, 9 unstable metrics.

scenario:walk_stack/1

🟩 wall_time [-469.453ns; -466.840ns] or [-3.738%; -3.718%]
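Each bound is reported both as an absolute and as a relative change, so dividing one by the other gives a rough size for this scenario's baseline wall time. This assumes both percentages are relative to the same baseline mean; because the percentages are rounded, the result is approximate:

```latex
\text{baseline} \approx \frac{469.453\ \text{ns}}{0.03738} \approx \frac{466.840\ \text{ns}}{0.03718} \approx 12.56\ \mu\text{s}
```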


morrisonlevi · 2024-05-28T17:14:49Z

Now that the copy/paste issue and the unrelated SIGSEGVs have been fixed, this is looking much better. I plan to do some manual testing to be more confident, but I'm marking this as ready for review.

We should probably release a libdatadog version before merging, though, and use that rather than the git revision hash.

profiling/src/profiling/stack_walking.rs

A slow ramp up to 4 MiB could _look_ like a memory leak. However, a slow ramp to 2 MiB is probably going to look like it's within normal operating ranges.

morrisonlevi requested review from a team as code owners May 20, 2024 22:22

morrisonlevi marked this pull request as draft May 20, 2024 22:22

morrisonlevi changed the title ~~feat: keep string cache data alive longer~~ feat(profiling): keep string cache data alive longer May 20, 2024

github-actions bot added the profiling Relates to the Continuous Profiler label May 20, 2024

morrisonlevi added 3 commits May 28, 2024 10:13

feat: keep string cache data alive longer

d6143f5

We still have to re-establish the link in the run time cache, but the string set itself will keep the sets and data alive.

refactor: use ArenaAllocator trait

77f3f5b

test: search harder for out-of-bounds writes

5d6e76c

morrisonlevi force-pushed the levi/cache-longer branch from 30f0e72 to 5d6e76c May 28, 2024 16:13

morrisonlevi marked this pull request as ready for review May 28, 2024 17:14

morrisonlevi mentioned this pull request May 31, 2024

perf(profiling): use an arena-based string table #2511

Closed


Merge branch 'master' into levi/cache-longer

a34b023

realFlowControl reviewed Jun 4, 2024


profiling/src/profiling/stack_walking.rs Outdated

morrisonlevi and others added 3 commits June 4, 2024 10:34

add debug log when resetting cache

f628fd4

also add trace log for when cache doesn't reset

dba5cc4

upgrade to libdatadog v10.0.0

b1ee71e

realFlowControl force-pushed the levi/cache-longer branch from 492c6e8 to b1ee71e June 6, 2024 08:38

morrisonlevi requested review from a team as code owners June 10, 2024 17:27

Merge branch 'master' into levi/cache-longer

8b968fb

morrisonlevi force-pushed the levi/cache-longer branch from f1a2226 to 8b968fb June 10, 2024 17:38

morrisonlevi added 4 commits June 10, 2024 15:44

refactor: extract StringCache, document safety

4a035fe

Merge branch 'master' into levi/cache-longer

571006c

refactor: extract ThinStr to its own module

235911f

refactor: extract ThinPtr

3c9d2da

realFlowControl approved these changes Jun 11, 2024


fix: import visibility and clippy lint

c247a4b

fix: dead code on PHP 7

c7a7d7c

morrisonlevi force-pushed the levi/cache-longer branch from b11496c to c7a7d7c June 11, 2024 23:33

shrink string cache reset threshold a bit

c72a081

A slow ramp up to 4 MiB could _look_ like a memory leak. However, a slow ramp to 2 MiB is probably going to look like it's within normal operating ranges.

morrisonlevi merged commit 9f4a6a5 into master Jun 12, 2024
583 of 588 checks passed

morrisonlevi deleted the levi/cache-longer branch June 12, 2024 23:11

github-actions bot added this to the 1.2.0 milestone Jun 12, 2024
