Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization benchmarks. #808

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

arsh
Copy link
Contributor

@arsh arsh commented Mar 10, 2024

Description of change

  1. Using Criterion benchmarks, I want to show how we see a performance gain (~33% faster) by switching from bincode to bincode2 when reading the cached blocks from disk.
  2. I want to get feedback on changes made to DiskDataCache to abstract out the serialization library. For example, am I doing the right abstraction? Am I using the trait boundaries in the right way? So, mostly on the Rust-side of things.

Benchmarks results

In summary, using bincode2 gets a better performance than bincode and closer to the base line of reading the contents of a file to memory (read_file in the graphs). From flamegraphs, we noticed that bincode was zeroing the vectors in its fill_buffer implementation. bincode2 has a different implementation which is more efficient.

Reading a file (base line)

Screenshot 2024-03-10 at 10 19 56 AM

Reading cache through bincode

Screenshot 2024-03-10 at 10 20 10 AM

Reading cache through bincode2

Screenshot 2024-03-10 at 10 20 23 AM

Relevant issues: N/A

Does this change impact existing behavior?

No


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Signed-off-by: Andres Santana <[email protected]>
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:37 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:37 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:37 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:37 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:37 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:37 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:37 — with GitHub Actions Inactive
Signed-off-by: Andres Santana <[email protected]>
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:52 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:52 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:52 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:52 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:52 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:52 — with GitHub Actions Inactive
@arsh arsh temporarily deployed to PR integration tests March 10, 2024 10:52 — with GitHub Actions Inactive
Copy link
Contributor

@passaro passaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I'll add a few specific comments, but first:

I want to get feedback on changes made to DiskDataCache to abstract out the serialization library

Do we need to abstract the serialization library? Any reason not to just replace it?

@arsh
Copy link
Contributor Author

arsh commented Mar 11, 2024

This is great! I'll add a few specific comments, but first:

I want to get feedback on changes made to DiskDataCache to abstract out the serialization library

Do we need to abstract the serialization library? Any reason not to just replace it?

I wanted to test them at the same time and needed ability to configure it and thus this abstraction. We don't necessarily need it. However, we could be in the same situation in the future where we want to test with a new library and it would be nice to have this in place to just plug the new one in.

@jamesbornholt
Copy link
Member

This is cool, but bincode2 hasn't been updated for 4 years and looks unmaintained, so not sure we should be switching to it.

Comment on lines 18 to +19
bincode = "1.3.3"
bincode2 = "2.0.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we have to take a non-dev-dependency on "all the versions we want to benchmark" here. I'd rather put it behind a feature flag, or just not commit the versions we're not using.

@arsh
Copy link
Contributor Author

arsh commented Mar 13, 2024

This is cool, but bincode2 hasn't been updated for 4 years and looks unmaintained, so not sure we should be switching to it.

bincode 1.x is in a similar boat. The version we use, 1.3.3, was released 3 years ago. After that, bincode switched to 2.x. These 2.x versions are marked with rc which I presume means release candidate and not production ready? Since 2.x claims to be a re-write, I did the same benchmarks using it and did not see the improvements we got from bincode2.

One way forward is to switch to bincode = 2.x, do perf analysis to determine why its performance isn't as good as bincode2, and potentially submit a PR to fix it.

More importantly, I was hoping this micro-optimization would help with the overall cache performance as described in #719 but when I tested it following what is described there, I did not see an improvement, which looks like something else (to be determined) is the bottleneck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants