Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differing scaling factors across the same benchmark misrepresented in comaprisons. #236

Open
loonatick-src opened this issue Mar 19, 2024 · 3 comments

Comments

@loonatick-src
Copy link

  • OS: macOS 14.4 Sonoma
  • Swift version: swift 5.11-dev DEVELOPMENT-SNAPSHOT-2023-12-07-a
  • package-benchmark version: 1.22.3
    I refactored a benchmark by adding a non-unit scaling factor to reduce variance, i.e. from this
Benchmark("my benchmark") { benchmark in
      ...
}

to

Benchmark("my benchmark", configuration: .init(scalingFactor: .kilo)) { benchmark in
    for _ in benchmark.scaledIterations {
        ...
    }
}

This shows up like so when using swift package benchmark baseline compare

╒══════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│         Time (wall clock) (μs) *         │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│                   prev                   │     487 │     522 │     526 │     530 │     539 │     565 │     667 │   10000 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│                 current                  │  514414 │  516686 │  518259 │  519045 │  521929 │  543344 │  543344 │      20 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│                    Δ                     │  513927 │  516164 │  517733 │  518515 │  521390 │  542779 │  542677 │   -9980 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│              Improvement %               │ -105529 │  -98882 │  -98428 │  -97833 │  -96733 │  -96067 │  -81361 │   -9980 │
╘══════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

╒══════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│             Malloc (total) *             │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│                   prev                   │       0 │       0 │       0 │       0 │       0 │       0 │       0 │   10000 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│                 current                  │       0 │       0 │       0 │       0 │       0 │       0 │       0 │      20 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│                    Δ                     │       0 │       0 │       0 │       0 │       0 │       0 │       0 │   -9980 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│              Improvement %               │       0 │       0 │       0 │       0 │       0 │       0 │       0 │   -9980 │
╘══════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

It appears that the new benchmark is not scaled and this caused some confusion. Is this the intended behavior?

@hassila
Copy link
Contributor

hassila commented Mar 27, 2024

Thanks for the report, unfortunately away for a couple of weeks and won't have time to check on this until then, sorry for delay.

@hassila
Copy link
Contributor

hassila commented Apr 20, 2024

Had a Quick Look at this now, this is not expected - will have a look at it, in the meantime generating a new baseline with the new factor should make things work.

@hassila
Copy link
Contributor

hassila commented Apr 20, 2024

Note to self:
result.timeUnits = base.timeUnits is invalid, we should instead scale the percentile results to base when they differ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants