Consider using the standard benchmark harness instead of criterion #264

Open
plafer opened this issue Mar 23, 2024 · 1 comment

plafer (Contributor) commented Mar 23, 2024

We currently use criterion for benchmarking. The main limitation of criterion is that only a crate's public API can be benchmarked.

In contrast, the standard benchmark harness (provided by the unstable test crate) allows you to mark any function with #[bench] (just as you would with #[test]), which lets you benchmark internal functions. I believe this would be valuable enough to outweigh the cons of using test. While working on #247, we wanted to know the specific timing of internal functions (e.g. here). I ended up manually inserting timing statements in the code, running the right test, and inspecting the output. IMO, this shows how limiting criterion really is.
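
A minimal sketch of that manual-timing workaround, using std::time::Instant (internal_function and caller are placeholders, not the actual code from #247):

use std::time::Instant;

fn internal_function() { /* ... */ }

fn caller() {
    // Hand-rolled timing: time one call and print the elapsed duration to stdout,
    // which is only visible when the test is run with --nocapture.
    let start = Instant::now();
    internal_function();
    println!("internal_function took {:?}", start.elapsed());
}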

Note that criterion has a feature that allows benchmarking internal functions similar to #[bench]; it depends on the custom_test_framework nightly feature. However, the tracking issue for custom_test_framework has been closed due to inactivity, so I would personally stay away from it if/until that changes. A rough sketch of that route is below.
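
For reference, criterion's custom-test-framework route looks roughly like this (a sketch based on criterion's documentation; it assumes the criterion-macro crate and the criterion::runner test runner, and the exact API may differ across versions):

#![feature(custom_test_frameworks)]
#![test_runner(criterion::runner)]

use criterion::Criterion;
use criterion_macro::criterion;

fn internal_function() { /* ... */ }

// The #[criterion] attribute registers this function with the custom test runner,
// so the benchmark can live in the same module as the internal code it exercises.
#[criterion]
fn bench_internal_function(c: &mut Criterion) {
    c.bench_function("bench_internal_function", |b| b.iter(|| internal_function()));
}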

Pros of using test over criterion

  • Ability to benchmark internal functions

Cons of using test over criterion

  • Less sophisticated reporting
    • e.g. criterion reports the performance change relative to the previous run of the benchmark
  • No generated HTML reports
  • Probably less reliable
    • e.g. criterion pre-populates the cache before running a benchmark (not sure whether or not test does that too, but at least it's not advertised in the benchmark output)
  • test is still unstable

Summary

Although there are more cons than pros, I believe the ability to benchmark internal functions far outweighs any of the cons (as explained earlier). We can deal with the dependency on nightly by adding a "benchmarking" feature, which gates the nightly-only code (see the Cargo.toml snippet after the example below).

#![cfg_attr(feature = "benchmarking", feature(test))]

#[cfg(feature = "benchmarking")]
extern crate test;

fn internal_function() { ... }

#[cfg(test)]
#[cfg(feature = "benchmarking")]
mod bench {
    use super::*;
    use test::Bencher;

    #[bench]
    fn bench_internal_function(b: &mut Bencher) {
        b.iter(|| internal_function());
    }
}
$ cargo +nightly bench --features=benchmarking bench_internal_function
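
The feature itself would also need to be declared in Cargo.toml; a minimal sketch (the "benchmarking" name is just the one proposed above):

[features]
benchmarking = []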

one-d-wide commented Apr 6, 2024

Hi. I'm not familiar with winterfell at all, but I was having the same problem and found this issue on Google.

I don't know whether my (rather wacky) solution would help here, but I managed to run a criterion benchmark against private items on stable Rust by concealing it as a regular #[test], like this:

fn internal_function() { ... }

#[cfg(test)]
pub mod tests {
    use super::*;
    use criterion::{black_box, Criterion};
    use std::time::Duration;

    fn benchmarks(c: &mut Criterion) {
        c.bench_function("bench_internal_function", |b| {
            b.iter(|| internal_function())
        });
    }

    #[test]
    fn bench() {
        // `cargo test --profile bench -j1 -- --nocapture bench -- <benchmark_filter>`
        // This workaround allows benchmarking private interfaces with `criterion` in stable rust.

        // Collect cli arguments
        let args: Vec<String> = std::env::args().collect();

        // Interpret sequence of args `[ "...bench", "--", "[filter]" ]` as a trigger and extract `filter`
        let filter = args
            .windows(3)
            .filter(|p| p.len() >= 2 && p[0].ends_with("bench") && p[1] == "--")
            .map(|s| s.get(2).unwrap_or(&"".to_string()).clone())
            .next();

        // Return successfully if trigger wasn't found 
        let filter = match filter {
            None => return,
            Some(f) => f,
        };

        let profile_time = args
            .windows(2)
            .filter(|p| p.len() == 2 && p[0] == "--profile-time")
            .map(|s| s[1].as_str())
            .next();

        // TODO: Adopt `Criterion::configure_from` when it lands upstream
        let mut c = Criterion::default()
            .with_output_color(true)
            .without_plots()
            .with_filter(&filter)
            .warm_up_time(Duration::from_secs_f32(0.5))
            .measurement_time(Duration::from_secs_f32(0.5))
            .profile_time(profile_time.map(|s| Duration::from_secs_f32(s.parse().unwrap())));

        benchmarks(&mut c);

        Criterion::default().final_summary();
        // TODO: Move this function to a macro
    }
}
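
With this in place, the benchmark defined above would be run with, for example:

$ cargo test --profile bench -j1 -- --nocapture bench -- bench_internal_function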
