
Make run output the raw numbers #2

Open
Llorx opened this issue Aug 17, 2023 · 13 comments

@Llorx (Owner) commented Aug 17, 2023

Run will output the raw numbers, and with built-in processors they can be shown in the console, as a table, or in a file; you can even write your own processor.

Idea: just add a processor list to the IsoBench constructor (or via methods), and they will handle events when one test sample ends, when one test ends completely, and when all tests end completely, so they can show logs while the test is running and so on. Processors will follow an interface, which is easily typed with TypeScript.
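
Something along these lines, maybe (a rough sketch of such an interface; all names here are illustrative, not a final API):

// Hypothetical sketch of the processor idea; these names are illustrative,
// not the actual iso-bench API.
type Sample = { cycles: number; ms: number };
type TestResult = { name: string; samples: Sample[] };

interface Processor {
    onSample?(test: TestResult, sample: Sample): void; // one sample finished
    onTestEnd?(test: TestResult): void;                // one test finished completely
    onEnd?(tests: TestResult[]): void;                 // all tests finished
}

// A console processor could then log results as they arrive:
const consoleProcessor: Processor = {
    onTestEnd(test) {
        const ms = test.samples.reduce((sum, sample) => sum + sample.ms, 0);
        const cycles = test.samples.reduce((sum, sample) => sum + sample.cycles, 0);
        console.log(`${test.name}: ${((cycles / ms) * 1000).toFixed(0)} ops/sec`);
    }
};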

@mindplay-dk commented

Well, I tried...

main...mindplay-dk:iso-bench:collect-test-results

I tested this in the context of the benchmark, and it did work - but it takes way longer than the specified 2 seconds to actually measure 2 seconds' worth of runs.

So this is somehow adding a lot of runtime overhead. I'm not sure why.

But I honestly don't fully understand this code. 🤔

@Llorx (Owner, Author) commented Aug 17, 2023

@mindplay-dk that's because you changed minMs for that; you need to change the ms property instead.

The minMs property is the minimum number of milliseconds a single sample needs to run for its output to count toward the total ms, so if you set minMs to 2 seconds, it is going to re-run the samples, increasing the number of cycles in each sample, until a single sample reaches the 2 seconds and is not discarded.

What you want is to run multiple small samples and get the average, so if you want 2 seconds, just set ms to 2000ms. By default minMs is set to 100ms, which means it is going to gather a minimum of 100ms per sample until it reaches 2000ms, which is about 20 samples.
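
For example (assuming the option names described here; the exact constructor signature may differ between releases):

import { IsoBench } from "iso-bench";

// The option names and constructor signature here are assumptions
// based on this discussion; check the current release for specifics.
const bench = new IsoBench("my suite", {
    ms: 2000,  // gather samples until ~2000ms of total measured time
    minMs: 100 // each individual sample must run for at least 100ms
});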

In short, ms is the very same as maxTime in benny, and minMs is the very same as minTime in benny:

/**
 * The maximum time a benchmark is allowed to run before finishing (secs).
 *
 * Note: Cycle delays aren't counted toward the maximum time.
 *
 * @default 5
 */
maxTime: number
/**
 * The time needed to reduce the percent uncertainty of measurement to 1% (secs).
 *
 * @default 0
 */
minTime: number

benny will increase the cycles of each sample until they reach the minTime, discarding all other samples, and will run samples until they reach the maxTime. benny also has the minSamples option that iso-bench doesn't have. The same goes for benchmark.js (its options are also named minTime and maxTime; I guess I have to rethink my option names xD). They are common options in benchmark suites.

PS: I'm gonna clean the code. It was made yesterday afternoon in a couple of hours, just to see that everything works as expected.

@mindplay-dk commented

I see.

Shouldn't minMs just be an implementation detail with a useful default maybe?

When is it useful to have control of this value?

@mindplay-dk commented

On that note, I was confused by the "samples" value output in the console as well - why would you need to know how many cycles the framework used internally to collect the results?

I've been trying some different math on the collected values - not sure if it makes sense to include any or some of this in the library itself? You can see an example here:

[image: console output with min, mean and typical values for each test]

For a benchmark like the one I'm trying to fix, probably the most reliable and meaningful value is the min, which also happens to be the simplest to calculate.

The mean is also simple enough, but quite unreliable, as there are always outliers skewing the result.

The typical value is a little "invention" of my own - I tried a whole bunch of different algorithms to see which ones give the most stable results between runs. It's fairly complex and a bit magical. It might make more sense to include a simple distance-weighted mean, which works almost as well.

Of course, the point of exposing the individual values was so you didn't need to own this code, but it might be nice if something a bit more immediately useful came out of it.

What do you think?

@Llorx (Owner, Author) commented Aug 18, 2023

> I see.
>
> Shouldn't minMs just be an implementation detail with a useful default maybe?
>
> When is it useful to have control of this value?

It had a 100ms default value, which worked well for all my tests. The best way to run IsoBench is to run it with default values.

Anyway, I published a new update with different options, with default values that return more consistent results.

> On that note, I was confused by the "samples" value output in the console as well - why would you need to know how many cycles the framework used internally to collect the results?

I don't know. benchmark.js shows it, so I did the same xD

> I've been trying some different math on the collected values - not sure if it makes sense to include any or some of this in the library itself? You can see an example here:
>
> [image: console output with min, mean and typical values for each test]
>
> For a benchmark like the one I'm trying to fix, probably the most reliable and meaningful value is the min, which also happens to be the simplest to calculate.
>
> The mean is also simple enough, but quite unreliable, as there are always outliers skewing the result.
>
> The typical value is a little "invention" of my own - I tried a whole bunch of different algorithms to see which ones give the most stable results between runs. It's fairly complex and a bit magical. It might make more sense to include a simple distance-weighted mean, which works almost as well.
>
> Of course, the point of exposing the individual values was so you didn't need to own this code, but it might be nice if something a bit more immediately useful came out of it.
>
> What do you think?

I'm not sure what those results mean :-(

Is min the sample with the fewest operations per second?

I'm not an algorithms guy, so I went with the mean, which is the one I can implement easily, and I think it is the one that benny and benchmark.js use. The thing is that they clean outputs depending on the percentage difference between samples, and IsoBench still doesn't. That's something I will add in the future, because I think that the logic they explain in the post is the one that can give the most stable results.
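
The general idea would be something like this, I guess (just a sketch of percentage-based trimming, not what benchmark.js literally does; it relies on proper statistics under the hood):

// Sketch of percentage-based outlier trimming; NOT benchmark.js's actual
// algorithm, just the general idea of cleaning samples before averaging.
function trimOutliers(values: number[], maxDeviation = 0.1): number[] {
    const mean = values.reduce((sum, value) => sum + value, 0) / values.length;
    // keep only samples within ±maxDeviation (10% by default) of the mean
    return values.filter(value => Math.abs(value - mean) / mean <= maxDeviation);
}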

The new release also returns the test array in the run() method, but right now it's kinda basic (just the information it uses to calculate the mean, although the last release by default runs only 1 sample). I'm going to update it over time to add more useful information.
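
Usage would look something like this (the property names on the returned objects are illustrative; the exact result shape may differ):

// Hypothetical consumption of the array returned by run(); the property
// names are illustrative, and run() is assumed to be asynchronous here.
const bench = new IsoBench(); // ...with tests added to it
bench.run().then((tests) => {
    for (const test of tests) {
        console.log(test.name, test.samples);
    }
});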

EDIT: Cleaned the code and added a Processor system so benchmark output is easier to manage. This is in preparation to add events during the benchmarking process, and also to be able to output charts, tables, HTML and so on.

@mindplay-dk commented

> I'm not sure what those results mean :-(

ha, no, of course not, I didn't actually post the code! sorry 😅

well, the changes I had locally were only working with my branch that exposes the raw numbers, and your current version still doesn't do that.

but I can dump my functions here, in case any of it is useful to you.

here's what I was printing out:

  for (const { name, values } of tests) {
    console.log({
      name,
      min: minOf(values),
      mean: meanOf(values),
      typical: weightedMean(values, 10)
    })
  }

and here's all the math behind that:

// Sorts the values into 100 buckets, takes a distance-weighted mean inside
// each bucket, then averages the buckets weighted by how many values fell
// into each one (raised to `factor`, so crowded buckets dominate).
function weightedMean(values: number[], factor = 10) {
  const buckets = sortIntoBuckets(values, 100)
    .filter((bucket) => bucket.values.length > 0)
    .map((bucket) => ({
      ...bucket,
      weightedAverage: distanceWeightedMean(bucket.values, factor)
    }))

  let totalWeight = 0
  let weightedSum = 0

  for (const bucket of buckets) {
    const weight = Math.pow(bucket.values.length, factor)

    totalWeight += weight

    weightedSum += weight * bucket.weightedAverage
  }

  return weightedSum / totalWeight
}

function meanOf(values: number[]): number {
  let sum = 0

  for (const value of values) {
    sum += value
  }

  return sum / values.length
}

function minOf(values: number[]): number {
  let min = Number.POSITIVE_INFINITY

  for (const value of values) {
    if (value < min) {
      min = value
    }
  }

  return min
}

function maxOf(values: number[]): number {
  let max = 0

  for (const value of values) {
    if (value > max) {
      max = value
    }
  }

  return max
}

type Bucket = { min: number; max: number; values: number[] }

function sortIntoBuckets(values: number[], numBuckets: number): Bucket[] {
  const min = minOf(values)
  const max = maxOf(values)

  const range = max - min

  if (range === 0) {
    // all values are identical: avoid dividing by zero below
    return [{ min, max, values: values.slice() }]
  }

  const buckets: Bucket[] = []

  for (let i = 0; i < numBuckets; i++) {
    buckets.push({
      min: min + (range * i) / numBuckets,
      max: min + (range * (i + 1)) / numBuckets,
      values: []
    })
  }

  for (const value of values) {
    const index = Math.min(
      Math.floor(((value - min) / range) * numBuckets),
      numBuckets - 1 // note: max of bucket range is inclusive *only* for the last bucket
    )

    buckets[index].values.push(value)
  }

  return buckets
}

// Mean where each value is weighted by how close it sits to the plain mean;
// outliers get exponentially less weight as `factor` grows.
function distanceWeightedMean(values: number[], factor = 2): number {
  if (values.length === 0) {
    throw new Error('cannot compute weighted mean of an empty list')
  }

  if (values.length === 1) {
    return values[0]
  }

  if (values.length === 2) {
    return (values[0] + values[1]) * 0.5
  }

  let sum = 0

  for (const value of values) {
    sum += value
  }

  const avg = sum / values.length

  const error = (value: number) => Math.abs(avg - value)

  let deviation = 0

  for (const value of values) {
    deviation = Math.max(error(value), deviation)
  }

  if (deviation === 0) {
    return values[0]
  }

  let totalWeight = 0
  let weightedSum = 0

  for (const value of values) {
    const weight = Math.pow(1 - error(value) / deviation, factor)

    totalWeight += weight

    weightedSum += weight * value
  }

  return weightedSum / totalWeight
}

as said, the min value really is the most stable and reliable, as it comes closest to ignoring "less than ideal" conditions for every run.

but I guess, technically, that could depend on what you're benchmarking... which is why I was trying a bunch of different things.

the distanceWeightedMean is fairly simple and popular for a lot of things - Amazon uses it to average product ratings.

my weightedMean implementation isn't strictly only a weighted mean - it's actually sorting the results into "buckets", and then calculating a weighted mean, with weights according to the number of values that fall into those buckets.

I have no idea if this is better or worse than what benchmark.js is doing - it looks like hardcore statistical stuff, so I'm out of my league... personally, I would favor something non-scientists have a shot at understanding... or, if I had to use whatever that statistical algorithm is, I would find a third-party package with a well-tested implementation. I wouldn't suggest copy/pasting stuff from benchmark.js unless you really understand what you're copying (even if it works well) because you're going to own it then. 😅

but you do you of course. 👍

I'm having a bit of trouble keeping up with all the changes, so I will probably set this project on the back burner until you think you're finished. 🙂

@mindplay-dk commented

oh, one little thing, since you can't merge my changes now - I do think you need to make a small change here:

iso-bench/src/Test.ts, lines 155 to 161 in 1194a7a:

    private _getCallbackTime(cycles:number) {
        const startTS = process.hrtime.bigint();
        while(cycles-- > 0) {
            this._callback();
        }
        return Number(process.hrtime.bigint() - startTS) / 1000000;
    }

Something like this instead:

https://github.com/mindplay-dk/iso-bench/blob/7db25f71f794f683eecfacbee6a40ffb6e7fbf1d/src/index.ts#L203-L211

    private _runTest(test:Test, cycles:number): number[] {
        const values: number[] = [];
        while(cycles-- > 0) {
            const startTS = process.hrtime.bigint();
            test.callback();
            values.push(Number(process.hrtime.bigint() - startTS) / 1000000);
        }
        return values;
    }

You're measuring the time it takes for the loop itself to execute - while this may seem insignificant, the faster/smaller the function you're measuring, the more it'll skew the results.

@Llorx (Owner, Author) commented Aug 20, 2023

> I'm having a bit of trouble keeping up with all the changes, so I will probably set this project on the back burner until you think you're finished. 🙂

Yeah sorry 😅 it is now getting closer to what I want to achieve. Just pushed a new update with all the new Processor logic, so it will be easier to implement new Processors or custom ones.

Thank you for all the code. When I finish with the Processor thingy I'm going to analyze all these algorithms, and maybe even add an option to calculate different ones (min, mean, "typical"), because what each person is trying to achieve in each case is different.

> You're measuring the time it takes for the loop itself to execute - while this may seem insignificant, the faster/smaller the function you're measuring, the more it'll skew the results.

Good catch, but this is intentional, for exactly that reason. For fast methods (for example a buffer.readUInt8), measuring each run individually adds a lot of skew: when you call process.hrtime.bigint(), Node.js goes into the native layer to gather the data, and that's more expensive than the while(cycles-- > 0) { line itself. Once the data is gathered from the native layer you already have your time and you can waste as many CPU cycles as you want, but while the call is "travelling" to the native layer the clock is still ticking, because the native layer still doesn't know that you asked for the time, and that "travel" is VERY expensive.

Also, not every system can achieve nanosecond resolution, so on those systems the skew is going to be astronomical, as it may return "this lasted 0 milliseconds, now 1 millisecond, now 0 milliseconds again...". By adding the runs up into a single loop that must last a minimum number of milliseconds, you also reduce that skew on those systems.

You can test it by running this simple script:

const ITERATIONS = 10_000_000;
const values = [];
function expensiveFunction() {
    /asdasd/.test("dfghsdfghdfgjasdasddftghsdfhg");
}
function testEachRun() {
    let cycles = ITERATIONS;
    while(cycles-- > 0) {
        const startTS = process.hrtime.bigint();
        expensiveFunction();
        values.push(Number(process.hrtime.bigint() - startTS) / 1000000);
    }
    console.log("testEachRun", values.reduce((total, value) => total + value, 0));
}
function testAllRuns() {
    let cycles = ITERATIONS;
    const startTS = process.hrtime.bigint();
    while(cycles-- > 0) {
        expensiveFunction();
    }
    console.log("testAllRuns", Number(process.hrtime.bigint() - startTS) / 1000000);
}
testAllRuns();
testEachRun();

As you can see, if you measure each run, the total time almost triples for the very same number of iterations, and that's because of all the extra time it takes to call the process.hrtime.bigint() method. If you only call it once instead, you reduce the skew.
[image: console output of the script above, with testAllRuns finishing in roughly a third of the time of testEachRun]

By the way, the skew is not important as long as all the tests have the very same skew. These benchmark libraries are created to compare different pieces of code against each other and see which one is faster, not to calculate the exact CPU cycles an operation takes; there's always some skew, because at some point you need to access a clock, a counter or some other source to calculate the differences.
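
If that constant loop overhead ever did matter, one option (not something iso-bench does; just a sketch) would be to time an empty loop with the same cycle count and subtract it:

// Sketch of loop-overhead calibration, reusing expensiveFunction from the
// script above; iso-bench doesn't do this, it's just one way the constant
// loop cost could be cancelled out.
function measure(callback: () => void, cycles: number): number {
    let n = cycles;
    const startTS = process.hrtime.bigint();
    while (n-- > 0) {
        callback();
    }
    return Number(process.hrtime.bigint() - startTS) / 1000000;
}

const CYCLES = 10_000_000;
const loopMs = measure(() => {}, CYCLES);          // cost of the loop alone
const testMs = measure(expensiveFunction, CYCLES); // loop + callback
// only an approximation: V8 may optimize the empty callback differently
console.log("net callback time:", testMs - loopMs, "ms");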

@mindplay-dk commented

Hmm, have you tried using the platform-standard performance.now() instead?

https://developer.mozilla.org/en-US/docs/Web/API/Performance/now

@Llorx (Owner, Author) commented Aug 20, 2023

> Hmm, have you tried using the platform-standard performance.now() instead?
>
> https://developer.mozilla.org/en-US/docs/Web/API/Performance/now

In Node.js that's just an alias for process.hrtime divided by 1000000, exactly what I'm already doing xD
https://github.com/nodejs/node/blob/484ad833580c978d2530d2257af95a34970db454/lib/internal/perf/utils.js#L17

What I don't remember is why I decided to go with bigint only to convert it to a Number later. I think it's because internally Node.js works with bigint when using process.hrtime? Going to check on that, although I don't think I can avoid much more skew anyway.
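
For what it's worth, the relative cost of the two timer calls is easy to check (a quick sketch; absolute numbers will vary per machine and Node.js version):

// Quick sketch comparing the call overhead of the two timers; absolute
// numbers vary per machine and Node.js version.
const N = 1_000_000;

let t = process.hrtime.bigint();
for (let i = 0; i < N; i++) process.hrtime.bigint();
console.log("hrtime.bigint:  ", Number(process.hrtime.bigint() - t) / 1000000, "ms");

t = process.hrtime.bigint();
for (let i = 0; i < N; i++) performance.now();
console.log("performance.now:", Number(process.hrtime.bigint() - t) / 1000000, "ms");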

@mindplay-dk commented

This is nuts.

How did they manage to create a function just for accessing a micro timer that is this slow. 😐

@Llorx (Owner, Author) commented Aug 21, 2023

> How did they manage to create a function just for accessing a micro timer that is this slow. 😐

It's not that; it's just that calling a native function has its own overhead. JavaScript drawbacks: it was not designed for performance, although the V8 guys are doing whatever they can 😅

@mindplay-dk commented Aug 21, 2023

Calling any native function causes this much overhead?

I mean, they do say "javascript is slow", but... yikes. 😅
