Benchmarking #11241
Any benchmarking (and testing, for that matter) needs to acknowledge that Bokeh consists of two components that are both separable and inseparable: (Python) bokeh and bokehjs. Thus any benchmarking/testing needs to cover bokeh, bokehjs, and the bokeh <-> bokehjs integration. For example, when benchmarking/testing webgl changes, the primary concern is to benchmark/test bokehjs, and those benchmarks/tests are run in a JS runtime (synthetic or a proper web browser) and written in TypeScript. In this example, testing bokeh and the integration is less relevant, though it may be relevant indirectly, e.g. a regression in the binary protocol may hinder the webgl backend's usability.

For testing bokehjs we have a well established framework, centered around Chromium's DevTools protocol, so any benchmarking improvements need to be integrated into that framework. The DevTools protocol gives us API access to everything that the DevTools UI offers. From a benchmarking perspective this means we can use its CPU/memory/network profiling tools, which would allow us to create benchmarks that go beyond simple, but hopefully statistically significant, run-time measurements. As to bokeh, I don't have any opinions, so we may want to try out different approaches.

Benchmarking in CI is out of the question due to the high variability of external interference, unless we could control the CI environment completely, which we can't on GitHub Actions. Thus any benchmarking will have to be done locally, but this also has to be done with care. Finally, benchmarking, compared to testing, is a comparative process, so we always have to benchmark two or more versions of bokeh/bokehjs at the same time, in the same environment (software- and hardware-wise).
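As a rough illustration of the kind of data the DevTools protocol exposes (this is not the bokehjs TypeScript framework itself, just a sketch of the same protocol reached from Python), the following assumes Selenium 4 with a Chromium-based browser and a Bokeh app served at a hypothetical local URL:

```python
# Minimal sketch: pull DevTools-protocol performance metrics for a page.
# Assumes Selenium 4 (execute_cdp_cmd) and chromedriver on PATH; the app
# URL is hypothetical.
from selenium import webdriver

driver = webdriver.Chrome()
try:
    # Enable the CDP Performance domain before loading the page.
    driver.execute_cdp_cmd("Performance.enable", {})
    driver.get("http://localhost:5006/my_app")  # hypothetical Bokeh server app

    # Performance.getMetrics returns a list of {"name": ..., "value": ...}.
    metrics = driver.execute_cdp_cmd("Performance.getMetrics", {})["metrics"]
    by_name = {m["name"]: m["value"] for m in metrics}

    print("script time (s):", by_name.get("ScriptDuration"))
    print("JS heap used (B):", by_name.get("JSHeapUsedSize"))
finally:
    driver.quit()
```

The same protocol domains (Profiler, HeapProfiler, Network) are what a TypeScript harness would talk to directly, which is how deeper profiling could slot into the existing bokehjs framework.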
Discussion started on Slack and transferred across here to widen the audience.
I was looking for guidance on the best approach to benchmarking Bokeh changes, but I have realised that the scope is wider than this. I can think of two main areas for benchmarking:
1. Guidance for new contributors, presumably in the dev docs. Probably browser-based, and it needs to include static data, streaming data, and user interaction such as zooming, panning, selection, and changing visual properties. It could be a manual process but needs to be repeatable, so perhaps attaching JavaScript callbacks to controls so that browser performance monitoring can be started before the controls are activated (see the sketch after this list).
2. Automated benchmarking, e.g. via Selenium, whether in CI or not (issue [FEATURE] Performance Comparisons and Benchmarks #11228 is relevant here). I have concerns that benchmarking in CI isn't tenable, as the code runs in VMs on shared hardware with varying workload, but it should be relatively easy to confirm this. For automated benchmarking outside of CI I am aware of airspeed velocity (https://asv.readthedocs.io), which is used by NumPy and SciPy. It has support for checking out old GitHub commits and creating virtual environments, so you can run a set of benchmarks looking back in time to identify performance improvements/regressions (a benchmark sketch follows this list).
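For the callback idea in item 1, here is a minimal sketch using Bokeh's CustomJS and Button models; the mark name and the particular plot are illustrative, not a prescribed setup:

```python
# Minimal sketch: drop a named performance mark before interacting with
# the plot, so a DevTools performance recording can be aligned with the
# start of the interaction. The mark name is illustrative.
from bokeh.layouts import column
from bokeh.models import Button, CustomJS
from bokeh.plotting import figure, show

plot = figure(tools="pan,wheel_zoom,box_select,reset")
plot.scatter([1, 2, 3], [4, 5, 6])

start = Button(label="Mark interaction start")
start.js_on_click(CustomJS(code="""
    // Shows up on the DevTools Performance timeline (User Timing track).
    performance.mark('interaction-start');
"""))

show(column(start, plot))
```

Start a recording in the browser's performance monitor, click the button, then pan/zoom/select; the mark gives the repeatable "measurement starts here" point in the trace.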
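And for item 2, a sketch of what an asv benchmark for (Python) bokeh might look like; the module name and the choice of json_item serialization as the thing being timed are assumptions for illustration:

```python
# benchmarks/bench_serialization.py -- hypothetical asv benchmark module.
# asv discovers classes with time_* methods; `params` runs the suite once
# per value, passing it to setup() and each time_*() method.
from bokeh.embed import json_item
from bokeh.plotting import figure


class SerializationSuite:
    params = [1_000, 10_000, 100_000]
    param_names = ["n_points"]

    def setup(self, n_points):
        xs = list(range(n_points))
        self.plot = figure()
        self.plot.scatter(xs, xs)

    def time_json_item(self, n_points):
        # Time serializing the plot to the JSON representation sent to bokehjs.
        json_item(self.plot)
```

Since benchmarking is comparative (as noted in the comment above), something like `asv continuous main my-branch` would build both revisions in isolated environments and report the relative change, rather than a single absolute number.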