[FEA] LoRe framework - Support operator specific dump tool for performance issue local reproduce #10843

winningsix · 2024-05-20T09:43:03Z

Is your feature request related to a problem? Please describe.
Some performance issues are hard to reproduce locally with a simple approach. As a developer or user, it would be great to have a tool that supports the following:

Dumping data “at will” at operator granularity. For either a performance issue or a compatibility issue (a.k.a. semantic consistency), the problematic operator may appear at a relatively late stage of the plan. An upstream filter may behave quite differently in selectivity, or a join across multiple tables may depend on a join key that is not easy to reproduce. This makes it really hard to reproduce the original issue locally. “At will” means we want to reproduce the issue at a specific operator without rerunning the entire plan fragment or SQL query.
“Replaying” the execution locally with a single Spark application. A “replay” means the exact operator can be run directly against the dumped data on the developer's side.
Data desensitization for the dumped data if needed. Two running modes are provided: masked mode and plain mode. In plain mode, data is not masked and is used directly to generate the needed NCU/Nsys profiles. In masked mode, data is translated into masked data in an irreversible way.
Reproducing both diff issues and performance issues.
Dumping data in a controllable way. For diff issues, it should allow dumping a specified number of rows. For performance issues, it should allow dumping any execution batch that takes longer than a preconfigured threshold. Additionally, it should provide a task limit to avoid dumping too much data.
Async dump mode for a problematic operator: do not wait for the job to complete before triggering the file/class dump.
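
To make the masked-mode and controllable-dump requirements above more concrete, here is a minimal Python sketch. It is only an illustration and not the LoRe implementation: `mask_value`, `DumpPolicy`, and all field names are hypothetical, and a salted SHA-256 digest is assumed as one possible irreversible masking scheme.

```python
import hashlib
from dataclasses import dataclass, field


def mask_value(value: str, salt: bytes) -> str:
    """One-way mask (assumed scheme): salted SHA-256 digest of the value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


@dataclass
class DumpPolicy:
    """Hypothetical dump policy; names are illustrative, not LoRe config keys."""
    threshold_seconds: float   # dump only batches that take longer than this
    max_dumps_per_task: int    # task limit that bounds the dumped volume
    dumps_by_task: dict = field(default_factory=dict)

    def should_dump(self, task_id: int, batch_seconds: float) -> bool:
        # Batches at or below the threshold are not interesting.
        if batch_seconds <= self.threshold_seconds:
            return False
        # Stop once this task has used up its dump budget.
        used = self.dumps_by_task.get(task_id, 0)
        if used >= self.max_dumps_per_task:
            return False
        self.dumps_by_task[task_id] = used + 1
        return True
```

In this sketch, equal inputs map to equal masked values, so equality-based behavior such as join keys could still be reproduced on masked data. Whether LoRe needs that property is an assumption here.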

Describe the solution you'd like
The workflow for using the LOcal REplay (LoRe) framework to replay a performance issue is as follows:

Dump: The developer decides which operator to dump and sets the operator-time threshold of interest. Using the following capture as an example, if we set the dump filter with a threshold of 2 seconds, it dumps the related columnar batches as well as a serialized GpuProject class into binary files. A task limit can also be configured to cap how many data files are dumped.
Replay: By restoring from the dumped files, the problematic operator can easily be run locally against the specific columnar batches.
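
The dump and replay steps can be pictured as a simple round trip. The Python sketch below is an analogy under stated assumptions, not the proposed design: it stores a hypothetical JSON operator description and row-oriented batches, whereas the proposal dumps a serialized GpuProject class and columnar batches into binary files. The `dump` and `replay` helpers and the file names are made up for illustration.

```python
import json
from pathlib import Path


def dump(out_dir: Path, operator_spec: dict, batches: list) -> None:
    """Write the operator description and each input batch to its own file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "operator.json").write_text(json.dumps(operator_spec))
    for i, batch in enumerate(batches):
        # Zero-padded index keeps lexical file order equal to batch order.
        (out_dir / f"batch-{i:05d}.json").write_text(json.dumps(batch))


def replay(out_dir: Path) -> list:
    """Restore the operator and run it on every dumped batch, in order."""
    spec = json.loads((out_dir / "operator.json").read_text())
    if spec["op"] != "project":
        raise ValueError(f"unsupported operator in this sketch: {spec['op']}")
    results = []
    for path in sorted(out_dir.glob("batch-*.json")):
        batch = json.loads(path.read_text())
        # A projection keeps only the requested columns of each row.
        results.append([{c: row[c] for c in spec["columns"]} for row in batch])
    return results
```

The point of the round trip is that `replay` needs only the dump directory, not the upstream plan that originally produced the batches.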

This comprises several sub-tasks:

ProjectExec support: [FEA] Support ProjectExec in LoRe framework #10862
HashAggExec support: [FEA] Support GpuHashAggregateExec in LoRe #10942

winningsix added the "feature request" and "? - Needs Triage" labels May 20, 2024

mattahrens assigned res-life May 21, 2024

mattahrens removed the "? - Needs Triage" label May 21, 2024

winningsix changed the title ~~[FEA] Support operator specific dump tool for performance issue local reproduce~~ [FEA] LoRe framework - Support operator specific dump tool for performance issue local reproduce May 22, 2024
