[QST] Is it possible to extract indices and continuous features rules from NVTabular workflow? #1866

Nepherhotep · 2023-10-10T20:31:47Z

We are trying to optimize the feature preprocessing step for real-time inference, where latency is critical. We can cache some intermediate data to build tensors more efficiently, but for that purpose we need a way to extract the categorical feature mappings, as well as the continuous feature conversion rules, from the trained NVTabular workflow. Is there a way to do this?

Thanks!
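
For illustration, the cached fast path described above could look roughly like the sketch below. It is not NVTabular code. `preprocess_row`, `cat_maps`, `cont_stats`, and the example values are all hypothetical names and data, and it assumes the mappings and statistics have already been extracted from the fitted workflow. It also assumes that continuous features are standardized as (x - mean) / std and that unseen categories map to a single reserved index. The reserved index differs between NVTabular versions, so `oov_index=0` is only a placeholder and should be checked against the fitted workflow.

```python
from typing import Any, Dict, Mapping


def preprocess_row(
    row: Mapping[str, Any],
    cat_maps: Mapping[str, Mapping[Any, int]],
    cont_stats: Mapping[str, Mapping[str, float]],
    oov_index: int = 0,  # assumption: reserved index for unseen categories
) -> Dict[str, float]:
    """Encode one raw row with cached lookups instead of a full transform()."""
    encoded: Dict[str, float] = {}

    # Categorical columns: plain dictionary lookup, falling back to oov_index.
    for col, mapping in cat_maps.items():
        encoded[col] = mapping.get(row.get(col), oov_index)

    # Continuous columns: standardize with the cached mean and std.
    for col, stats in cont_stats.items():
        centered = float(row[col]) - stats["mean"]
        std = stats["std"]
        # Guard against a zero std to avoid division by zero.
        encoded[col] = centered / std if std > 0 else centered

    return encoded


# Hypothetical cached values, for demonstration only.
cat_maps = {"city": {"berlin": 1, "paris": 2}}
cont_stats = {"price": {"mean": 0.85, "std": 1.2}}

result = preprocess_row({"city": "paris", "price": 2.05}, cat_maps, cont_stats)
print(result)
```

Whether this matches the workflow's output exactly depends on the operators and versions in use, so the results should be compared against `transform()` on a sample of rows before relying on it.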

shoyasaxa · 2023-10-11T15:31:40Z

Hello team, just to add more information:

We are setting up online inference where features need to be preprocessed in real time. We only need to preprocess one to a few rows of data at a time, and passing them through the NVT transform() function takes too long.
Instead, we would like to extract the categorical feature mappings that the NVT workflow has fitted, as well as the statistics that NVT collected for the Normalize operator for each continuous variable (please assume all continuous variables are simply passed through the Normalize operator).
We are aware that the index mapping for categorical features can be retrieved from the parquet files in the categories/ folder of the saved workflow. The difficulty is extracting the statistics learned for the continuous variables. From a quick look around, these statistics do not seem to be saved in a separate file, and I'm guessing they are pickled together with the workflow. We would like to be able to do something like the following with an already fitted workflow:

print(workflow.learned_statistics["my_continuous_variable1"])
>> {"mean": 0.85, "std": 1.2 }

Is something like this possible? Please let us know!

We are on NVT version 23.04.00, using merlin-pytorch:23.04.

Thank you for your help!
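
One possible direction for the request above, offered as an unverified sketch and not as a documented NVTabular API: if the fitted Normalize operator keeps its statistics in dictionary attributes (assumed here to be named `means` and `stds`), and each graph node exposes its operator as `op` and its upstream nodes as `parents_with_dependencies` (also assumptions), the statistics could be collected by walking the graph from the output node. The helper `collect_normalize_stats` is a hypothetical name, and the demo uses stand-in objects in place of a real workflow so that it runs without NVTabular installed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def collect_normalize_stats(output_node: Any) -> Dict[str, Dict[str, float]]:
    """Walk a node graph and gather per-column mean/std from fitted ops.

    Assumptions (not confirmed in this thread): nodes expose `op` and
    `parents_with_dependencies`; the op stores `means` and `stds` dicts.
    """
    stats: Dict[str, Dict[str, float]] = {}
    seen = set()
    stack = [output_node]
    while stack:
        node = stack.pop()
        if node is None or id(node) in seen:
            continue
        seen.add(id(node))

        op = getattr(node, "op", None)
        means = getattr(op, "means", None)
        stds = getattr(op, "stds", None)
        if isinstance(means, dict) and isinstance(stds, dict):
            for col, mean in means.items():
                stats[col] = {"mean": float(mean), "std": float(stds[col])}

        # Continue with upstream nodes, if the attribute exists.
        stack.extend(getattr(node, "parents_with_dependencies", None) or [])
    return stats


# Stand-in graph for demonstration; a real call might look like
# collect_normalize_stats(workflow.output_node), which is an assumption.
selector = SimpleNamespace(op=None, parents_with_dependencies=[])
normalize = SimpleNamespace(
    op=SimpleNamespace(
        means={"my_continuous_variable1": 0.85},
        stds={"my_continuous_variable1": 1.2},
    ),
    parents_with_dependencies=[selector],
)
output = SimpleNamespace(op=None, parents_with_dependencies=[normalize])

learned = collect_normalize_stats(output)
print(learned["my_continuous_variable1"])
```

The attribute names should be confirmed against the installed NVTabular source before use, since internal attributes can change between releases.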

sibadakesi · 2023-12-14T09:33:44Z

We encountered the same problem.

Nepherhotep added the "question" label (Further information is requested) on Oct 10, 2023
