[QST] Is it possible to extract indices and continuous features rules from NVTabular workflow? #1866

Open
Nepherhotep opened this issue Oct 10, 2023 · 2 comments
Labels
question Further information is requested

Comments

@Nepherhotep

We are trying to optimize the feature preprocessing step for real-time inference, where latency is critical. We can cache some intermediate data to build tensors more efficiently, but for that purpose we need a way to extract the categorical feature mappings, as well as the continuous feature conversion rules, from a trained NVTabular workflow. Is there a way to do this?
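
To illustrate what we would like to cache: once the category mappings and normalization statistics are available as plain dictionaries, preprocessing a single row reduces to dictionary lookups and a little arithmetic. The sketch below is only an illustration, not NVTabular code. `CATEGORY_MAPS`, `NORMALIZE_STATS`, `OOV_INDEX` and `preprocess_row` are hypothetical names, the values are made up, and the index used for unseen categories is an assumption that should be checked against the fitted workflow.

```python
from typing import Any, Dict

# Hypothetical cached lookups; in practice these would be extracted once
# from the fitted workflow. All values here are made up for illustration.
CATEGORY_MAPS: Dict[str, Dict[Any, int]] = {
    "country": {"US": 1, "DE": 2, "JP": 3},
}
NORMALIZE_STATS: Dict[str, Dict[str, float]] = {
    "price": {"mean": 10.0, "std": 2.0},
}
# Assumption: unseen categories map to index 0. Verify for your NVT version.
OOV_INDEX = 0


def preprocess_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Encode categoricals and standardize continuous features for one row."""
    out: Dict[str, Any] = {}
    for col, mapping in CATEGORY_MAPS.items():
        # Plain dictionary lookup, falling back to the assumed OOV index.
        out[col] = mapping.get(row[col], OOV_INDEX)
    for col, stats in NORMALIZE_STATS.items():
        # Standard score: (x - mean) / std
        out[col] = (row[col] - stats["mean"]) / stats["std"]
    return out
```

For example, `preprocess_row({"country": "DE", "price": 14.0})` would give `{"country": 2, "price": 2.0}` with the made-up values above.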

Thanks!

@Nepherhotep Nepherhotep added the question Further information is requested label Oct 10, 2023
@shoyasaxa

shoyasaxa commented Oct 11, 2023

Hello team - just to add more information,

  • We are setting up online inference where features need to be preprocessed in real time. We only need to preprocess one to a few rows of data at a time, and passing them through the NVT transform() function takes too long.
  • Instead, we would like to extract the categorical feature mappings that the NVT workflow was fitted on, as well as the statistics that NVT collected for the Normalize operator for each continuous variable (please assume all continuous variables are simply passed through the Normalize operator).
  • We are aware that the index mapping for categorical features can be retrieved from the parquet files in the categories/ folder of the saved workflow. The difficulty is extracting the statistics learned for the continuous variables. From a quick look around, these statistics don't seem to be saved in a separate file, and I'm guessing they are pickled together with the workflow. We would like to be able to do something like the following with an already fitted workflow:
print(workflow.learned_statistics["my_continuous_variable1"])
>> {"mean": 0.85, "std": 1.2 }

Is something like this possible? Please let us know!
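
For reference, this is roughly the kind of helper we have in mind. It is a sketch that rests on assumptions about NVTabular internals that we have not confirmed for 23.04: that a fitted Normalize op keeps its statistics in `means` and `stds` dictionaries keyed by column name, and that graph nodes expose their operator as `op` and their upstream nodes as `parents_with_dependencies`. `collect_normalize_stats` is a hypothetical name, and the code is duck-typed so it does not import NVTabular.

```python
from typing import Any, Dict


def collect_normalize_stats(output_node: Any) -> Dict[str, Dict[str, float]]:
    """Walk a workflow graph upstream and gather per-column mean/std.

    Assumptions (unverified): each node has `op` and
    `parents_with_dependencies`, and a fitted Normalize op carries
    `means` and `stds` dictionaries keyed by column name.
    """
    stats: Dict[str, Dict[str, float]] = {}
    seen = set()
    stack = [output_node]
    while stack:
        node = stack.pop()
        if node is None or id(node) in seen:
            continue
        seen.add(id(node))
        op = getattr(node, "op", None)
        means = getattr(op, "means", None)
        stds = getattr(op, "stds", None)
        # Treat any op that carries both dictionaries as a Normalize-like op.
        if isinstance(means, dict) and isinstance(stds, dict):
            for col, mean in means.items():
                if col in stds:
                    stats[col] = {"mean": float(mean), "std": float(stds[col])}
        # Continue with the upstream nodes, if the attribute exists.
        stack.extend(getattr(node, "parents_with_dependencies", None) or [])
    return stats
```

If those assumptions hold, and if the loaded workflow exposes its final node as `workflow.output_node` (also an assumption), then `collect_normalize_stats(workflow.output_node)` would return a dictionary shaped like the one in the example above.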

We are on NVT version 23.04.00, using the merlin-pytorch:23.04 image.

Thank you for your help!

@sibadakesi

sibadakesi commented Dec 14, 2023

We encountered the same problem.
