
How to better tune peak memory usage #260

cyc opened this issue Jan 21, 2022 · 3 comments


@cyc

cyc commented Jan 21, 2022

I have some datasets and transformations that I want to run that unfortunately won't fit on n1-highmem-16 instances (which is what FlexRS requires). The features are fairly standard: scalar features with the tft.quantiles analyzer and string features with the tft.vocabulary analyzer (but there are a lot of each type of feature). Generally the analyze step runs fine up until the final combine, which typically runs on a very small number of machines and causes them to repeatedly OOM.

Of course I could use a larger machine type or even a custom machine type, but these don't work with FlexRS and would be more expensive. I'm curious whether either of the following two options would be viable:

  1. Shard the analyze step by feature: split the set of features into separate groups and run multiple analyze steps sequentially, which should reduce peak memory usage. The challenge would be merging the outputs of the analyze steps together at the end.
  2. Add Beam resource hints specifically to the problematic combine tasks so that they do not get scheduled to run on the same machine.

Is either of these options viable, or is there a solution that I have not considered yet?
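For what it's worth, the grouping step in option 1 is plain Python. Here's a minimal sketch, assuming the features are identified by a flat list of names; the function name and the per-group analyze passes are hypothetical, and the actual AnalyzeDataset runs and output merging are elided:

```python
def shard_features(feature_names, num_shards):
    """Round-robin the feature names into num_shards disjoint groups.

    Each group would drive one separate analyze pass, so only that
    group's analyzers (quantiles/vocabularies) are held in memory at
    once. The per-group outputs (vocab files, quantile boundaries)
    would still need to be merged into a single transform afterwards.
    """
    groups = [[] for _ in range(num_shards)]
    for i, name in enumerate(sorted(feature_names)):
        groups[i % num_shards].append(name)
    return groups

# Stand-in feature names; in practice these would come from the
# pipeline's feature spec.
features = ["f%02d" % i for i in range(10)]
groups = shard_features(features, 3)
```

Sorting before round-robin assignment keeps the sharding deterministic across runs, which matters if the per-group outputs are written to predictable paths.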

@zoyahav
Member

zoyahav commented Jan 24, 2022

Quick initial questions:
a. What version of TFT is your pipeline running with?
b. Same question for Beam.
c. How many analyzers are defined in the pipeline, and of what types (how many quantiles, vocabulary, etc.)?

@cyc
Author

cyc commented Jan 24, 2022

a. TFT 1.5.0
b. Beam 2.35.0
c. 1 tft.quantiles analyzer with dimension 840 (and reduce_instance_dims=False), 15 tft.vocabulary analyzers, and 96 tft.experimental.approximate_vocabulary analyzers

@cyc
Author

cyc commented Jan 27, 2022

Also, I should add that in my experiments testing these analyzers with DirectRunner, it's not the quantiles analyzer that consumes most of the memory but the tft.vocabulary analyzers (I tested this by disabling different analyzers and measuring the amount of memory allocated).
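The disable-and-measure approach above can be sketched with the stdlib tracemalloc module. This is a stand-in, not the actual pipeline: the fake workload below only imitates a vocabulary analyzer accumulating token counts, and in the real experiment the measured function would be the DirectRunner pipeline run with a subset of analyzers enabled:

```python
import tracemalloc

def measure_peak(fn):
    """Return peak bytes allocated by Python while running fn()."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

# Hypothetical stand-in for one enabled vocabulary analyzer: build an
# in-memory token -> count table, which is roughly what a vocabulary
# combine accumulates.
def fake_vocab_analyzer():
    return {"token%d" % i: i for i in range(50_000)}

peak = measure_peak(fake_vocab_analyzer)
```

Comparing `measure_peak` across runs with different analyzers disabled is what points the finger at the vocabulary analyzers here. Note that tracemalloc only sees Python-level allocations, so memory held inside native TF/Beam code would be undercounted; for the real pipeline, watching process RSS is a useful cross-check.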
