
How to limit the memory usage of AMF #1454

Open
lbhwyy opened this issue Nov 16, 2023 · 8 comments

Comments

@lbhwyy

lbhwyy commented Nov 16, 2023

Versions

river 0.19.0
Python 3.8
Ubuntu 18.04

Issue

AMF integrated with River is very useful, but the unbounded growth of its memory footprint limits practical application. Could you please tell me how to limit the memory usage of AMF?

@MaxHalford
Member

Hey there. Just curious, how deep are your trees? How much memory are you consuming? Do you know?

@lbhwyy
Author

lbhwyy commented Nov 16, 2023

I don't know how deep my trees are; the depth might be unbounded, because I couldn't find a parameter in the settings to control it. I used around 20,000 data points with 61 dimensions for online training. After that, I saved the model locally using joblib and noticed that the saved model is approximately 500 MB. These are my settings:
from river import forest

model_train = forest.AMFClassifier(
    n_estimators=50,
    use_aggregation=True,
    dirichlet=0.5,
    seed=1,
)
Additionally, I observed that the model size seems to grow approximately linearly. I would like to know whether, for forest.AMFClassifier, it is possible to limit the memory usage of a single tree by restricting its depth, thereby constraining the overall memory usage of the entire forest.
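
For reference, a rough way to reproduce this observation is to track the serialized size of the model as it trains. The sketch below is only an illustration: the random 61-feature stream is a stand-in for the real data, and pickled size is used as a crude proxy for the in-memory footprint.

import pickle
import random

from river import forest

model = forest.AMFClassifier(
    n_estimators=50,
    use_aggregation=True,
    dirichlet=0.5,
    seed=1,
)

rng = random.Random(42)

for i in range(1, 20_001):
    # Stand-in for the real 61-dimensional stream.
    x = {f"f{j}": rng.random() for j in range(61)}
    y = rng.random() > 0.5
    model.learn_one(x, y)
    if i % 5_000 == 0:
        size_mb = len(pickle.dumps(model)) / 1e6
        print(f"{i} samples: serialized model is roughly {size_mb:.1f} MB")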

@MaxHalford
Member

Dear tree-hugger @smastelini, would you have some spare time to look into this? I think it's worthwhile to get a better understanding of how fast/deep Mondrian trees grow :)

@smastelini
Member

Hey everyone, I will do my best. The vanilla Mondrian trees had a budget parameter, if I recall correctly. I am not that familiar with Aggregated Mondrian Trees, but I'll do my homework.

@lbhwyy
Author

lbhwyy commented Nov 16, 2023

Thank you for your quick response! I appreciate your willingness to look into it. If you discover any information about the budget parameter for Aggregated Mondrian Trees during your research, I would be interested to learn more. Looking forward to any updates you can provide. Thanks again!

@ananiask8

Any updates on this? If memory usage is unbounded, using this model online in production could eventually lead to memory exhaustion.

@smastelini
Member

No, not yet. I started reading the paper, but so far I haven't found any kind of "budget" parameter. Unfortunately, my time is currently scarce, so I cannot dig deeply into this topic right now.

@smastelini
Member

A small update. I finished skimming through the paper, and from what I gather, the theoretical robustness guarantees of the algorithm and its adaptive nature are the factors that should provide an automatic cap on memory usage. As far as I can tell, there is no direct control from the user's standpoint.

The idea is that the algorithm would (eventually) adapt and converge while avoiding overfitting. This last aspect could be the main source of excessive memory usage, as far as decision tree structures are concerned.

I want to get a more practical understanding of AMFs by delving into the original code and the River adaptation. This should help me form a more solid opinion from an application viewpoint.
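
In the meantime, since AMFClassifier does not expose a depth or budget parameter, one crude workaround (my own suggestion, not something the library provides) is to monitor the model's footprint yourself and rebuild the forest once it crosses a budget. The threshold, the check interval, and the pickle-size proxy below are arbitrary choices, and rebuilding discards everything learned so far.

import pickle

from river import forest

MEMORY_BUDGET_BYTES = 300 * 1024 * 1024  # arbitrary example budget
CHECK_EVERY = 1_000                      # pickling is costly, so only check periodically

def new_forest():
    # Same hyperparameters reported earlier in this thread.
    return forest.AMFClassifier(
        n_estimators=50,
        use_aggregation=True,
        dirichlet=0.5,
        seed=1,
    )

model = new_forest()
n_seen = 0

def learn_bounded(x, y):
    """Learn one sample and reset the forest if it grows past the budget."""
    global model, n_seen
    model.learn_one(x, y)
    n_seen += 1
    if n_seen % CHECK_EVERY == 0 and len(pickle.dumps(model)) > MEMORY_BUDGET_BYTES:
        # Hard cap: start over from scratch (all learned structure is lost).
        model = new_forest()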
