High memory usage with github page as sample #41
Comments
Thanks for reporting. General tip: add more than one sample to see if the problem persists. A single sample cannot yield statistically sound heuristics. I will add a warning. It even says so in the comment you copied!
I assume it's not a memory leak; it's just using a lot of memory. Usually the set of potential CSS rules gets reduced by applying them to every sample. If you add only one sample, that reduction cannot happen.
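To illustrate why a single sample leaves so many candidates alive, here is a minimal sketch of the pruning idea described above. It is not mlscraper's actual code: the function name `surviving_selectors` and the example selector sets are hypothetical, and each sample is simply modelled as the set of selectors that match the target value on that page.

```python
from functools import reduce


def surviving_selectors(candidates_per_sample):
    """Keep only the selectors that match the target in every sample.

    Hypothetical helper: each item is the set of candidate selectors
    that match the wanted value on one sample page.
    """
    if not candidates_per_sample:
        return set()
    return reduce(set.intersection, (set(c) for c in candidates_per_sample))


# Made-up candidate sets for two sample pages (assumption for illustration).
page_a = {"h1", "h1.title", "div#main > h1", "body h1", ".repo-name"}
page_b = {"h1", "h1.title", ".headline"}

one_sample = surviving_selectors([page_a])
two_samples = surviving_selectors([page_a, page_b])

# With one sample nothing is pruned; a second sample removes every
# selector that does not also work on the other page.
print(len(one_sample), len(two_samples))
```

With only `page_a`, every candidate survives, so the search space stays as large as it was generated. Adding `page_b` cuts it down to the selectors shared by both pages.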
The same code produces the following result for me within seconds:
Please add dependencies ( Even the readme clearly states:
If I run this on Google Colab, I don't get high memory usage, but I do get an 'is not in list' error. Locally, however, it still causes high memory usage with Python 3.10 and mlscraper (both the pre and develop versions).
Link: https://colab.research.google.com/drive/1frHuWVaAq-86FhhwCSyYlel-qBxaPDIs?usp=sharing
Tested with both Python 3.9 and 3.10, on Google Colab and locally on Ubuntu 22.04.
requirements.txt
Not sure what is going on. You seem to get a good result while I cannot, using the same code.
Why don't you add a second example?
I was now able to reproduce this and will look into it if I find the time. It seems that it does not stop generating CSS selectors even though the tag is already unique.
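As a rough sketch of the behaviour one would expect instead (stop as soon as a selector is already unique), the following generates candidates lazily, simplest first, and returns the first one that matches exactly one element. This is an assumption about a possible fix, not mlscraper's implementation; the element model (dicts with `tag` and `classes`) and all function names are hypothetical.

```python
from itertools import combinations


def matches(selector, element):
    """A selector is a (tag, required_classes) pair in this toy model."""
    tag, classes = selector
    return tag == element["tag"] and classes <= element["classes"]


def candidate_selectors(target):
    """Yield selectors for the target lazily, from least to most specific."""
    yield (target["tag"], frozenset())
    classes = sorted(target["classes"])
    for size in range(1, len(classes) + 1):
        for combo in combinations(classes, size):
            yield (target["tag"], frozenset(combo))


def first_unique_selector(target, elements):
    """Return the first selector matching exactly one element, plus the
    number of candidates tried before stopping."""
    tried = 0
    for selector in candidate_selectors(target):
        tried += 1
        if sum(matches(selector, e) for e in elements) == 1:
            return selector, tried
    return None, tried


# Made-up page: one h1 with several classes, and three divs.
heading = {"tag": "h1", "classes": frozenset({"a", "b", "c"})}
box = {"tag": "div", "classes": frozenset({"x", "y"})}
page = [
    heading,
    {"tag": "div", "classes": frozenset({"x"})},
    {"tag": "div", "classes": frozenset({"y"})},
    box,
]

heading_selector, heading_tried = first_unique_selector(heading, page)
box_selector, box_tried = first_unique_selector(box, page)
```

Because the `h1` tag is unique on the page, the search for `heading` stops after the first candidate instead of enumerating all class combinations. For `box`, more specific selectors are only generated once the simpler ones turn out to be ambiguous.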
Any news on this issue? I ran into the same problem with another website.
I'm running into the same problem :(
Has anyone managed to work around this yet? I tried a number of different sites with 5+ samples each, but it always runs out of memory.
Same problem. I have 32 GB of memory but always run out :(