Invalid Session ID (Large amount of posts) #32

Open
pixml27 opened this issue Aug 20, 2020 · 4 comments
pixml27 commented Aug 20, 2020

First of all, thanks for this scraper!
My problem is that when I download a large number of posts (> 4000) with 5-10 comments each, Chrome just crashes.

Initially, I got the error (invalid session ID) while expanding collapsed comments.
Then I changed the code to expand comments during the scroll function, and the error started appearing there instead (invalid session ID again).

I read a lot of Stack Overflow threads; they recommend adding some options to Chrome, and I tried them all. Many also suggest giving Chrome more memory (when using Docker), but I just run the script directly.
It seems to me that this problem is somehow related to memory: Chrome closes because of too many images, media, etc.
Can you help me somehow? Have you run into this, and have you tested the script on large amounts of data?

[Screenshot 2020-08-20 at 01.09.02]
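
One common way to live with a browser session that dies partway through a long run is to split the work into batches and start a fresh browser whenever the session is lost. The sketch below is only an illustration of that idea, not something from the scraper itself: `run_batches_with_restart`, `make_driver`, and `process_batch` are hypothetical names, and it assumes the scrape can be divided into independent batches, which this thread does not confirm. With Selenium, `session_errors` would typically be `(InvalidSessionIdException, WebDriverException)` from `selenium.common.exceptions`.

```python
from typing import Callable, Iterable, List, Tuple, Type


def _quit_quietly(driver) -> None:
    """Close the browser, ignoring errors from an already-dead session."""
    try:
        driver.quit()
    except Exception:
        pass


def run_batches_with_restart(
    make_driver: Callable[[], object],
    batches: Iterable[object],
    process_batch: Callable[[object, object], list],
    session_errors: Tuple[Type[BaseException], ...],
    max_retries: int = 2,
) -> List[object]:
    """Process each batch, replacing the browser whenever its session dies.

    make_driver   -- hypothetical factory returning a new driver with .quit()
    process_batch -- hypothetical function (driver, batch) -> list of results
    """
    results: List[object] = []
    driver = make_driver()
    try:
        for batch in batches:
            for attempt in range(max_retries + 1):
                try:
                    # Results are added only if the whole batch succeeded.
                    results.extend(process_batch(driver, batch))
                    break
                except session_errors:
                    if attempt == max_retries:
                        raise
                    # The old session is unusable: discard it and start over.
                    _quit_quietly(driver)
                    driver = make_driver()
    finally:
        _quit_quietly(driver)
    return results
```

Restarting the browser also releases whatever memory the old page had accumulated, which may matter if the crash really is memory related.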

@brutalsavage
Owner

Unfortunately, I haven't tested with a large number of posts. Chrome does have a lot of memory issues. If anyone has a solution, feel free to comment.

webcoderz commented Sep 10, 2020

Here are some options, including how to run it headless, which should help:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
#chrome_options.add_argument("--disable-extensions")
#chrome_options.add_argument("--disable-gpu")
#chrome_options.add_argument("--no-sandbox") # linux only
chrome_options.add_argument("--headless")
chrome_options.headless = True # also works on Selenium versions that provide this property
driver = webdriver.Chrome(options=chrome_options)

@brutalsavage
Owner

Running it headless looks like a potential solution.

You should also have
chrome_options.add_argument("--disable-gpu")
if you are running on Windows.

source: https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver
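
The platform-specific advice in the two comments above can be collected into one small helper. This is a minimal sketch, not part of the scraper: `headless_chrome_args` is a hypothetical name, and making `--no-sandbox` opt-in is an assumption of this example, since that flag disables a Chrome security feature.

```python
import sys
from typing import List


def headless_chrome_args(platform: str = sys.platform,
                         no_sandbox: bool = False) -> List[str]:
    """Return Chrome command-line flags for a headless run.

    --disable-gpu is added on Windows, as suggested in this thread.
    --no-sandbox is added only on Linux and only when requested.
    """
    args = ["--headless"]
    if platform.startswith("win"):
        args.append("--disable-gpu")
    elif platform.startswith("linux") and no_sandbox:
        args.append("--no-sandbox")
    return args
```

Each returned flag can then be passed to `chrome_options.add_argument(...)` in a loop before the driver is created.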

brutalsavage changed the title from "Invalid Session ID" to "Invalid Session ID (Large amount of posts)" on Sep 10, 2020

pixml27 commented Sep 10, 2020

Thanks for the advice, but I have tried all of it in different combinations.
Here are all the options I found related to this problem, but they don't work:

from selenium.webdriver.chrome.options import Options

option = Options()
option.add_argument('--no-sandbox')
option.add_argument("--enable-automation")
#option.add_argument("start-maximized")
option.add_argument("--disable-extensions")
option.headless = True
option.add_argument('--disable-dev-shm-usage')  # use /tmp instead of /dev/shm for shared memory

# Content-setting values: 1 allows, 2 blocks
option.add_experimental_option("prefs", {
    "profile.default_content_setting_values.notifications": 1,
    "profile.managed_default_content_settings.images": 2,
    # disk-cache-size is a Chrome command-line switch (--disk-cache-size),
    # so it probably has no effect as a pref
    'disk-cache-size': 16000
})
option.add_experimental_option("excludeSwitches", ["enable-automation"])
option.add_experimental_option('useAutomationExtension', False)

P.S. I'm running on Linux.
P.P.S. I tried emulating scrolling down the page in Chrome using an RPA platform (basically just pressing the Down key, without human intervention).
Facebook stopped sending new posts (or updating the page) somewhere after 300 scrolls (or Chrome stopped loading them),
but Chrome did not crash.
So the problem may be Facebook's protection, but then why does Chrome crash in the scraper?
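
To tell "the page stopped growing" apart from "the browser crashed", a scroll loop can stop once the page height has not increased for several scrolls in a row. The sketch below is an illustration under assumptions, not the scraper's own loop: `scroll_until_stalled`, `get_height`, and `scroll_down` are hypothetical names. With Selenium, `get_height` could wrap `driver.execute_script("return document.body.scrollHeight")` and `scroll_down` could wrap `driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")`.

```python
import time
from typing import Callable


def scroll_until_stalled(
    get_height: Callable[[], int],
    scroll_down: Callable[[], None],
    max_scrolls: int = 1000,
    patience: int = 3,
    pause_seconds: float = 0.0,
) -> int:
    """Scroll until the page height stops growing `patience` times in a row.

    Returns the number of scrolls performed.
    """
    last_height = get_height()
    stalled = 0  # consecutive scrolls that loaded nothing new
    scrolls = 0
    while scrolls < max_scrolls and stalled < patience:
        scroll_down()
        scrolls += 1
        if pause_seconds:
            time.sleep(pause_seconds)  # give the page time to load more posts
        new_height = get_height()
        if new_height > last_height:
            stalled = 0
            last_height = new_height
        else:
            stalled += 1
    return scrolls
```

If this returns well before `max_scrolls` while the driver is still responsive, the feed itself stopped delivering posts, which would match the behaviour seen after roughly 300 scrolls.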
