/scrape public domain info from social media sites (like privacy policy) #161
Hey! Thanks for the feedback! Very good point, we will look into it on Monday!
ccing @rafaelsideguide
Awesome, thanks!
Hey @Gbillington1, just submitted a PR that might help with this for now. If you have a better idea, or other ways we can do this, lmk.
@Gbillington1 Merged!
I'm using Firecrawl in my application to scrape privacy policies for many websites. It works great in most cases but fails with a 403 error when I try to scrape what Firecrawl considers "social media" sites. I got this error when trying to scrape the privacy policies of Twitter (ironically, x.com works) and Instagram:
https://twitter.com/en/privacy
https://help.instagram.com/155833707900388
I'm assuming that these sites are being blocked by some blacklist on your side, but it would be awesome if I could scrape pages that don't necessarily relate to the data on the platforms. I want the information in the privacy policy, which is public domain info that isn't related to the data stored on the social platforms. If you could make it possible for me to scrape these types of pages, it would be greatly appreciated!
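To illustrate the kind of exception being asked for, here is a minimal sketch, assuming the 403 comes from a host-based blocklist. Every name in it (`BLOCKED_HOSTS`, `ALLOWED_PATH_KEYWORDS`, `is_blocked`) is hypothetical and not taken from Firecrawl's code; it only shows one way a blocklist could let policy pages through.

```python
from urllib.parse import urlparse

# Hypothetical blocklist of social media hosts (an assumption, not Firecrawl's real list).
BLOCKED_HOSTS = {"twitter.com", "instagram.com", "facebook.com"}

# Hypothetical path keywords that mark public policy pages as scrapable.
ALLOWED_PATH_KEYWORDS = ("privacy", "terms", "legal")


def is_blocked(url: str) -> bool:
    """Return True if the URL is on a blocked host and is not a policy page."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Match the blocked host itself or any of its subdomains.
    on_blocklist = any(
        host == blocked or host.endswith("." + blocked) for blocked in BLOCKED_HOSTS
    )
    if not on_blocklist:
        return False

    # Let the URL through if its path mentions an allowed keyword.
    path = parsed.path.lower()
    return not any(keyword in path for keyword in ALLOWED_PATH_KEYWORDS)
```

With this sketch, `https://twitter.com/en/privacy` would be allowed because its path contains `privacy`, while a profile URL on the same host would stay blocked. The Instagram help-center URL above has a purely numeric path, so a keyword rule alone would not cover it; it would need an explicit per-URL exception.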