/scrape public domain info from social media sites (like privacy policy) #161

Gbillington1 · 2024-05-19T04:17:05Z

I'm using Firecrawl in my application to scrape privacy policies for many websites. It works great in most cases, but it fails with a 403 error when trying to scrape what Firecrawl considers "social media" sites. I got this error when trying to scrape the privacy policies of Twitter (ironically, x.com works) and Instagram:

https://twitter.com/en/privacy
https://help.instagram.com/155833707900388

I'm assuming these sites are blocked by a blocklist on your side, but it would be awesome if I could scrape pages that aren't related to the data on the platforms. I only want the privacy policy, which is public domain information unrelated to the data stored on the social platforms. If you could make it possible for me to scrape these types of pages, it would be greatly appreciated!

nickscamara · 2024-05-19T05:18:46Z

Hey! Thanks for the feedback! Very good point, we will look into it on Monday!

nickscamara · 2024-05-19T05:24:04Z

ccing @rafaelsideguide

Gbillington1 · 2024-05-20T19:37:30Z

Awesome, thanks!

nickscamara · 2024-05-21T00:27:40Z

Hey @Gbillington1, I just submitted a PR that might help with this for now. If you have a better idea or other ways we could do this, let me know.

nickscamara · 2024-05-24T16:43:29Z

@Gbillington1 Merged!

nickscamara mentioned this issue May 21, 2024

feat: Allow privacy/legal/ other pages in social media websites #168

Merged
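The PR title suggests the general approach: keep blocking social media domains, but let privacy, legal, and similar pages through. The actual change in #168 is not shown in this thread, so what follows is only a minimal sketch under stated assumptions. The domain list, the keyword list, and the function names `is_blocked_domain` and `is_url_allowed` are hypothetical, and the exception is assumed to be a simple keyword match on the URL path.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; the real Firecrawl list is not shown in this thread.
SOCIAL_MEDIA_DOMAINS = {"twitter.com", "instagram.com", "facebook.com"}

# Hypothetical path keywords that mark public legal/policy pages as exceptions.
ALLOWED_PATH_KEYWORDS = ("privacy", "terms", "legal", "policy")


def is_blocked_domain(host: str) -> bool:
    """Return True if host is a blocked domain or one of its subdomains."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in SOCIAL_MEDIA_DOMAINS)


def is_url_allowed(url: str) -> bool:
    """Allow any non-blocked domain; on blocked domains, allow only keyword paths."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not is_blocked_domain(host):
        return True
    path = parsed.path.lower()
    return any(keyword in path for keyword in ALLOWED_PATH_KEYWORDS)


if __name__ == "__main__":
    for u in (
        "https://twitter.com/en/privacy",
        "https://help.instagram.com/155833707900388",
        "https://twitter.com/someuser",
    ):
        print(u, is_url_allowed(u))
```

A path-keyword check like this would let https://twitter.com/en/privacy through, but it would still reject https://help.instagram.com/155833707900388, whose path is only a numeric ID. Covering that page would need an explicit per-URL exception or some other rule.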

nickscamara added the in review label May 21, 2024

nickscamara closed this as completed May 24, 2024
