WikiArt scraper only scraping <3000 images #27

Open
fk798 opened this issue Dec 7, 2020 · 4 comments

fk798 commented Dec 7, 2020

Hi! When scraping and downloading images to train the DCGAN on, the scraper is unable to access the full dataset. For example, when I run the command python art.py --genre=landscape --num_pages=250 --output_dir=landscape_scraped I am only able to download around 2400 images before the program ends. However, the WikiArt website shows around 22000 images available for landscape.

Here's what I think the issue is: the landscape page says that a total of 3600 images can be viewed. I scrolled all the way down to look for further pages with different images, but there are no buttons leading to other pages (if any exist). It looks like WikiArt has set up their website so that you can only view those 3600 images rather than the entire dataset, which is a problem since it leaves less data to train the network on. I might be wrong since I don't really know how WikiArt works, but how can I obtain more than the 2400 images?
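To make the suspected cap concrete, here is a rough sketch of the arithmetic. The function name `reachable_images` and the page size of 60 are purely illustrative assumptions; they are not read from WikiArt or from the scraper script. Only the 3600 figure comes from the listing page described above.

```python
import math


def reachable_images(num_pages, page_size, listing_cap):
    """Upper bound on images a page-by-page scraper can collect.

    All parameters are hypothetical inputs for illustration; none of them
    is taken from WikiArt's actual behavior or from the scraper's code.
    """
    # Pages beyond the cap add nothing new, so only this many pages matter.
    useful_pages = min(num_pages, math.ceil(listing_cap / page_size))
    # The total can never exceed the cap, even if the last page is partial.
    return min(useful_pages * page_size, listing_cap)


# 3600 is the count shown on the listing page; 60 per page is a guess.
print(reachable_images(num_pages=250, page_size=60, listing_cap=3600))
print(reachable_images(num_pages=10, page_size=60, listing_cap=3600))
```

If a cap like this is in effect, raising --num_pages past the point where the listing runs out would not yield more images. The sketch does not account for the gap between the roughly 2400 images actually downloaded and the 3600 shown on the page.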

Thanks in advance!

@sebamacchia

Hi! Are you using the genre-scraper.py file?

rosefeller commented May 13, 2021 via email

fk798 commented May 13, 2021

Ah, my bad. @sebamacchia yes, I meant the genre-scraper.py file. I just renamed it to art.py, but it's the same thing.

@rosefeller sure, my email address is [email protected]. If it doesn't show (for some reason your email is starred out with asterisks), it's just my GitHub handle at nyu.edu.

rosefeller commented May 14, 2021 via email

3 participants