Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use thumbnail_url for thumbnail generation when present #675

Closed
1 task
stacimc opened this issue Aug 24, 2022 · 15 comments · Fixed by #898 · May be fixed by WordPress/openverse-api#1138
Closed
1 task

Use thumbnail_url for thumbnail generation when present #675

stacimc opened this issue Aug 24, 2022 · 15 comments · Fixed by #898 · May be fixed by WordPress/openverse-api#1138
Assignees
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects

Comments

@stacimc
Copy link
Contributor

stacimc commented Aug 24, 2022

Problem

Some providers (like SMK) may link to quite large images in their image_url. This can cause unacceptably slow response times or even timeouts when generating thumbnails via our thumbnail service.

Description

We already have a thumbnail_url available on the Image model which we can use.

When the thumbnail_url is available, we should send this to the thumbnail service instead of the image_url. This should be a small change here:

image_url = image.thumbnail or image.url

(using the serialized field names)

Additional context

#1450 tracks updating the SMK provider script in the Catalog to populate thumbnail_url with a link to a smaller image size.

Implementation

  • 🙋 I would be interested in implementing this feature.
@stacimc stacimc added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository labels Aug 24, 2022
@sarayourfriend
Copy link
Contributor

Just a heads-up, the field name in the API right now is just thumbnail not thumbnail_url: https://github.com/WordPress/openverse-api/blob/2e85caf7aede8aaf9d77cd5cb050f50b860ee58e/api/catalog/api/models/mixins.py#L92

@sahil-R
Copy link
Contributor

sahil-R commented Dec 23, 2022

hey, @krysal @sarayourfriend @stacimc can I work on this, I think I can close this.

@dhruvkb
Copy link
Member

dhruvkb commented Dec 23, 2022

Sure @sahil-R, please feel free to do so! I'll assign it to you.

@sahil-R
Copy link
Contributor

sahil-R commented Dec 23, 2022

Thank you @dhruvkb

@krysal krysal self-assigned this Feb 7, 2023
@openverse-bot openverse-bot added this to Backlog in Openverse Feb 22, 2023
@obulat obulat transferred this issue from WordPress/openverse-api Feb 22, 2023
@zackkrida zackkrida moved this from Backlog to In progress in Openverse Mar 7, 2023
@krysal krysal moved this from In progress to To do in Openverse Mar 7, 2023
@krysal
Copy link
Member

krysal commented Mar 11, 2023

Following @obulat's request of evaluating the stored thumbnails, I ran a couple of queries on the API database to investigate.

See the providers with thumbnails
SELECT DISTINCT(provider) FROM image WHERE thumbnail IS NOT NULL ORDER BY provider ASC;

+--------------------+
| provider           |
|--------------------|
| 500px              |
| eol                |
| flickr             |
| iha                |
| met                |
| nappy              |
| phylopic           |
| rawpixel           |
| rijksmuseum        |
| sciencemuseum      |
| sketchfab          |
| smk                |
| stocksnap          |
| thingiverse        |
| thorvaldsensmuseum |
| waltersartmuseum   |
| wordpress          |
+--------------------+
SELECT 17
Time: 1950.363s (32 minutes 30 seconds), executed in: 1950.335s (32 minutes 30 seconds)

Following this result, I was curious to know how many thumbnails we have for each provider, omitting '500px', 'eol', 'iha', 'thorvaldsensmuseum' and 'waltersartmuseum' providers because those are disabled at the moment.

Thumbnails availability by (active) provider

SELECT provider,
    COALESCE(SUM(CASE WHEN thumbnail IS NOT NULL THEN 1 ELSE 0 END), 0) AS thumbs_available,
    COALESCE(SUM(CASE WHEN thumbnail IS NULL THEN 1 ELSE 0 END), 0) AS thumbs_not_available
FROM image
WHERE provider IN (
    'flickr', 'met', 'nappy', 'phylopic', 'rawpixel', 'rijksmuseum', 'sciencemuseum', 'sketchfab', 'smk', 'stocksnap', 'thingiverse', 'wordpress'
)
GROUP BY provider
ORDER BY provider ASC;

+---------------+------------------+----------------------+
| provider      | thumbs_available | thumbs_not_available |
|---------------+------------------+----------------------|
| flickr        | 497009314        | 911578               |
| met           | 128707           | 234675               |
| nappy         | 2059             | 0                    |
| phylopic      | 3145             | 1011                 |
| rawpixel      | 26239            | 136508               |
| rijksmuseum   | 30000            | 0                    |
| sciencemuseum | 82558            | 12143                |
| sketchfab     | 37872            | 0                    |
| smk           | 106651           | 1416                 |
| stocksnap     | 35436            | 1625                 |
| thingiverse   | 32659            | 0                    |
| wordpress     | 2518             | 3202                 |
+---------------+------------------+----------------------+
SELECT 12
Time: 2324.308s (38 minutes 44 seconds), executed in: 2324.287s (38 minutes 44 seconds)

The 'nappy', 'rawpixel', 'stocksnap' and 'wordpress' provider were recently added so I assume their thumbnails are acceptable. For the rest, I queried some samples and gathered some notes.

Old providers with thumbnails stored

The size shown is only for the sample at the right, it does not apply to all of the supplier's.

provider Size Sample thumbnail
flickr 240X180 (small) https://live.staticflickr.com/4397/36340706826_ee1c70429d_m.jpg
met 600x437 https://images.metmuseum.org/CRDImages/dp/web-large/DP818514.jpg
phylopic N/A Not working
rijksmuseum 200x165 (small) https://lh3.ggpht.com/Nsf8GjZnb8bDQJF2L7hzVc1YsBmG1Sf3rRPVC_ftd2GkXJRIckMvy8zoRBaXRXXnNOe8NUe0bLEMjxDnMpUthfmHlA=s200
sciencemuseum 320X256 https://coimages.sciencemuseumgroup.org.uk/images/516/194/large_thumbnail_1990_5131_11184__0001_.jpg
sketchfab 256x144 (samll) https://media.sketchfab.com/models/01758a58e868439d89b4c7e4c97afc98/thumbnails/5bca6d57c97f4b32a95ecfb35a3a1d9a/b54599a7e5614362b0e87fa7355c87f1.jpeg
smk 400x390 https://iip.smk.dk/iiif/jp2/kksgb22133.tif.jp2/full/!400,/0/default.jpg
thingiverse 333x250 https://cdn.thingiverse.com/renders/a2/08/4a/a6/a2/cc096c76061794fdf9c525971ab59948_display_medium.jpg

The only thumbnails not working are from 'phylopic', so I'll omit them. For the rest, some are small but since we don't have specific quality requirements at the moment I'll assume they are fine as long as they are online.

@obulat
Copy link
Contributor

obulat commented Mar 13, 2023

Thank you for this amazing investigation, @krysal!

The only thumbnails not working are from 'phylopic', so I'll omit them. For the rest, some are small but since we don't have specific quality requirements at the moment I'll assume they are fine as long as they are online.

I would prefer not to use the small thumbnails (flickr, rijksmuseum, sciencemuseum in the table) because when we discussed this issue (https://wordpress.slack.com/archives/C02012JB00N/p1678203055330969), the lowest width mentioned seems to have been 400px.

Having said that, Flickr might be OK to use because I suspect that the smaller thumbnails are from older images, and newer ones will have larger thumbnails available soon.

@AetherUnbound
Copy link
Contributor

@obulat how would you suggest proceeding in that case? Set the thumbnail to null for all other providers beyond Flickr that don't have at least 400px to a side?

@krysal
Copy link
Member

krysal commented Mar 13, 2023

I'd like to leave the code for using thumbnails without exception and address the small thumbnails in consequent issues. As I said, since they're working, it should be okay, and if it's important to have better thumbnail quality we can rise WordPress/openverse-catalog#905's priority and the sub-issues that may come from there.

@zackkrida
Copy link
Member

From a frontend perspective I really wouldn't want to go live with thumbnails below the 400px size we identified. The thumbnails are a user's first glimpse at potential results and if they're pixelated and blurry it's going to discourage them from ever using Openverse again. Using thumbnails which are smaller then we currently use feels like a big usability reduction that I don't think we should ship.

@obulat
Copy link
Contributor

obulat commented Mar 14, 2023

@obulat how would you suggest proceeding in that case? Set the thumbnail to null for all other providers beyond Flickr that don't have at least 400px to a side?

On the catalog side, I can think of two approaches for providers with no acceptable thumbnail available: either use the same URL as the main image or set the thumbnail to None. The first approach would mean that the code in the API is the same for all providers: take the image.thumbnail (if available, during the transition stage). The second approach would mean that we would use image.thumbnail if it's available (but we set the thumbnail to null for the existing images where the thumbnails are too small), and image.url otherwise.

Based on what we do in the catalog, the API would either always use the image.thumbnail (the first approach above) or use image.thumbnail for some providers and image.url for others. I guess we could have a configuration set for providers with an acceptable thumbnail, and check against it before falling back to image.url.

I have one more question about this issue. Do we want to thumbnail a thumbnail?. If we already have a thumbnail URL in the database, do we want to generate our own thumbnail out of it? Would this adversely affect the quality of the resulting image? What are the reasons for not using the provider's thumbnail link directly? Provide the Openverse cache layer for it to prevent dead images for the cached items?

@obulat
Copy link
Contributor

obulat commented Mar 14, 2023

I'd like to leave the code for using thumbnails without exception and address the small thumbnails in consequent issues. As I said, since they're working, it should be okay, and if it's important to have better thumbnail quality we can rise WordPress/openverse-catalog#905's priority and the sub-issues that may come from there.

Considering the implications of bad thumbnail quality on users' perceptions, I would prefer for us to go in another direction: start with only the providers that have good thumbnails (SMK and Met), and add other thumbnails when the other providers' thumbnail quality improves.

@krysal
Copy link
Member

krysal commented Mar 14, 2023

@obulat @zackkrida So are you suggesting we should delete existing thumbnails for rijksmuseum and sketchfab? Flickr is the main source from were the /thumb endpoint is failing, it is a necessity to have smaller images for thumbnails here, we can't delete those yet.

From a frontend perspective I really wouldn't want to go live with thumbnails below the 400px size we identified. The thumbnails are a user's first glimpse at potential results and if they're pixelated and blurry it's going to discourage them from ever using Openverse again.
@zackkrida

That number was also quite arbitrary, does it mean we should lower the default thumbnail size to 400px as well? We currently use 600px, which I thought was a high number too.

THUMBNAIL_WIDTH_PX = config("THUMBNAIL_WIDTH_PX", default="600")

I think the last point is a bit of overreaction as well. They might seem a little blurry in a piece of high-spec equipment and for an eye keen on the details, but most internet users nowadays navigate using a mobile phone, where these sizes should be acceptable and in fact, small images are preferred for data savings. These thumbnails are coming from our providers and we usually go with the idea that if do that then it's fine for us until we can come up with a better solution (in this case, review each provider and get better thumbnails).


On the catalog side, I can think of two approaches for providers with no acceptable thumbnail available: either use the same URL as the main image or set the thumbnail to None.

The thumbnail column should be empty if there is no thumbnail. Using the same link of the image will make the column lose its semantic meaning and it's an unnecessary redundancy.

I guess we could have a configuration set for providers with an acceptable thumbnail, and check against it before falling back to image.url.

Avoiding this complexity is precisely what I'm trying to do here, it's just delaying an issue. Let's work on the source of the problem instead. Providers should have acceptable thumbnails in the thumbnail column, if not, the direct image URL is used.

I have one more question about this issue. Do we want to thumbnail a thumbnail? If we already have a thumbnail URL in the database, do we want to generate our own thumbnail out of it? Would this adversely affect the quality of the resulting image? What are the reasons for not using the provider's thumbnail link directly? Provide the Openverse cache layer for it to prevent dead images for the cached items?

As far as I know, these questions are solved on the side of Photon. I don't appreciate a high reduction in the quality (depending on the size).

@AetherUnbound
Copy link
Contributor

Would it be possible for us to test this out on staging? It's hard to assess how this might impact the search view since we're presently generating our own thumbnails for all of these providers.

I think given whatever we discuss here, we'll also need to update our ingesters (or make note for future ingesters) that we should only record a thumbnail if it meets a minimum size criteria.

@zackkrida
Copy link
Member

My only goal is to prevent us from shipping a noticeable visual regression to users. In the case of Openverse, 90% of our users are on desktop:

CleanShot 2023-03-14 at 11 27 52

On a typical laptop screen the dimensions of each image thumbnail are around ~2-300px. When we factor in contemporary high DPI displays that means 4-600px is an ideal size. Anything smaller is going to stretch. Here's an example with a 200x200 image of a cat which at its default size is sharp and high quality:

CleanShot 2023-03-16 at 13 23 25@2x

I don't feel qualified to manage the technical approach to how to fix the PR. Perhaps deleting all non-viable thumbnail urls will be the best.

@krysal
Copy link
Member

krysal commented Mar 16, 2023

Would it be possible for us to test this out on staging? It's hard to assess how this might impact the search view since we're presently generating our own thumbnails for all of these providers.
@AetherUnbound

Good idea! As @sarayourfriend will perform tests in staging with near-production data size, the timing couldn't be more perfect. We just need to merge #1331 before.

I think given whatever we discuss here, we'll also need to update our ingesters (or make note for future ingesters) that we should only record a thumbnail if it meets a minimum size criteria.
@AetherUnbound

Agree, I can do that after @zackkrida or @obulat (or anyone else willing to) evaluate the use of the stored thumbnails in staging. and delete the bad ones by providers if needed. Hope this gives us more confident in this solution.

@krysal krysal moved this from To do to In progress in Openverse Mar 17, 2023
@obulat obulat added 🧱 stack: api Related to the Django API and removed 🧱 stack: backend labels Mar 20, 2023
Openverse automation moved this from In progress to Done! Mar 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
Archived in project
Openverse
  
Done!
8 participants