Regression in lossy nonphotographic images #3530
Follow Up:
Artwork (git master): [image comparison]
Photograph (git master): [image comparison]
Compilation of all results so far: [tables]
Photo: [tables]
Conclusion

For the photo it's a similar story: 0.8.2 scores higher Butteraugli metrics across all quality ranges. Oddly enough, for high quality tests 0.8.2 performs better under all metrics. For medium quality images, all metrics score in favor of 0.8.2 when compared to 0.10.2 and the git master branch. The exception is 0.9.2, which keeps up with 0.8.2 in dssim/ssimulacra2.

Finally, low quality images are where things start to get interesting. Comparing 0.8.2 to 0.10.2/git master, all metrics point in favor of 0.8.2. However, 0.9.2 beats 0.8.2 in every metric (except butteraugli) by a large margin! I wonder if the quality improvement is noticeable under the scrutiny of the human eye (it could be a fluke). Anyways, it doesn't really matter, since the difference is close enough that there likely isn't a regression present here.

To wrap it all up, the two different types of images show us a possible regression at two different quality levels. For lossy nonphotographic images, the regression seemed to me most apparent for low quality images around distances 2.5-3.0. For lossy photographic images, metrics favor higher qualities at distances 0.75-1.00 (probably just a fluke, however). It seems like the git master branch has improvements (#3531) (#3529) which bump it up a notch over 0.10.2, but it still has a ways to go to match 0.8.2. Tweaking and refining of the next version is always happening, like (#3535) and more to come, so the regression might slowly disappear over time. At the end of the day it's just algorithms, and these results mean nothing when confronted with the human eye.

Thanks for reading! Signing off.
Thank you for doing this! I always make the quality-related decisions only by eye and more or less ignore the metrics. I'd know how to make the metrics much better while making the images look worse. The process of eyeballing the quality may lead the metrics to agree or disagree. I strongly emphasize worst-case behaviour and try to find mitigations for the worst case rather than balanced performance, since I anticipate that each worst-case behaviour will cause practitioners to increase their quality settings, leading to a broad increase in bytes for all content. Do you see, with your own eyes, degradation between 0.8 and 0.10, or any other?
Thanks for the input! I just use metrics as a sanity check to make sure I'm not just seeing things. If you want to focus on the worst-case scenario, then let's look at images where .avif excels. For example, the _ultraman image.
There is more noise in the cross-hatching around the eye and forehead that shouldn't be there, so I think 0.8 looks better and more natural than 0.10. Zooming in, there is a weird grey area where there should be color; it's very subtle, but I realized that it was present in all images I compressed with 0.10. Obviously it's much less visible in photos, but we are talking about the worst-case scenario. I saw (#3520) and tried it with the master branch (4da4b9a), but I got the same result. Here is another image illustrating similar regressions: tf2_wallpaper
His face, cap, and gun all have noticeably more noise in 0.10. The strap covering his shirt is possibly the worst offender. There is more chroma noise around his neck, and the gun barrels underneath him are blurrier in 0.10.2. Same thing with the wrap around his hand: the lines are somewhat blurred out. This is where the smooth, low-contrast gradients get blurred and noise is added in 0.10. It's even more apparent when zoomed in.

The photo I included in my benchmark shows that 0.10 is visually better than 0.8 in my opinion, despite the metrics favoring 0.8. And it clearly outperforms .avif, since this is the kind of image jpegxl was tuned for. The skin texture is sharper in the .avif, but there is very clear banding/smoothing on shaded parts of the face, especially the cheeks. So jpegxl aims for total image consistency rather than focusing on points of interest (which is better imo).

I understand that jpegxl is tuned for general use cases, but the additional noise/blurring leads to worse performance in nonphotographic images like the two examples I've shown, where small changes to sharpness and smooth gradients are very noticeable. So if tuning the algorithms to benefit these kinds of images hurts performance in other photos, then maybe something like,
Just a quick follow-up: I'm comparing version 0.10.2 with the latest master version (commit 45e688c) to see how things have changed. I retested the images ultraman.png and tf2_wallpaper.png, both at a distance of 3. Here are the results: it seems that a small amount of chroma banding is now gone. Additionally, the chroma noise issues have been resolved as well. Overall, it's a significant improvement. While there is still some excessive blurring compared to version 0.8.2, the quality has definitely improved.
Summary
In my limited testing, libjxl 0.8.2 outperforms libjxl 0.10.2 on nonphotographic images like artwork/screenshots. Similar behavior can be seen in photographic images, but the gap in quality is smaller and could be within margin of error.
Environment
Libjxl 0.10.2 and 0.8.2 are downloaded from the releases page
Identical behavior on Windows 10 and Ubuntu 22.04
Testing Methodology
Nonphotographic image: _ultraman
Photographic image: donald-michael-chambers-x2d-xcd55v-4
I ran libjxl 0.10.2 at distances 0.75, 1.5, and 3 to represent high-, medium-, and low-quality benchmarks. These 0.10.2 images are used as the reference; I adjusted the distance setting in 0.8.2 to match the bpp of the reference images. Then I measured the subsequent outputs using four metrics: ssimulacra2, butteraugli, butteraugli (3-norm), and dssim. Since dssim does not accept .jxl input, I converted the images into 16-bit PNGs for the most quality and consistency.
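The bpp-matching step above can be sketched as a small helper. This is a minimal illustration, not part of the original methodology; the file size and dimensions below are placeholders.

```python
# Hypothetical helper for the bpp-matching step: given the size of a
# reference .jxl file and the image dimensions, compute its bits-per-pixel
# so a 0.8.2 encode can be tuned (by adjusting -d) to the same value.

def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Bits per pixel of a compressed file for a width x height image."""
    return file_size_bytes * 8 / (width * height)

# Example with placeholder numbers: a 180 kB file for a 1920x1080 image
bpp = bits_per_pixel(180_000, 1920, 1080)  # roughly 0.69 bpp
```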
I noticed that in benchmark_xl 0.8.2, ssimulacra2 and butteraugli gave different results for the same image when compared to those in the 0.10.2 library. For the sake of consistency, I have decided that the tools provided in 0.10.2 are more accurate than those in 0.8.2; therefore I have replaced the ssimulacra2 and butteraugli numbers with those provided in the latest release.
Side note: since I'm replacing certain values in the 0.8.2 tables, the "Aggregate" row is not accurate. It shouldn't be necessary anyways, since the difference is quite noticeable.
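If anyone does want to recompute the aggregate after substituting values, a geometric mean over the per-image scores is one reasonable way to do it. Note this is an assumption about how an aggregate row could be formed, not necessarily how benchmark_xl computes its own; the scores below are made up.

```python
import math

def geomean(values):
    """Geometric mean of a list of positive scores, as a sketch of how an
    aggregate row could be recomputed after substituting metric values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Example with hypothetical dssim scores for three test images
agg = geomean([0.0021, 0.0035, 0.0058])
```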
Nonphotographic image comparisons:
0.8.2: [metrics table]
0.10.2: [metrics table]
dssim metrics: [table]
Photographic image comparisons:
0.8.2: [metrics table]
0.10.2: [metrics table]
dssim metrics: [table]
Analysis/Conclusion
For artwork, it's clear that 0.8.2 consistently scored higher in all tested metrics by a significant amount. Libjxl 0.10.2 lags behind in quality and in speed for the same bpp. The speed regression is really strange, since 0.10.2 should be much faster; I guess it slowed down for small and medium-sized images. (Should this be a separate issue?) On the other hand, for the photo, 0.8.2 consistently scored higher in all metrics as well, but the gap is smaller, so it's probably within margin of error. Most likely there is no regression for the photo. Also, here is where 0.10.2 outperforms 0.8.2 in terms of speed, maybe because it's a larger image?
PS: I also noticed that 0.9.2 gives similarly subpar results compared to 0.8.2, but I didn't have the time to go into it in detail. So it seems the regression happened between 0.8.2 and 0.9.2; more testing is needed regardless. Also, if somebody could provide me with a script that automates this process, that would make my life a lot easier.
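As a starting point for such an automation script, the encode/decode/score loop described in the methodology could be sketched like this. The cjxl/djxl/ssimulacra2 invocations follow their standard command-line usage, but exact flags and binary paths (e.g. separate builds for 0.8.2 vs 0.10.2) are assumptions to adjust for your setup.

```python
# Sketch of an automation script for the benchmark loop: encode each source
# image at several distances with cjxl, decode with djxl, then score the
# decoded PNG against the original with ssimulacra2.
import subprocess
from pathlib import Path

DISTANCES = [0.75, 1.5, 3.0]  # high / medium / low quality, as in the tests above

def build_commands(src: Path, outdir: Path, distance: float):
    """Return the (encode, decode, score) command lines for one run."""
    jxl = outdir / f"{src.stem}_d{distance}.jxl"
    png = jxl.with_suffix(".png")
    encode = ["cjxl", str(src), str(jxl), "-d", str(distance)]
    decode = ["djxl", str(jxl), str(png)]
    score = ["ssimulacra2", str(src), str(png)]
    return encode, decode, score

def run_all(sources, outdir: Path):
    """Run every command for every source image at every distance."""
    outdir.mkdir(parents=True, exist_ok=True)
    for src in sources:
        for d in DISTANCES:
            for cmd in build_commands(src, outdir, d):
                subprocess.run(cmd, check=True)
```

To match bpp between versions, a wrapper could additionally compare output file sizes and re-run the 0.8.2 encode with adjusted `-d` values.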