Core (LV::Video) Fix alpha blending of 32-bit videos (#230) #244

kaixiong · 2023-02-02T21:51:50Z

This is a rewrite of the buggy 32-bit LV::Video alpha blending code to deal with arithmetic underflows/overflows and the use of an uninitialized register in the MMX implementation (#230).

Note that GCC and Clang on x86-64 (recent versions only?) produce SSE instructions instead of MMX. GCC 12.2 uses the XMM registers, while Clang sticks to the MM registers but throws in pshuflw.

Perhaps it's time to move on and use SSE2 (introduced in 2000 with the Pentium 4) to work on 2 pixels at once, or maybe even 4. This will require larger memory alignments and complicate the code a bit more, since row widths that aren't a multiple of the pixel group size need separate handling.
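For illustration only, and not the code in this pull request: a minimal sketch of what a two-pixels-at-a-time SSE2 variant might look like. The names `blend_pixel`, `blend_row` and `SKETCH_HAVE_SSE2` are hypothetical, and the layout (4 bytes per pixel, alpha in byte 3) is an assumption. It uses 8-byte loads that carry no alignment requirement, falls back to the scalar loop for an odd trailing pixel, and compiles to the scalar path alone on targets without SSE2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Scalar reference: per-channel linear interpolation by the source alpha.
// Assumption: 4 bytes per pixel, alpha stored in byte 3; destination alpha is left untouched.
inline void blend_pixel(uint8_t* dst, uint8_t const* src) {
    uint16_t const a = src[3];
    for (int c = 0; c < 3; ++c)
        dst[c] = static_cast<uint8_t>((src[c] * a + dst[c] * (255 - a)) >> 8);
}

// Blends a row of n_pixels pixels from src into dst, two pixels per step where possible.
void blend_row(uint8_t* dst, uint8_t const* src, std::size_t n_pixels) {
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    __m128i const zero = _mm_setzero_si128();
    __m128i const c255 = _mm_set1_epi16(255);
    // 16-bit lanes 3 and 7 hold the alpha channel of pixel 0 and pixel 1.
    __m128i const alpha_lanes = _mm_set_epi16(-1, 0, 0, 0, -1, 0, 0, 0);
    for (; i + 2 <= n_pixels; i += 2) {
        // Load 2 pixels (8 bytes) each and widen every byte to 16 bits.
        __m128i const s = _mm_unpacklo_epi8(
            _mm_loadl_epi64(reinterpret_cast<__m128i const*>(src + 4 * i)), zero);
        __m128i const d = _mm_unpacklo_epi8(
            _mm_loadl_epi64(reinterpret_cast<__m128i const*>(dst + 4 * i)), zero);
        // Broadcast each pixel's alpha across its own four lanes.
        __m128i a = _mm_shufflelo_epi16(s, _MM_SHUFFLE(3, 3, 3, 3));
        a = _mm_shufflehi_epi16(a, _MM_SHUFFLE(3, 3, 3, 3));
        __m128i const inv = _mm_sub_epi16(c255, a);
        // src * a + dst * (255 - a) is at most 255 * 255 = 65025, so it fits in 16 bits.
        __m128i const sum = _mm_add_epi16(_mm_mullo_epi16(s, a), _mm_mullo_epi16(d, inv));
        __m128i res = _mm_srli_epi16(sum, 8);
        // Keep the destination alpha lanes unchanged.
        res = _mm_or_si128(_mm_and_si128(alpha_lanes, d), _mm_andnot_si128(alpha_lanes, res));
        _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + 4 * i), _mm_packus_epi16(res, zero));
    }
#endif
    // Scalar tail: the odd last pixel, or the whole row when SSE2 is unavailable.
    for (; i < n_pixels; ++i)
        blend_pixel(dst + 4 * i, src + 4 * i);
}
```

A 4-pixel variant would follow the same pattern with full 16-byte loads, at which point alignment and a longer scalar tail start to matter.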

Here is a link to the plain C and SIMD code in Godbolt.

hartwork

@kaixiong I think that, despite the CI being green, I found two bugs in the new version of blit_overlay_alphasrc. Happy to learn if I overlooked or misunderstood something, let's see. Interesting stuff!

hartwork · 2023-02-07T01:09:06Z

libvisual/libvisual/private/lv_video_blit.cpp

+ uint16_t const c0 = static_cast<uint16_t> (src_pixel[0]) * src_alpha + static_cast<uint16_t> (dst_pixel[0]) * (255 - src_alpha);
+ uint16_t const c1 = static_cast<uint16_t> (src_pixel[1]) * src_alpha + static_cast<uint16_t> (dst_pixel[1]) * (255 - src_alpha);
+ uint16_t const c2 = static_cast<uint16_t> (src_pixel[2]) * src_alpha + static_cast<uint16_t> (dst_pixel[2]) * (255 - src_alpha);

- destbuf[0] = (alpha * (srcbuf[0] - destbuf[0]) >> 8) + destbuf[0];
- destbuf[1] = (alpha * (srcbuf[1] - destbuf[1]) >> 8) + destbuf[1];
- destbuf[2] = (alpha * (srcbuf[2] - destbuf[2]) >> 8) + destbuf[2];
+ dst_pixel[0] = c0 >> 8;
+ dst_pixel[1] = c1 >> 8;
+ dst_pixel[2] = c2 >> 8;


Hi @kaixiong ,

the TL;DR version would be that I think that:

The >> 8 in dst_pixel[0] = c0 >> 8; and siblings is too much

~~The cast to 16 bits needs to happen elsewhere, i.e. not cast(a * b + cast(c * d)) but cast(a * b) + cast(c * d).~~ EDIT: mis-read the code, nevermind

Let me go into more detail on the former:

My impression is that this tries to implement approach "A over B" at https://en.wikipedia.org/wiki/Alpha_compositing#Description , formalized there as:

    A over B
    a_0 := a_a + a_b * (1.0 - a_a)
    C_0 := (C_a * a_a + C_b * a_b * (1.0 - a_a)) / a_0

Whereas for us:

    A := src
    B := dst

With a_a and a_b ranging from 0 to 255 rather than 0.0 to 1.0, I end up with these formulas for us; note the output range annotations.

    a_0 := a_a/255 + a_b/255 * (1.0 - a_a/255)                                      # range 0..1
        == a_a/255 + a_b/255 * (255/255 - a_a/255)
        == (a_a + a_b * (255/255 - a_a/255)) / 255
        == (a_a + a_b * (255 - a_a) / 255) / 255                                    # range 0..1
        => (a_a + a_b * (255 - a_a) / 255)                                          # range 0..255

    C_0 := (C_a * a_a/255 + C_b * a_b/255 * (1.0 - a_a/255)) / (a_0/255)            # range 0..1
        == (C_a * a_a/255 + C_b * a_b/255 * (1.0 - a_a/255)) / a_0 * 255
        == (C_a * a_a/255 + C_b * a_b/255 * (255/255 - a_a/255)) / a_0 * 255
        == (C_a * a_a/255 + C_b * a_b/255 * ((255 - a_a) / 255)) / a_0 * 255        # range 0..1
        => (C_a * a_a + C_b * a_b * ((255 - a_a) / 255)) / a_0 * 255                # range 0..255

Now the current code in the pull request seems to assume that the dst image has an alpha channel with full opacity (a_b == 255) for all pixels. I'd like to understand why and would like to suggest adding a comment to the code, but I'll take it for granted below for a moment.

So inserting a_b := 255 I get …

    a_0 := (a_a + a_b * (255 - a_a) / 255)
        == (a_a + 255 * (255 - a_a) / 255)
        == (a_a + (255 - a_a))
        == 255

    C_0 := (C_a * a_a + C_b * a_b * ((255 - a_a) / 255)) / a_0 * 255
        == (C_a * a_a + C_b * 255 * ((255 - a_a) / 255)) / 255 * 255
        == (C_a * a_a + C_b * (255 - a_a))
        == (src_pixel[0] * src_alpha + dst_pixel[0] * (255 - src_alpha))

Since a_0 == 255, dst_pixel[3] does not need to be written, confirmed.
However, C_0 is already in range 0..255, so the additional division in dst_pixel[0] = c0 >> 8; seems to be too much and should yield an "almost black" picture, if I am not mistaken.

I'm happy to learn what I missed or to take this to a call, e.g. if my side here was hard to understand.

Best, Sebastian
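As a numeric cross-check of the a_b := 255 substitution, here is a small sketch in normalized floats, with every value in [0.0, 1.0] to match the Wikipedia form. The `RGBA` struct and the functions `over` and `lerp_channel` are hypothetical names used only for this illustration. For an opaque destination, "A over B" gives an output alpha of 1.0 and a colour equal to the plain interpolation between source and destination.

```cpp
#include <cassert>
#include <cmath>

// Straight (non-premultiplied) colour; every component is in [0.0, 1.0].
struct RGBA { double r, g, b, a; };

// "A over B" for straight alpha, as formalized in the Wikipedia article.
RGBA over(RGBA const& A, RGBA const& B) {
    double const a0 = A.a + B.a * (1.0 - A.a);
    if (a0 == 0.0)
        return {0.0, 0.0, 0.0, 0.0};  // fully transparent result, colour is undefined
    auto mix = [&](double ca, double cb) {
        return (ca * A.a + cb * B.a * (1.0 - A.a)) / a0;
    };
    return {mix(A.r, B.r), mix(A.g, B.g), mix(A.b, B.b), a0};
}

// Linear interpolation between destination and source, weighted by the source alpha.
double lerp_channel(double src, double dst, double alpha) {
    return src * alpha + dst * (1.0 - alpha);
}
```

With B.a == 1.0 the denominator a0 is exactly 1.0, so each colour channel of `over(A, B)` matches `lerp_channel` for the same inputs. The two differ once the destination is not opaque.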

@hartwork, just to be clear this is only meant to be a fix of the original code. The operation here doesn't actually implement 'over' (or 'atop'), it's a simple linear interpolation between the source and target colours, using the source alpha as parameter. The target alpha is unchanged.

At some point, I want to have alpha compositing with the complete set of operators but that'll have to come later. There are additional API design considerations in there due to the necessity of premultiplied alpha, which the Wikipedia article mentions.

Regarding the actual calculation itself, the right-shift by 8 bits is necessary. Alpha is a fraction, i.e. a value in [0.0, 1.0], that's been mapped to [0, 255]. So what we're calculating here is really:

c = c1 * alpha/255 + c2 * (1 - alpha/255)

Multiplying throughout by 255 to work with only integers, we have:

c * 255 = c1 * alpha + c2 * (255 - alpha)

To get back c, we divide by 255 in the final step. Here >> 8 (a division by 256, which only approximates the division by 255) was originally chosen for performance reasons (as the test mentions), so I kept it as it is.

Since c1, c2 and alpha each have a maximum value of 255 and their products are involved, we need to work in at least uint16_t.
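To make the ranges concrete, here is a small sketch of the per-channel arithmetic. The helper names `blend_channel_shift` and `blend_channel_div255` are hypothetical and not part of libvisual. The first mirrors the `>> 8` form from the diff. The second divides by 255 with rounding and is included only for comparison.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the diff: the weighted sum is at most 255 * 255 = 65025, which fits in uint16_t.
inline uint8_t blend_channel_shift(uint8_t src, uint8_t dst, uint8_t alpha) {
    uint16_t const sum = static_cast<uint16_t>(src * alpha + dst * (255 - alpha));
    return static_cast<uint8_t>(sum >> 8);  // divide by 256 (approximation of / 255)
}

// Comparison only: exact division by 255, rounded to nearest.
inline uint8_t blend_channel_div255(uint8_t src, uint8_t dst, uint8_t alpha) {
    uint32_t const sum = static_cast<uint32_t>(src * alpha + dst * (255 - alpha));
    return static_cast<uint8_t>((sum + 127) / 255);
}
```

For example, blending white onto white gives 65025 >> 8 == 254 with the shift but 255 with the rounded division, so the shifted form comes out slightly dark rather than "almost black".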

@kaixiong thanks for elaborating, let me digest that more.

kaixiong · 2024-02-08T13:34:14Z

@hartwork, any chance you could look at this again?

hartwork · 2024-02-14T18:43:25Z

> @hartwork, any chance you could look at this again?

@kaixiong I hope to find time to, in the coming days

kaixiong self-assigned this Feb 2, 2023

kaixiong added bug critical labels Feb 2, 2023

kaixiong added this to the 0.5.0_alpha1 milestone Feb 2, 2023

kaixiong force-pushed the alpha-blend-fixes branch from 1ec968b to 61bdede · February 5, 2023 02:14

kaixiong marked this pull request as ready for review February 5, 2023 02:26

kaixiong requested a review from hartwork February 5, 2023 02:42

hartwork reviewed Feb 7, 2023

kaixiong force-pushed the alpha-blend-fixes branch from 61bdede to 45bae1d · February 11, 2023 11:57

kaixiong force-pushed the alpha-blend-fixes branch from 45bae1d to 156039a · March 31, 2023 23:36

kaixiong added 6 commits February 8, 2024 21:03

Core (LV::Video): Account for underflow/overflow in the alpha blending of 32-bit videos.

f8bdd9e

Core (LV::Video): Rewrite MMX alpha blending of 32-bit videos using intrinsics (#230).

937abfa

Core (LV::Video): Use the correct source alpha.

4287b56

Core (Tests): Add test for LV::VideoBlit::blit_overlay_alphasrc().

9c1c243

Core (Tests): Fix wrong argument order in calls to Video::get_pixel_ptr().

f8016fe

Core (Tests): Fix building of tests.

35ba3fb

kaixiong force-pushed the alpha-blend-fixes branch from 156039a to 35ba3fb · February 8, 2024 13:25
