Simple speed up of Lesson 1 #28

Closed
theIDinside opened this issue Jan 17, 2019 · 3 comments

Comments

@theIDinside

First off, amazing content. Changing the line-drawing function so that it does not check whether the line is steep inside the for loop speeds it up by almost 2x with -O3 on. So, at the cost of a little extra code bloat, one can change the for loop from

    for (int x=x0; x<=x1; x++) { 
        if (steep) { 
            image.set(y, x, color); 
        } else { 
            image.set(x, y, color); 
        } 
        error2 += derror2; 
        if (error2 > dx) { 
            y += (y1>y0?1:-1); 
            error2 -= dx*2; 
        } 
    } 

to instead

    if(steep) {
        for(int x = x0; x<=x1; ++x) {
            img.set_pixel_color(y, x, color);
            error2 += derror2;
            if(error2 > dx) {
                y += (y1>y0? 1 : -1);
                error2 -= dx*2;
            }
        }
    } else {
        for(int x = x0; x<=x1; ++x) {
            img.set_pixel_color(x, y, color);
            error2 += derror2;
            if(error2 > dx) {
                y += (y1>y0? 1 : -1);
                error2 -= dx*2;
            }
        }
    }

This eliminates one of the biggest bad guys in programming, branching, and especially branching inside for loops. It also leaves more room for the optimizer. Granted, some optimizers are perhaps smart enough to see this on their own, but it really isn't good practice to count on that in this case. Using a rudimentary system_clock measurement in microseconds, drawing the wireframe image goes from around 44000 us to around 22000 us on my system. So eliminating the branch instruction inside the loop gives a pretty impressive speedup.
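A quick way to confirm that the split-loop form draws exactly the same pixels as the original, and to time both, is to run them against a small stand-in image. The sketch below is only an illustration under assumptions: `Canvas`, `prepare`, `line_branchy`, `line_split`, `check_same`, and `time_us` are hypothetical names, not the tinyrenderer image API, and the setup before the loop follows the usual transpose-and-swap pattern. It also uses `std::chrono::steady_clock` rather than `system_clock`, since a monotonic clock is generally the safer choice for measuring intervals.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical stand-in for the image class (not the tinyrenderer API).
struct Canvas {
    int w, h;
    std::vector<int> px;
    Canvas(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0) {}
    void set(int x, int y, int color) {
        if (x >= 0 && y >= 0 && x < w && y < h) px[static_cast<std::size_t>(y) * w + x] = color;
    }
};

// Shared setup: transpose steep lines, then order the endpoints left to right.
struct LineSetup { int x0, y0, x1, y1; bool steep; };

inline LineSetup prepare(int x0, int y0, int x1, int y1) {
    const bool steep = std::abs(x0 - x1) < std::abs(y0 - y1);
    if (steep) { std::swap(x0, y0); std::swap(x1, y1); }
    if (x0 > x1) { std::swap(x0, x1); std::swap(y0, y1); }
    return {x0, y0, x1, y1, steep};
}

// Original form: `steep` is tested on every iteration.
void line_branchy(int x0, int y0, int x1, int y1, Canvas &img, int color) {
    const LineSetup s = prepare(x0, y0, x1, y1);
    const int dx = s.x1 - s.x0;
    const int derror2 = std::abs(s.y1 - s.y0) * 2;
    int error2 = 0, y = s.y0;
    for (int x = s.x0; x <= s.x1; ++x) {
        if (s.steep) img.set(y, x, color); else img.set(x, y, color);
        error2 += derror2;
        if (error2 > dx) { y += (s.y1 > s.y0 ? 1 : -1); error2 -= dx * 2; }
    }
}

// Split form: `steep` is tested once, then one of two loops runs.
void line_split(int x0, int y0, int x1, int y1, Canvas &img, int color) {
    const LineSetup s = prepare(x0, y0, x1, y1);
    const int dx = s.x1 - s.x0;
    const int derror2 = std::abs(s.y1 - s.y0) * 2;
    int error2 = 0, y = s.y0;
    if (s.steep) {
        for (int x = s.x0; x <= s.x1; ++x) {
            img.set(y, x, color);  // transposed back
            error2 += derror2;
            if (error2 > dx) { y += (s.y1 > s.y0 ? 1 : -1); error2 -= dx * 2; }
        }
    } else {
        for (int x = s.x0; x <= s.x1; ++x) {
            img.set(x, y, color);
            error2 += derror2;
            if (error2 > dx) { y += (s.y1 > s.y0 ? 1 : -1); error2 -= dx * 2; }
        }
    }
}

// Aborts (in a debug build) if the two canvases differ in any pixel.
inline void check_same(const Canvas &a, const Canvas &b) { assert(a.px == b.px); }

// Wall-clock time of one call to f, in microseconds.
template <class F>
long long time_us(F &&f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

Drawing the same set of lines into two canvases, calling `check_same`, and wrapping each drawing pass in `time_us` gives a like-for-like comparison. Absolute numbers will depend on the compiler, flags, and machine.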

@ssloy ssloy pinned this issue Jan 17, 2019
@ssloy
Owner

ssloy commented Jan 17, 2019

Thank you, this is a good point!

@mtexier

mtexier commented Jan 17, 2019

Hi,

You can also do the same for:

    (y1>y0? 1 : -1);

by computing the increment value at the start of the function:

    const int yincr = (y1>y0? 1 : -1);

and then only doing

    y += yincr;

in the loop.
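Put together, the suggestion might look like the sketch below. `PixelGrid` and `line_yincr` are hypothetical names used only for this illustration, and the surrounding setup is assumed to follow the usual transpose-and-swap pattern; the only point being shown is that `yincr` is computed once, before the loop.

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical minimal image type, standing in for whatever image class is in use.
struct PixelGrid {
    int w, h;
    std::vector<int> px;
    PixelGrid(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0) {}
    void set(int x, int y, int color) {
        if (x >= 0 && y >= 0 && x < w && y < h) px[static_cast<std::size_t>(y) * w + x] = color;
    }
    int at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

void line_yincr(int x0, int y0, int x1, int y1, PixelGrid &img, int color) {
    // Transpose steep lines so the loop always walks the longer axis.
    const bool steep = std::abs(x0 - x1) < std::abs(y0 - y1);
    if (steep) { std::swap(x0, y0); std::swap(x1, y1); }
    // Make the line run left to right.
    if (x0 > x1) { std::swap(x0, x1); std::swap(y0, y1); }

    const int dx = x1 - x0;
    const int derror2 = std::abs(y1 - y0) * 2;
    const int yincr = (y1 > y0 ? 1 : -1);  // direction decided once, outside the loop
    int error2 = 0;
    int y = y0;
    for (int x = x0; x <= x1; ++x) {
        if (steep) img.set(y, x, color); else img.set(x, y, color);
        error2 += derror2;
        if (error2 > dx) {
            y += yincr;  // no comparison of y1 and y0 here any more
            error2 -= dx * 2;
        }
    }
}
```

For example, `line_yincr(2, 20, 12, 10, grid, 7)` walks x from 2 to 12 while y steps down from 20 to 10, using the same `yincr` of -1 on every step.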

@theIDinside
Author

I closed the issue after trying both gcc and clang-8. The current versions of those compilers will (at any optimization level) do what is, I believe, called loop unswitching: as long as the variable is not a global one (and is therefore thread safe), they are smart enough to figure out that the check can be moved out of the for loop. But it's always a good thing to consider, I suppose.
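For anyone who would rather not depend on the optimizer doing this, one alternative (not something proposed in this thread) is to write the loop once as a template with a compile-time flag and pick the instantiation before the loop. The sketch below is an assumption-laden illustration: `Raster`, `line_loop`, and `line_dispatch` are hypothetical names, and the setup is assumed to follow the usual transpose-and-swap pattern.

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical minimal image type used only for this sketch.
struct Raster {
    int w, h;
    std::vector<int> px;
    Raster(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0) {}
    void set(int x, int y, int color) {
        if (x >= 0 && y >= 0 && x < w && y < h) px[static_cast<std::size_t>(y) * w + x] = color;
    }
    int at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// Steep is a template parameter, so the test below is a compile-time
// constant in each instantiation rather than a per-iteration runtime check.
template <bool Steep>
void line_loop(int x0, int y0, int x1, int y1, Raster &img, int color) {
    const int dx = x1 - x0;
    const int derror2 = std::abs(y1 - y0) * 2;
    const int yincr = (y1 > y0 ? 1 : -1);
    int error2 = 0;
    int y = y0;
    for (int x = x0; x <= x1; ++x) {
        if (Steep) img.set(y, x, color);  // transposed back
        else       img.set(x, y, color);
        error2 += derror2;
        if (error2 > dx) { y += yincr; error2 -= dx * 2; }
    }
}

// Decides steepness once, then calls the matching instantiation.
void line_dispatch(int x0, int y0, int x1, int y1, Raster &img, int color) {
    const bool steep = std::abs(x0 - x1) < std::abs(y0 - y1);
    if (steep) { std::swap(x0, y0); std::swap(x1, y1); }
    if (x0 > x1) { std::swap(x0, x1); std::swap(y0, y1); }
    if (steep) line_loop<true>(x0, y0, x1, y1, img, color);
    else       line_loop<false>(x0, y0, x1, y1, img, color);
}
```

The trade-off is the same as with the hand-split loops: two copies of the loop body end up in the binary, but the source contains only one.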

OAguinagalde added a commit to OAguinagalde/tinyrenderer_ that referenced this issue Feb 27, 2022
Somehow my version seems to still win :D gotta test the wireframe test
zfengyan added a commit to zfengyan/Learn_Renderer that referenced this issue May 10, 2022