Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channel3.getSample computes non-integer typed array index, leading to recompilation loop in Firefox #216

Open
anba opened this issue Dec 21, 2018 · 11 comments
Assignees
Labels
bug Something isn't working Core/Wasm Issues concerning the core library of Wasmboy

Comments

@anba
Copy link

anba commented Dec 21, 2018

https://bugzilla.mozilla.org/show_bug.cgi?id=1515620 was filed yesterday against SpiderMonkey, based on the benchmark results at https://medium.com/@torch2424/webassembly-is-fast-a-real-world-benchmark-of-webassembly-vs-es6-d85a23f8e193. When investigating why SpiderMonkey was so bad at this particular benchmark, it turned out that the abysmal performance is caused by a recompilation loop in SpiderMonkey's optimizing compiler when inlining typed array accesses. This load function
https://github.com/torch2424/wasmBoy/blob/693b160ecd9ae084b4459a08a930b2ef9108b255/core/portable/wasmMock.js#L18-L20

is currently called with non-integer inputs, like 59853.5, where the fractional part .5 is caused by this division (*):
https://github.com/torch2424/wasmBoy/blob/693b160ecd9ae084b4459a08a930b2ef9108b255/core/sound/channel3.ts#L185

While the recompilation loop is certainly an issue which should be fixed in SpiderMonkey, the miscomputed typed array index looks like a bug in wasmBoy to me, too.

(*) I haven't checked if there are additional callers to load which also pass in non-integer indices.

@torch2424
Copy link
Owner

torch2424 commented Dec 21, 2018

Oh wow! Super rad the article / project reached you guys at Mozilla! 😄

Ahhh! So to be honest, the Typescript compiled version did have an issue with Sound, and I knew it was because I wasn't sanitizing numbers (I think is the right term?) correctly, but the JS version wasn't meant to be playable as a user, just run the games so I figured it would be fine for now.

But, I'll definitely go through and fix this! I'll also be sure to add this issue to the article! I knew something was funny when I saw Firefox performance, and I was careful not too call out any engine being "better" because of stuff like this.

Thanks for the bug! I'll try to fix this and I'll update the article now. Thanks!

Edit: article updated in multiple places to link the bug, and left an "Edit log" at the bottom of the article 😄

@torch2424 torch2424 self-assigned this Dec 21, 2018
@torch2424 torch2424 added bug Something isn't working Core/Wasm Issues concerning the core library of Wasmboy good first issue Good for newcomers labels Dec 21, 2018
@cosinusoidally
Copy link

Not sure if this is a 100% correct fix, but coercing the load/store offsets to ints significantly improve the benchmark results in Firefox version 64 for me (running the tobu tobu girl game).

--- index.iife.js.orig  2018-12-21 10:03:22.000000000 +0000
+++ index.iife.js       2018-12-21 19:46:58.428069475 +0000
@@ -1735,11 +1735,11 @@
   };
 
   const load = offset => {
-    return wasmByteMemory[offset];
+    return wasmByteMemory[offset|0];
   };
 
   const store = (offset, value) => {
-    wasmByteMemory[offset] = value;
+    wasmByteMemory[offset|0] = value;
   };
 
   const abs = value => {

firefox-64-slow
vs
firefox-64-fast

@torch2424
Copy link
Owner

@cosinusoidally Yeah totally! Thanks for looking into this! Currently, I'm not at my home laptop, so I'll probably open up a PR later tonight to do all the coercing and stuff using the "portable" functions I have to handle all of the type differences between running with AssemblyScript and running with JS.

@cosinusoidally
Copy link

@torch2424 well all credit to @anba for tracking down the issue. I just spotted his comment here https://bugzilla.mozilla.org/show_bug.cgi?id=1515620#c3 and thought "hmm, I wonder if I can just tweak the generated JS to avoid this performance cliff"

@dcodeIO
Copy link
Contributor

dcodeIO commented Dec 21, 2018

Btw, the recommended portable way of writing this with AS is

let positionIndexToAdd = i32(Channel3.waveTablePosition / 2); 

which is a nop in WASM and a |0 in JS. Hope this doesn't result in any issues with the i32 function though :)

@torch2424
Copy link
Owner

So I went ahead and opened #218 Which as you can see definitely fixes the FireFox JS performance! But it still seems like something is off after we closure compile it.

In the meantime, I'll go ahead and push up the fix for anyone who happens to run the benchmark. 😄

torch2424 added a commit that referenced this issue Dec 22, 2018
relates to #216 

This fixes the sound inconsistencies between Wasm and JS, as well as fixes the JS performance of Firefox.

Also, snuck in some changes for faster travis and deploying, for less redundant building of the lib.

**Example of new firefox results for JS core**

<img width="827" alt="screen shot 2018-12-21 at 10 48 00 pm" src="https://user-images.githubusercontent.com/1448289/50371655-0503ef80-0574-11e9-8932-efb0bed39fc6.png">

**Old Travis Build Time**

<img width="961" alt="screen shot 2018-12-21 at 11 37 43 pm" src="https://user-images.githubusercontent.com/1448289/50371952-8742e280-0579-11e9-9fae-1ed8e7a57cc3.png">


**New Travis Build Time**

<img width="960" alt="screen shot 2018-12-21 at 11 37 37 pm" src="https://user-images.githubusercontent.com/1448289/50371953-8f028700-0579-11e9-8756-a89de78e7eaf.png">
@anba
Copy link
Author

anba commented Jan 2, 2019

So I went ahead and opened #218 Which as you can see definitely fixes the FireFox JS performance!

\o/

But it still seems like something is off after we closure compile it.

Hmm, that's strange. The Closure-compiled version is definitely faster than the non-compiled one for me. (Tested in under Windows and Linux, with Release/Beta/Nightly versions of Firefox.)


I did a bit of additional profiling (with the Firefox profiler addon) to see if there are any other easy optimization targets for the JS/TypeScript version. And the profile showed that averageFpsFromTimes takes a considerable amount of time (about 14.25% of the complete runtime is spent there!), which is mostly caused by stringifying numbers in parseFloat → This will be fixed in https://bugzilla.mozilla.org/show_bug.cgi?id=1517235. (Even with that fix averageFpsFromTimes is still kind of slow and takes ~6.25% of the complete runtime, because it's allocating and extending arrays constantly, but I guess that's more of an issue in the "stats-lite" package and not this project.)

The profile also showed that NumberMod (the SpiderMonkey implementation function of the % operator when called with double values) or more specifically Libc's fmod takes about ~4% of the complete runtime. That struck me as odd, especially when I found where the modulo operator was applied:
https://github.com/torch2424/wasmBoy/blob/806cd8236e16f4a8bbb51c158218ec75a8a0c8de/core/graphics/backgroundWindow.ts#L214

In the TypeScript code pixelXPositionInMap is typed as i32, but somehow the optimizing compiler was using a double here. With a bit of debugging I found out that the double value is generated here
https://github.com/torch2424/wasmBoy/blob/806cd8236e16f4a8bbb51c158218ec75a8a0c8de/core/graphics/backgroundWindow.ts#L63-L64

At one point during the benchmark run, windowX is 0 and -1 * 0 is... drumroll 🥁 -0 (negative zero), so a double value and not an int32. From that point on all computations with windowX or any derived values need to typed as doubles in a JavaScript engine.

So, I could be interesting to see if properly converting the value back to an int32 has any effects in Safari or Chrome, because when I ran a simple standalone version of that code in the shell, the negative effects of using double instead of int32 were even worse in JavaScriptCore and V8.

I guess these two results were the most interesting ones from the profile. Oh, and there also seems to be an off-by-one error in https://github.com/torch2424/wasmBoy/blob/806cd8236e16f4a8bbb51c158218ec75a8a0c8de/demo/benchmark/loadrom.js#L60

I noticed when looking if there are any frequent bail-outs from the optimizing compiler (which I didn't see).

@MaxGraey
Copy link
Contributor

MaxGraey commented Jan 3, 2019

for (let i = 0; i <= coreObject.core.byteMemory.length; i++) {
   coreObject.core.byteMemory[i] = 0;
}

better replace to:

coreObject.core.byteMemory.fill(0);

and yaeh <= should be <

@torch2424
Copy link
Owner

@anba

Hmm, that's strange. The Closure-compiled version is definitely faster than the non-compiled one for me. (Tested in under Windows and Linux, with Release/Beta/Nightly versions of Firefox.)

Huh, that is strange! I ran the benchmark, but maybe forgot to recompile the closure version that night? I'll try again once I get the chance 😄

I did a bit of additional profiling (with the Firefox profiler addon) to see if there are any other easy optimization targets for the JS/TypeScript version.

Wow! That's such a deep dive thank you so much! I've been fighting with performance on the emulator itself for a while, so I'm super stoked to see this 🎉 I'll definitely go ahead and implement all of these things once I get the chance. Thank you again! I'm sure that took a lot of digging to find, and I very much appreciate it!

@MaxGraey

Thanks for the tip! I'll definitely go through and do a bunch of refactoring to use .fill 🙃 Thank you for that!

@torch2424 torch2424 pinned this issue Jan 3, 2019
@torch2424
Copy link
Owner

@anba My apologies for falling so far behind on this. If you noticed I've had this issue pinned all this time, and left a lot of references to it in the article.

Just a quick status update,

Regarding the closure compiler thing, yeah it was definitely my fault haha! 🙃

And I'm currently taking a break from my roadmap: #197 , so I'll try and go through and implement the lingering issues here 😄

Thanks again for all the help, it is very much appreciated!

torch2424 added a commit that referenced this issue Feb 21, 2019
torch2424 added a commit that referenced this issue Feb 22, 2019
* Implemented all changes from #230 and #221

* Fixed off by one mentioned in #216

* Wrapped modulo in i32Portable as mentioned in #216

* issue #207, Allowed making closure builds for debugging, and then tried
to match its inlining

* Removed the legacy api

* Updated the package-lock for the branch
@torch2424
Copy link
Owner

So everything here is implemented now. Going to keep this open a little while longer in case any more feedback. And also, as a reminder to update the article.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Core/Wasm Issues concerning the core library of Wasmboy
Projects
None yet
Development

No branches or pull requests

5 participants