
Added TextureArraySpriteBatch #5914

Open · wants to merge 35 commits into base: master
Conversation

VaTTeRGeR

@VaTTeRGeR VaTTeRGeR commented Feb 14, 2020

This optimized SpriteBatch combines draw calls that use different textures and employs an LFU replacement strategy to minimize cache thrashing. It can be used as a faster alternative to SpriteBatch in use cases involving multiple and/or alternating Textures.

This gives game developers the freedom to:

  • have a larger number of different assets on screen at once
  • not have to pack everything into one atlas page for performance
  • not have to strictly sort by Texture for performance

It implements the Batch interface and is compatible with the rest of the ecosystem. Cache swapping and load statistics are accessible through getTextureLFUSwaps/Size/Capacity functions.

User-supplied Shaders need to take the slightly different set of attributes (texture index) and uniforms (texture array) into account; this is documented, though. Apart from that, SpriteBatch and TextureArraySpriteBatch can be used interchangeably.
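The LFU replacement strategy and swap counter described above can be sketched in plain Java. This is only an illustrative model, not the PR's code: the names (`LfuTextureCache`, `slotFor`) are hypothetical, and textures and texture units are reduced to plain ints.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal LFU cache model: maps a texture handle to a texture-unit slot,
 *  evicting the least-frequently-used resident texture when full. */
class LfuTextureCache {
    private final int capacity;
    private final Map<Integer, Integer> slotOf = new HashMap<>();   // texture -> bound slot
    private final Map<Integer, Integer> useCount = new HashMap<>(); // texture -> use frequency
    private int swaps = 0; // analogous to the PR's getTextureLFUSwaps() statistic

    LfuTextureCache(int capacity) { this.capacity = capacity; }

    /** Returns the slot for the texture, binding (and possibly evicting) as needed. */
    int slotFor(int texture) {
        useCount.merge(texture, 1, Integer::sum);
        Integer slot = slotOf.get(texture);
        if (slot != null) return slot; // cache hit, no rebinding needed
        if (slotOf.size() < capacity) {
            slot = slotOf.size(); // a free slot is still available
        } else {
            // Evict the least-frequently-used resident texture and reuse its slot.
            int victim = -1, minUses = Integer.MAX_VALUE;
            for (int t : slotOf.keySet()) {
                int uses = useCount.get(t);
                if (uses < minUses) { minUses = uses; victim = t; }
            }
            slot = slotOf.remove(victim);
            swaps++; // each swap implies an expensive texture rebind
        }
        slotOf.put(texture, slot);
        return slot;
    }

    int getSwaps() { return swaps; }
}
```

Frequently drawn textures stay resident across flushes, which is what keeps cache thrashing low when draw order alternates between textures.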

See #5907

@obigu
Contributor

obigu commented Feb 15, 2020

I've tested it in a full game on Desktop, Android and iOS (MobiVM) and everything seems to be working fine. I haven't done any specific profiling, just checked that the project behaved as expected on my devices.

@VaTTeRGeR
Author

Thanks! Good to see it works on iOS too.

Contributor

@WinterAlexander WinterAlexander left a comment


Great work! Do you have any idea when this will be available? For now I'm copying it into my project, if that causes no issues.

@VaTTeRGeR
Author

Thanks! I don't currently know when/if it's going to get accepted; I haven't gotten any more feedback over the last few weeks.

Next thing I'll do is make the constants you mentioned consistent. Otherwise it's already as clean as can be, so I don't know why the pull request has stalled.

@WinterAlexander
Contributor

WinterAlexander commented Mar 24, 2020

Is it guaranteed by the OpenGL standard that we will never hit maxTextureUnits < 8? It is specified that GL_MAX_TEXTURE_IMAGE_UNITS is always at least 8, but could there be implementations of OpenGL that simply don't support texture arrays? I know you throw an IllegalStateException in your code when it reaches 0. Has this ever happened? It would be great to find confirmation somewhere that arrays of 8 textures are always supported. Documentation about this is quite succinct from what I've seen.

If not, that's really a pain, because it means I need to write each of my shaders in both the array and the single-texture version and switch from one to the other if it doesn't work. Throw dynamic lighting into the mix and that's a huge pain. It's true that TextureArraySpriteBatch is a drop-in replacement, but only if you are always using the default shader.

Also, the code you wrote to find the smallest supported number of texture units on any machine by repeatedly creating the shader is great, but not really useful if we want to make our own shaders. Perhaps this code should be moved to a getMaxTextureUnits() util method which would perform the shader creation to get the actual max number of texture units?

@VaTTeRGeR
Author

It is specified that GL_MAX_TEXTURE_IMAGE_UNITS is always at least 8, but could there be implementations of OpenGL that simply don't support texture arrays? I know you throw an IllegalStateException in your code when it reaches 0. Has this ever happened? It would be great to find confirmation somewhere that arrays of 8 textures are always supported. Documentation about this is quite succinct from what I've seen.

Yup, there's a good reason that's in there. On Intel HD 3000 (and probably a few other Intel GPUs) the reported number of supported texture units is higher than the actual number, as I found out painfully during testing. The reason I made this an iterative search is that it allows you to find the maximum number of supported texture units instead of just falling back to the minimum. It also lets you see if the feature is unsupported, which is the case if the search reaches zero.

Also, the code you wrote to find the smallest supported number of texture units on any machine by repeatedly creating the shader is great, but not really useful if we want to make our own shaders. Perhaps this code should be moved to a getMaxTextureUnits() util method which would perform the shader creation to get the actual max number of texture units?

That is a good idea. I have an improved version ready now that addresses all the points you brought up. There is now a static getMaxTextureUnits() method that will make creating a custom Shader easier. The value is cached after the first call, so it isn't expensive either.
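The iterative search described above can be modeled abstractly. This sketch is an assumption of mine, not the PR's implementation: the names (`TextureUnitProbe`, `findMaxUnits`) and the halving step are hypothetical, and the real shader compilation (which needs a GL context) is stubbed out as a predicate.

```java
import java.util.function.IntPredicate;

/** Models the iterative search: start at the driver-reported maximum and
 *  step down until a test shader actually compiles. A result of 0 means
 *  texture arrays are unsupported and the caller should fall back to a
 *  plain SpriteBatch. */
class TextureUnitProbe {
    private static int cached = -1; // mirrors "cached after the first call"

    static int findMaxUnits(int reportedMax, IntPredicate shaderCompiles) {
        // Halve the unit count on each failed compile attempt.
        for (int units = reportedMax; units > 0; units /= 2) {
            if (shaderCompiles.test(units)) return units;
        }
        return 0; // nothing compiled: feature unsupported
    }

    static int getMaxTextureUnits(int reportedMax, IntPredicate shaderCompiles) {
        if (cached < 0) cached = findMaxUnits(reportedMax, shaderCompiles);
        return cached;
    }
}
```

The point of probing with a real compile, rather than trusting GL_MAX_TEXTURE_IMAGE_UNITS, is exactly the Intel HD 3000 case above: the reported value can exceed what the compiler accepts.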

@Quillraven
Contributor

Any update on this PR? When will it be available in SNAPSHOT?

Fixed an AMD-related bug where the texture-array indexing extension being unsupported did not raise a compile error. This improves the detection of unsupported GPUs, so we can safely fall back to a normal SpriteBatch.
@VaTTeRGeR
Author

VaTTeRGeR commented Apr 14, 2020

I dunno, maybe it's too untested for mainline LibGDX.

I just found and fixed a bug on desktop AMD btw (Adrenalin 20 + RX 570) where the texture array extension was not reported as missing even though it wasn't supported. The fix was to force a specific GLSL version, depending on whether GL2 or GL3 is used, so that a compile error gets thrown:

For reference: https://community.amd.com/thread/250403

The long-term plan for that is to fall back to an if/else-ladder or a for-loop.
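Such an if/else-ladder can be generated mechanically from the unit count. Below is a hypothetical generator sketch; the varying/uniform names (`v_texture_index`, `u_textures`, `v_texCoords`, `v_color`) are my assumptions, not necessarily what the PR's shaders use.

```java
/** Emits a GLSL if/else ladder that selects among N bound samplers,
 *  for GLSL targets that forbid dynamically indexing a sampler array. */
class SamplerLadder {
    static String build(int units) {
        StringBuilder glsl = new StringBuilder("int idx = int(v_texture_index);\n");
        for (int i = 0; i < units; i++) {
            // First branch is "if", all following branches are "else if".
            glsl.append(i == 0 ? "if" : "else if")
                .append(" (idx == ").append(i)
                .append(") gl_FragColor = texture2D(u_textures[")
                .append(i).append("], v_texCoords) * v_color;\n");
        }
        return glsl.toString();
    }
}
```

Because each branch indexes the sampler array with a constant, this stays legal on GLSL versions where only constant sampler indices are allowed, at the cost of extra branching in the fragment shader.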

@crykn
Member

crykn commented Aug 31, 2020

@xGnoSiSx @TCreutzenberg did you try the batch in your projects, and did you get the expected performance improvements?

@crykn crykn added enhancement graphics Concerning Gdx.graphics labels Aug 31, 2020
@TCreutzenberg
Contributor

TCreutzenberg commented Sep 2, 2020

@crykn I didn't do any profiling, but I can tell you what my debug logging and observations show me.

With the standard SpriteBatch the game runs at a steady 60 fps. About once every 2 seconds I have a "slow" frame which takes about 20-24 ms (instead of the expected 16-17 ms).
[It still adds up to 60 fps even with the "slow" frame.]

With the TextureArraySpriteBatch everything also runs at a steady 60 fps. I get the "slow" frames less often, about one every five seconds, but each "slow" frame takes longer than before, about 40-45 ms.
I have no idea why this happens.
For me the longer "slow" frame results in a much more noticeable lag/stutter when there are fast-moving objects around, so it is less desirable for me at the moment.

For now I will stay with the SpriteBatch and will try the TextureArraySpriteBatch again when I'm further into development to see how things behave then.

@VaTTeRGeR
Author

VaTTeRGeR commented Sep 2, 2020

@TCreutzenberg Hmm, seems like you have some serious underlying problems. You should build yourself a little profiler that measures the execution times of your subsystems and simultaneously tracks big garbage-collection events via Runtime.getXXX(); you would be surprised at how useful this is compared to just measuring the overall frame time.

Also try the new ShenandoahGC. It reduces GC Pauses drastically at the cost of some 10% of average throughput.
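A minimal version of the profiler suggested above, combining per-section timing with heap tracking through Runtime, might look like this (the class and method names are hypothetical):

```java
/** Tracks a named section's duration and heap delta, to help separate
 *  genuine subsystem cost from garbage-collection spikes. */
class FrameProbe {
    private long t0, heap0;
    private final Runtime rt = Runtime.getRuntime();

    void begin() {
        t0 = System.nanoTime();
        heap0 = rt.totalMemory() - rt.freeMemory();
    }

    /** Returns "[name] ms=... heapDeltaKiB=...". A large negative heap
     *  delta usually means a GC ran during the section, which points at
     *  allocation pressure rather than slow code. */
    String end(String name) {
        double ms = (System.nanoTime() - t0) / 1e6;
        long heapDeltaKiB = (rt.totalMemory() - rt.freeMemory() - heap0) / 1024;
        return String.format(java.util.Locale.US,
            "[%s] ms=%.2f heapDeltaKiB=%d", name, ms, heapDeltaKiB);
    }
}
```

Wrapping each subsystem (render, physics, AI) in `begin()`/`end()` pairs and logging only the outliers would show whether the 40-45 ms frames come from one subsystem or from GC pauses.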

@TCreutzenberg
Contributor

Yes, most likely I have issues, as I haven't done any profiling yet. I just don't understand why the lag difference between the two batches is so consistent.
I guess this will behave differently for different projects, and I'll definitely come back and try it again later in my development cycle 😁

@TCreutzenberg
Contributor

Btw, I found the cause of my "long frames": I was using the Lwjgl2 backend for desktop and not the Lwjgl3 one. Now it works fine. Seems it had nothing to do with garbage collection or other performance stuff...

@jhoukem

jhoukem commented Dec 15, 2020

@VaTTeRGeR Awesome work! My game went from ~100 render calls per frame to 9 max so far!
This is ideal in scenarios where all your sprites cannot fit into a single spritesheet (an LPC spritesheet, typically) and you use Y ordering (so you cannot group the draw calls of entities with the same textures).
It's even more useful when a single entity requires multiple textures to be drawn, e.g. a dynamic equipment system with textures applied on top of each other.

My only issue is that Box2DLights doesn't work anymore; see the stack trace:

Exception in thread "LWJGL Application" org.lwjgl.opengl.OpenGLException: Cannot use Buffers when Array Buffer Object is enabled
	at org.lwjgl.opengl.GLChecks.ensureArrayVBOdisabled(GLChecks.java:71)
	at org.lwjgl.opengl.GL20.glVertexAttribPointer(GL20.java:855)
	at com.badlogic.gdx.backends.lwjgl.LwjglGL20.glVertexAttribPointer(LwjglGL20.java:829)
	at com.badlogic.gdx.graphics.glutils.ShaderProgram.setVertexAttribute(ShaderProgram.java:658)
	at com.badlogic.gdx.graphics.glutils.VertexArray.bind(VertexArray.java:115)
	at com.badlogic.gdx.graphics.Mesh.bind(Mesh.java:511)
	at com.badlogic.gdx.graphics.Mesh.bind(Mesh.java:502)
	at com.badlogic.gdx.graphics.Mesh.render(Mesh.java:612)
	at com.badlogic.gdx.graphics.Mesh.render(Mesh.java:582)
	at box2dLight.PositionalLight.render(PositionalLight.java:87)
	at box2dLight.RayHandler.render(RayHandler.java:323)
	at box2dLight.RayHandler.updateAndRender(RayHandler.java:269)

But I guess that is expected, since it was not implemented with this in mind.

I haven't had much time to look at the code yet, but this looks very promising so far. Good job again!
What is the state of this PR? Is there any plan for it to be merged?

@VaTTeRGeR
Author

@jhoukem Thanks! Does the error still appear if you use the normal SpriteBatch for Box2dLights and the modified one for everything else?

I have no idea if/when this is getting integrated. There could be better ways to implement this, but LibGDX seems to be stuck at the current OpenGL version because of the whole multi-platform thing. They might never integrate this, since it's not a perfectly clean solution and depends on GLES 3.0 being enabled...

@jhoukem

jhoukem commented Dec 15, 2020

Box2DLights does not use any SpriteBatch; instead, every light renders in its own way using a Mesh directly, so I guess it expects the GL context to be in a specific state, which it is not, hence the failure.
When you say it is not a perfectly clean solution, what do you refer to exactly? The GLES 2.0 support? Something else?

@VaTTeRGeR
Author

  • Requires GLES 3.0; not problematic in this day and age, but meh
  • Shaders are different (layer index, texture access, etc.)
  • A whole separate class with new options for users to understand
  • Hefty performance penalty when exceeding the texture cache size (texture copying)
  • Resulting incompatibilities with other LibGDX stuff

This is basically a special-purpose class. I'm glad I made it, since it enables multi-atlas sprite rendering to be fast despite using LibGDX, but it's not exactly in the core spirit of LibGDX.

I'll see if i can do something about the messed up OpenGL context, this might be the solution: http://forum.lwjgl.org/index.php?topic=3322.0

@tommyettinger
Member

Some talk relating to this recently surfaced on #5907, all positive! I can fix the few conflicts if no one objects.

tommyettinger and others added 4 commits March 24, 2024 18:39
Without using the current .gitignore, ignored native files used by RoboVM and gdx-lwjgl3-glfw-awt-macos could be re-added by this PR, which we don't want.
This also gets rid of one tiny conflict, an extra newline, in CONTRIBUTORS.
@NathanSweet
Member

@tommyettinger Oops, I didn't see this PR and tested this code. The lightly cleaned up code I used is here.

@tommyettinger
Member

My only concern at this point is that there are no tests for this class yet. I think I can mostly just paste the new Batch into an existing project that uses GL30, and test that way. I just don't have any projects that currently use GL30... I'll probably check TextraTypist first, since I know it uses multiple Textures pretty often. It seems like people have been testing on their own like this, and the code seems to work, so this probably isn't strictly necessary.

@NathanSweet
Member

I haven't had time to look at what is different in this PR versus what I tested, but FWIW Spine and the class I tested don't use GL30.

tommyettinger added a commit to tommyettinger/textratypist that referenced this pull request Mar 26, 2024
This is related to libgdx/libgdx#5914 . I've found that the framerate is significantly higher with ArrayTextureSpriteBatch when compared to SpriteBatch, but both are lower than just using SpriteBatch on GL20. GL20 SpriteBatch is almost twice as fast as GL30 ArrayTextureSpriteBatch, which seems to suggest something's not right in GL30 or my limited usage of it.
@tommyettinger
Member

tommyettinger commented Mar 26, 2024

I tested out the GL20 one; it was quite a bit more work than it should have been because the syntax highlighting on your "lightly cleaned up code" also removes most newlines, which turns out to actually be a problem for Java sources! 😮 You can see the mangled result in the comments here in my copy; I manually fixed the several-dozen cases where a // comment wiped out code. My copy does compile, though! It just isn't much different performance-wise from SpriteBatch, even though I think my code was swapping textures a fair amount... The GLProfiler stats were quite different between all three, which is odd.

SpriteBatch (GL20):
Calls: 403, draw calls: 28, shader switches: 1, texture bindings: 28

ArrayTextureSpriteBatch (GL30):
Calls: 110, draw calls: 8, shader switches: 1, texture bindings: 1

TextureArraySpriteBatch (GL20):
Calls: 249, draw calls: 8, shader switches: 1, texture bindings: 41

It seems like ArrayTextureSpriteBatch should be faster, with fewer calls and draw calls, only one texture binding, etc., but it's the slowest of the three. The others seem about tied... Code!

EDIT: 3 minutes after posting this I realized the links were wrong.

Using SpriteBatch.
Using ArrayTextureSpriteBatch.
Using TextureArraySpriteBatch.

@NathanSweet
Member

@tommyettinger Sorry my fancy website formatting made it harder! I should have posted the raw text.

Seems in your test you had 28 draw calls with SpriteBatch. That's already not many, so it's not too surprising you didn't see much benefit. My TextureArraySpriteBatch test saw a difference going from 293k draw calls to 413 and 18k draw calls to 106.

Ultimately I decided to break my dashed lines into segments and draw the texture region multiple times, with the last segment less than full length. The main reason for that was Spine supports any Batch, PolygonSpriteBatch, or a TwoColorPolygonBatch and I didn't want another set of batches that do that stuff plus what TextureArraySpriteBatch does.

@tommyettinger
Member

Woah, less than 1% of the draw calls! I like the version that's compatible with GL20 best at this point, since ANGLE and I think several other places are limited to GL20. If TextureArraySpriteBatch is compatible with GL20 and GL30 (and up), could it be used to replace SpriteBatch? Only if it's backwards-compatible, I mean. I haven't used PolygonSpriteBatch much at all, and I don't know if it would need changing also.

@NathanSweet
Member

Yeah it can reduce draw calls a lot, which is cool, though so far I've always been able to arrange things differently to get reasonable draw calls. There are surely scenarios where it's crucial and workarounds are painful or impossible, as jhoukem mentioned:

where all your sprites cannot fit into a single spritesheet (an LPC spritesheet, typically) and you use Y ordering (so you cannot group the draw calls of entities with the same textures).

TextureArraySpriteBatch is GL20 compatible and a drop-in replacement for SpriteBatch. A similar replacement for PolygonSpriteBatch would require another copy/paste of the whole class.

AFAIK ArrayTextureSpriteBatch needs GL30 but may have some advantages.

@obigu
Contributor

obigu commented Apr 15, 2024

I see I tested this myself on my projects 4 years ago, but back then I just checked that nothing crashed. After seeing @NathanSweet's experience, even if I'd be happy to merge it as it is, I think this PR would benefit a lot from having a test. A test would make it easy to check the performance benefits, allow quick testing on different platforms, and give us some extra confidence that there are no major drawbacks/bugs.

@obigu
Contributor

obigu commented Apr 15, 2024

Just for reference, this might need to be ported to this one if it gets merged before #7346
