GL rendering improvements

D-MAN · July 13, 2019, 10:13pm

All right, the Loader.loadBytes() problem seems to be completely resolved now. I was digging through the openfl/lime source to solve another performance pitfall: finding out why GPU-accelerated rendering is not supported with BitmapData.draw() (in Flash, we used to compose static content into a bitmap to speed up various animations) and how hard it would be to enable it. But that’s not actually the topic of this post. In fact, I quickly realized there’s a lot I can suggest to improve the current rendering code. The thing is, ever since I switched from Flash development to mobile apps ~5 years ago, I’ve been writing and constantly improving a cross-platform C++ library that basically does what openfl/lime does: GPU-accelerated 2D rendering that gives the best performance and consistent results on all supported platforms. So I believe I have a couple of tricks up my sleeve to make openfl more robust and performant.

Let me give an example. Judging from the current BitmapData code, each bitmap data object stores its own vertex/index buffers and updates them each time changes are made. What years of GL dev have taught me is that array data flow between RAM and the GPU must be kept to an absolute minimum. On the other hand, uniform shader params are practically free performance-wise. So for best performance you should always try to keep data as static as possible, and use “smart” dedicated parametric shaders to transform the base data into the desired output, based on the passed parameters.

For example, here’s vertex data sufficient to render any simple bitmap:
[image: vertex data for a simple bitmap]
The actual rendered bitmap size can be controlled via a transformation matrix, and the very same mesh can be used for UV mapping (if you need to draw part of a texture, you can pass the UV rect via params without touching the vertex data).
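To make the idea concrete, here is a minimal Python sketch that mimics on the CPU what such a vertex shader could do. The unit quad, the function name `transform_vertex` and the parameter layout are assumptions for illustration (the original image is not available); the matrix uses the Flash-style (a, b, c, d, tx, ty) order.

```python
# Assumed base mesh: one shared unit quad, never modified after creation.
UNIT_QUAD = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def transform_vertex(vertex, matrix, uv_rect):
    """Mimic a vertex shader driven by two uniforms (hypothetical layout).

    matrix  -- 2D affine transform as (a, b, c, d, tx, ty)
    uv_rect -- (u0, v0, width, height) sub-rectangle of the texture
    """
    x, y = vertex
    a, b, c, d, tx, ty = matrix
    # Position: plain affine transform of the unit-quad corner.
    position = (a * x + c * y + tx, b * x + d * y + ty)
    # UV: the same corner selects a point inside the requested texture rect.
    u0, v0, uw, vh = uv_rect
    uv = (u0 + x * uw, v0 + y * vh)
    return position, uv


# Draw a 200x100 bitmap at (10, 20) using the right half of the texture.
matrix = (200.0, 0.0, 0.0, 100.0, 10.0, 20.0)
uv_rect = (0.5, 0.0, 0.5, 1.0)
corners = [transform_vertex(v, matrix, uv_rect) for v in UNIT_QUAD]
```

Only `matrix` and `uv_rect` change between bitmaps or frames; `UNIT_QUAD` stays the same for every draw.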

Scale-9-grid case is a bit more complex, but still pretty straightforward. Consider this mesh:

It is sufficient to draw any bitmap with or without scale9grid; we just need to make our shader map the base vertex data into the desired position/uv coordinates:
[image: mapping of base vertex data to position/uv coordinates]

Obviously, position/uv horizontal/vertical mappings are independent and conceptually the same:

(where A, B, C, D are desired transformed coordinates of the mesh)
The graph is a well-known “piecewise linear function” with a well-known formula and solution. Assuming the shader “knows” that the vertices are located at (0, 1, 2, 3), the transform formula is:
output = a + b * input + c * abs(input - 1) + d * abs(input - 2) // where input is x or y of base mesh vertex
So all we need to do on the client is calculate 4 * 4 = 16 coefficients for the desired mapping (a, b, c, d for each of the horizontal/vertical position and uv mappings) and pass them to the shader (as a mat4 uniform, for example).
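As a rough sketch of the client-side step, the coefficients for one axis can be derived from the three segment slopes of the piecewise linear function. The function names below are hypothetical; only the formula itself comes from the post.

```python
def piecewise_coefficients(A, B, C, D):
    """Return (a, b, c, d) so that f(0)=A, f(1)=B, f(2)=C, f(3)=D for
    f(x) = a + b*x + c*abs(x - 1) + d*abs(x - 2)."""
    # Slopes of the three linear segments between the base vertices.
    s1, s2, s3 = B - A, C - B, D - C
    b = (s1 + s3) / 2  # average of the outer slopes
    c = (s2 - s1) / 2  # half the slope change at x = 1
    d = (s3 - s2) / 2  # half the slope change at x = 2
    a = A - c - 2 * d  # fixes the value at x = 0
    return a, b, c, d


def evaluate(coeffs, x):
    """What the shader would compute for a base vertex coordinate x."""
    a, b, c, d = coeffs
    return a + b * x + c * abs(x - 1) + d * abs(x - 2)


# Horizontal positions for a 300px-wide scale-9 bitmap with 10px corners.
coeffs = piecewise_coefficients(0.0, 10.0, 290.0, 300.0)
mapped = [evaluate(coeffs, x) for x in (0, 1, 2, 3)]
```

Running the same calculation for vertical position and for the two uv axes gives the four coefficient sets that fill the mat4.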
Voila! We have a dedicated shader with all the benefits:

same shader for all bitmaps
same shared vertex/index data for all bitmaps
zero array data transfers during animations/resizes/scale9grid changes
zero data allocations
low & constant computational cost of dynamic parameters

I’ve created a short demo to illustrate the concept: https://jsfiddle.net/qgdh31wf/ (lines 176-182 are where the position/uv mapping is set)

Similarly, graphics path data (lines) scaling can be improved. It seems that when lineScaleMode != LineScaleMode.NORMAL, vertex data is fully recalculated and re-uploaded at each scale change, which results in terrible performance. This can be avoided with a dedicated shader and vertex data in a special format: {x, y, normalAngle, relativeDistance}. Obviously, unlike BitmapData above, each Graphics object will have its own vertex buffer, but at least no recalculations/uploads will be necessary as long as the path data is static and only scaling is done. I can illustrate the concept further if you’re interested.
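One possible reading of that vertex format, sketched in Python rather than shader code. The interpretation is an assumption, not something spelled out above: normalAngle is taken as the direction of the stroke normal in radians, relativeDistance as a signed fraction of the line thickness (for example ±0.5 for the two edges), and scaling is assumed uniform so that normals keep their direction. The function name `expand_line_vertex` is hypothetical.

```python
import math


def expand_line_vertex(vertex, scale, thickness):
    """Mimic a vertex shader that scales the path but not the stroke width.

    vertex    -- (x, y, normal_angle, relative_distance), static buffer data
    scale     -- uniform scale factor, passed as a shader parameter
    thickness -- stroke thickness in output pixels, passed as a parameter
    """
    x, y, normal_angle, relative_distance = vertex
    # The path point scales with the object...
    px, py = x * scale, y * scale
    # ...while the offset along the normal depends only on the thickness.
    offset = relative_distance * thickness
    return (px + math.cos(normal_angle) * offset,
            py + math.sin(normal_angle) * offset)


# Two edge vertices of a horizontal segment point at (10, 0).
upper = (10.0, 0.0, math.pi / 2, 0.5)
lower = (10.0, 0.0, math.pi / 2, -0.5)
at_scale_1 = (expand_line_vertex(upper, 1.0, 4.0), expand_line_vertex(lower, 1.0, 4.0))
at_scale_3 = (expand_line_vertex(upper, 3.0, 4.0), expand_line_vertex(lower, 3.0, 4.0))
```

In this sketch the distance between the two edges stays at 4 pixels at any scale, and only the `scale` parameter changes between frames.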

These are the first two ideas that came to mind; I may have a ton more.

So… what do you think? Basically, all of the above is a long version of the question “should I go for it?”. Some of the improvements may lead to some “major” changes in the code, so I have some concerns: I may not know some peculiarities of the GL-enabled platforms which openfl currently supports, I don’t want to interfere with your “vision” of how things should work internally and, ultimately, I don’t want my work to be in vain.

Klug76 · July 15, 2019, 11:22am

I’m not a maintainer, but I think it could be a very good pull request.
In addition, the SubBitmapData and BatchRenderer classes (from https://github.com/innogames/openfl fork) would be very useful.

miltoncandelero · July 15, 2019, 11:27am

Not a maintainer, but a performance freak.
Go for it.

Also, it might be a good idea to take a look at how Tilemap works, since that is openfl’s only batched GPU renderer (aside from drawquads).

Gotta go fast!

singmajesty · July 24, 2019, 8:38pm

I’d be very happy to discuss improvements to the Context3D renderer.

Are you on Discord? https://discord.gg/tDgq8EE

We’re in the process of refactoring and preparing to make updates to the renderer.