If you're interested, you can come take a look at my test results: "as3.0 starling openfl starling createjs" rendering performance

I did some benchmarks of my own today. Now disclaimer here, I’m rather busy at the moment with multiple projects on the go, so I leaned heavily on Claude Sonnet and Qwen Coder to pull these together.

I’m using the latest git versions of all frameworks as of today.

The benchmark borrows the zombies in @Matse’s demo and, on enter frame, inserts all 16 types of zombies repeatedly until the average framerate over the last 120 frames drops below 55 fps, at which point it stops adding zombies.
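For anyone curious, the stop condition works roughly like this. This is a minimal JavaScript sketch of the logic only; the real test is Haxe and uses @Matse’s zombie assets, and `spawnZombies` here is a placeholder:

```javascript
// Rolling 120-frame window; keep spawning until the average fps drops below 55.
const WINDOW = 120;
const TARGET_FPS = 55;
const frameTimes = []; // ms per frame, most recent last

function averageFps() {
  if (frameTimes.length < WINDOW) return Infinity; // not enough samples yet
  const sum = frameTimes.reduce((a, b) => a + b, 0);
  return 1000 / (sum / frameTimes.length); // mean frame time -> fps
}

function onEnterFrame(deltaMs, spawnZombies) {
  frameTimes.push(deltaMs);
  if (frameTimes.length > WINDOW) frameTimes.shift();
  if (averageFps() >= TARGET_FPS) {
    spawnZombies(16); // add one of each of the 16 zombie types
    return true;      // still adding
  }
  return false;       // below 55 fps: stop and record the final count
}
```

The 120-frame window smooths out one-off frame spikes so a single hitch doesn’t end the run early.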

I had hoped to include @rainy’s HXMaker framework, but I was having some difficulties with dependencies.

OpenFL “Traditional” is using the traditional Flash/AIR API style display list.
OpenFL “Tilemaps” uses Tilemaps (obviously). The performance here was impressive!
Starling didn’t perform as well as I might have expected, so this was eye opening. I spent quite a bit of time trying to squeeze it, but wasn’t able to improve the result.
Massive, well, @Matse these results are impressive!!!

HTML5 “Gecko” is tested in a Firefox-based browser. The performance was consistently lower than in Chromium.
HTML5 “Chromium” is tested in a Chromium-based browser.






Massive really stood out here! Amazing work @Matse!


The other thing I noted with @Matse’s Massive framework is that it wasn’t hindered by the display-list-style batching of OpenFL and Starling. It uses a unified MassiveDisplay container where that traditional hierarchy isn’t present.

Really well done @Matse!


Thanks a lot @Bink !

I wanted to do that kind of benchmark; those are very interesting results, with regular Starling being a lot more performant than regular OpenFL (even with its current “flaws”, I mean it could run much faster)

I would have been sad if Massive didn’t come out on top :smiley: It’s totally expected though: Massive is “cheating” here. I have a new version coming very soon, with skewing, color offsetting, program caching, and improved performance


Here are the tests I used:


Nice, I will optimize the test cases for hxmaker and submit a PR later :cat_with_wry_smile:


I submitted a PR to test this example, though the performance may not be very good. But I am happy to test it in order to optimize hxmaker! Please test it again when you have a chance, thank you!

Updating to the latest hxmaker and hxmaker-openfl should let you skip some unnecessary dependency libraries.

If you have any further questions, please let me know.

It’s not surprising that using Sprite as a renderer wastes a lot of performance when processing vertex data. I’ve always wanted to skip it, but I haven’t come up with a good way to do that yet.

I will find a way to solve it tomorrow! I’m going to sleep now.


Thanks for sharing this. I’m playing around with a branch that modifies Starling’s VertexData and IndexData to use typed arrays instead of ByteArray, and this should be very helpful for making comparisons.


@joshtynjala

Yes, ByteArray is very slow and should have been replaced a long time ago. hxmaker uses an Array of textures.

I’m glad to see the improvement in Starling’s performance.

If there are issues and solutions you’re aware of, you might consider creating a pull request, or sharing your proposed solution here (given you’ve had difficulty accessing GitHub).


@Bink

When testing createjs, did you enable StageGL (WebGL)?
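(For context: EaselJS defaults to the Context2D Stage, and WebGL has to be opted into explicitly via StageGL. A minimal sketch of what I mean; the fallback logic here is my own suggestion, not something from the benchmark:)

```javascript
// EaselJS 1.0+: StageGL is the WebGL-backed stage with the same display-list
// API as the Context2D Stage. A benchmark that only constructs
// `new createjs.Stage(...)` never touches the GPU path.
function makeEaselStage(canvasId) {
  try {
    return new createjs.StageGL(canvasId); // WebGL renderer
  } catch (e) {
    return new createjs.Stage(canvasId);   // Context2D fallback
  }
}
```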

@Bink

No, I don’t have a solution for improving Starling’s performance.
However, I would like to share my discovery: hxmaker’s MovieClip uses an Array to store textures, and from testing, hxmaker’s MovieClip performance is higher than Starling’s.

@Bink

@joshtynjala

Does Starling use ByteArray to upload textures?
As far as I know, ByteArray is very slow. I was wondering if uploading textures using an Array would be much faster?

This is awesome news Josh !! I was expecting having to take care of that myself, thanks a lot !

Let me know if you need any help or have any question


Yes it is.

I’ve pushed my changes to the upload-from-typed-array branch in my GitHub fork of Starling, so that folks can try it out. In this branch, both VertexData and IndexData use typed arrays by default.

You can define starling_force_upload_byte_array to restore the old behavior. I played around with an idea where if the rawData getter is called at run-time, it will switch to ByteArray mode automatically for maximum compatibility with any existing code that might access rawData (which seems to be strongly discouraged by the docs, actually). However, the code ended up being a little messier to support that, and I decided to start with something cleaner for testing.
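To illustrate the gist of the switch in plain JavaScript (the names here are illustrative, not Starling’s actual internals): a ByteArray-style write goes through a DataView with explicit byte offsets and endianness on every call, while a typed array is indexed directly and can be handed to the GPU as-is.

```javascript
// ByteArray-style: a DataView over a raw buffer, one setFloat32 per component.
function writeVertexDataView(buffer, index, x, y, u, v) {
  const view = new DataView(buffer);
  const byteOffset = index * 16; // 4 floats * 4 bytes per vertex
  view.setFloat32(byteOffset + 0, x, true);  // little-endian, like ByteArray
  view.setFloat32(byteOffset + 4, y, true);
  view.setFloat32(byteOffset + 8, u, true);
  view.setFloat32(byteOffset + 12, v, true);
}

// Typed-array style: index straight into a Float32Array, no per-call
// offset or endianness bookkeeping.
function writeVertexTyped(floats, index, x, y, u, v) {
  const base = index * 4;
  floats[base + 0] = x;
  floats[base + 1] = y;
  floats[base + 2] = u;
  floats[base + 3] = v;
}
```

Both produce the same bytes on a little-endian platform; the typed-array path just skips a layer of per-element bookkeeping, which is where the temporary-object and call overhead tends to come from.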

All unit tests are passing with these changes. They did not need to be modified in any way.

Most targets see a nice increase in performance, some more than others. On my machine, html5 is about 17% faster and C++ is 37% faster. Interestingly, HashLink (including both HL/JIT and HL/C) has the most dramatic improvement, rendering 2.39x as many objects in Bink’s benchmarks. My theory is that hxcpp and browsers have much better garbage collectors, so they deal with temporary objects better than HashLink’s GC does.

I also made a more traditional BunnyMark sample for Starling locally, and HashLink/C is actually rendering more bunnies than hxcpp on that one, which feels weird.

Anyway, I need to work on other things now, but I hope that folks can give it a try and maybe offer further improvements.

For reference, I’ve been testing on Fedora Linux 43. I haven’t yet tried this code on Windows or macOS (or iOS or Android, for that matter), so I don’t know how dramatic the difference will be there, whether better or worse than what I’m seeing.


That sounds great, thanks !

Definitely gonna switch to that and look into it very soon :slight_smile:

Isn’t HashLink/C… C? I remember trying to test that a few years ago but couldn’t manage to get it working.

Yes.

I incorrectly assumed that HL/C would be close to, but maybe a tiny bit slower than, hxcpp. However, I was surprised to see that HL/C was running significantly faster than hxcpp on my Starling version of BunnyMark. And it’s a bit faster, but only a little, in Bink’s benchmark.

These days, it should be just as easy as running this command:

lime build hlc

I’m kilometers away from my comfort zone here: I thought C was supposed to be faster (because it’s lower level?)

I think C and C++ are pretty comparable, in terms of performance. It’s likely that some parts of C++ may have additional bottlenecks, but I’d guess core C++ stuff like classes aren’t adding significant overhead, if any.

Both hxcpp and HashLink implement their own small “runtime” on top of C and C++, though: the Haxe standard library, a garbage collector, etc. It was less that I thought there’d be a difference between the C and C++ languages, and more that I expected these “runtimes” to have different characteristics. hxcpp is used in production by more people than HL/C. I think HL/C is mostly used by Shiro, and by few other companies, because it wasn’t well documented how to set things up (at least until I implemented the hlc target in Lime). So I figured that hxcpp might be more optimized thanks to more real-world use.

HashLink/C hasn’t really been on my radar… and perhaps it should have been, particularly if it’s what Shiro leverages.

I look forward to giving your Starling branch a go @joshtynjala :smiley: Thanks for putting work into that! I have a busy week this week, so I may not get to it for a bit unfortunately.
