OpenFL 4 sprites and / or Haxe 3.3 slow

Today I updated Haxe to 3.3, and OpenFL and Lime to the latest versions (from haxelib). And something strange happens…

If I add more than 100 sprites, everything becomes very, very slow. Example: https://gist.github.com/restorer/b7d5479d129b2bbce9d037afdc14fd52 (it’s BunnyMark from the samples, but with sprites instead of Tilemap / Tileset, 1000 sprites).

# OpenFL 4.1.0, Lime 3.1.0, Haxe 3.3.0-rc.1

target       | debug?  | result
-------------+---------+----------------------------
flash        | debug   | everything fine (60 fps)
flash        | release | everything fine (60 fps)
neko         | debug   | very slow (8 fps)
neko         | release | very slow (8 fps)
cpp          | debug   | not slow, not fast (30 fps)
cpp          | release | everything fine (60 fps)
html5 dom    | debug   | slow (20 fps)
html5 dom    | release | slow (20 fps)
html5 webgl  | debug   | super slow (1 fps)
html5 webgl  | release | everything fine (60 fps)
html5 canvas | debug   | everything fine (60 fps)
html5 canvas | release | everything fine (60 fps)

# OpenFL 3.6.0, Lime 2.9.0, Haxe 3.3.0-rc.1

target       | debug?  | result
-------------+---------+--------------------------------
flash        | debug   | compile error (due to Haxe 3.3)
flash        | release | compile error (due to Haxe 3.3)
neko         | debug   | normal (40 fps)
neko         | release | normal (40 fps)
cpp          | debug   | compile error (due to Haxe 3.3)
cpp          | release | compile error (due to Haxe 3.3)
html5 dom    | debug   | slow (20 fps)
html5 dom    | release | slow (20 fps)
html5 webgl  | debug   | everything fine (60 fps)
html5 webgl  | release | everything fine (60 fps)
html5 canvas | debug   | everything fine (60 fps)
html5 canvas | release | everything fine (60 fps)

I have two main questions:

  1. Why is OpenFL 4 so much slower than OpenFL 3 on some targets?
  2. Why is -debug so much slower than release on some targets in OpenFL 4?

Flash should perform the same.

If you were testing Neko with -Dlegacy, more of the code is in C++, so less code is executing in the Neko virtual machine. This makes a difference in performance.

C++ performs better, but bear in mind that a -debug build will enable Haxe debugging for much more of the code (since nearly the whole stack is now exposed to Haxe debugging, rather than only the surface as before).

HTML5 with WebGL uses a webgl-debug.js context for catching WebGL errors. Perhaps this is the difference?

That makes sense, but in practice sprites in OpenFL 3 are faster than sprites in OpenFL 4 with the same target, the same Haxe version, the same hxcpp and the same arguments.

Tests below.

BunnyMark sprites mod, 1000 sprites

Source - https://gist.github.com/restorer/b7d5479d129b2bbce9d037afdc14fd52

Haxe 3.2.1
hxcpp 3.2.205
openfl 3.6.0 / lime 2.9.0
openfl 4.1.0 / lime 3.1.0

| target | arguments           | 3.6.0  | 4.1.0  | 4.1.0 performance (3.6.0 as 100%) |
|--------|---------------------|--------|--------|-----------------------------------|
| flash  |                     | 60 FPS | 60 FPS | 100%                              |
| flash  | -debug              | 60 FPS | 60 FPS | 100%                              |
| neko   |                     | 40 FPS | 8 FPS  | 20%                               |
| neko   | -debug              | 40 FPS | 8 FPS  | 20%                               |
| neko   | -Dlegacy        (1) | 60 FPS | 8 FPS  | 13%                               |
| neko   | -Dlegacy -debug (1) | 60 FPS | 8 FPS  | 13%                               |
| cpp    |                     | 60 FPS | 60 FPS | 100%                              |
| cpp    | -debug              | 60 FPS | 45 FPS | 75%                               |
| cpp    | -Dlegacy        (1) | 60 FPS | 60 FPS | 100%                              |
| cpp    | -Dlegacy -debug (1) | 60 FPS | 45 FPS | 75%                               |
| html5  | -Ddom               | 20 FPS | 20 FPS | 100%                              |
| html5  | -Ddom -debug        | 20 FPS | 20 FPS | 100%                              |
| html5  | -Dcanvas            | 60 FPS | 60 FPS | 100%                              |
| html5  | -Dcanvas -debug     | 60 FPS | 60 FPS | 100%                              |
| html5  | -Dwebgl             | 60 FPS | 60 FPS | 100%                              |
| html5  | -Dwebgl -debug  (2) | 60 FPS | 1 FPS  | 2%                                |

(1) I know that OpenFL 4 doesn’t have legacy mode
(2) webgl-debug.js is added both in 3.6.0 and 4.1.0
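
For reference, the last column is just the 4.1.0 frame rate divided by the 3.6.0 frame rate, rounded to a whole percent. A minimal sketch in Python (the helper name `relative_performance` is hypothetical, used only for illustration):

```python
def relative_performance(fps_old: float, fps_new: float) -> int:
    """Return fps_new as a whole-number percentage of fps_old."""
    return round(fps_new / fps_old * 100)


# Frame rates taken from the sprite table above (3.6.0 first, 4.1.0 second).
print(relative_performance(40, 8))   # neko
print(relative_performance(60, 8))   # neko, -Dlegacy
print(relative_performance(60, 45))  # cpp, -debug
print(relative_performance(60, 1))   # html5, -Dwebgl -debug
```

Since most results are capped at 60 FPS, a value of 100% only means "no visible regression at this sprite count", not that the two versions are equally fast.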

Tilemap vs drawTiles, 5000 bunnies

Just in case, I also tested Tilemap vs drawTiles.

OpenFL 3 - BunnyMark with TilesheetTest from openfl-samples 3.3.0
OpenFL 4 - BunnyMark from openfl-samples 4.0.0

Haxe 3.2.1
hxcpp 3.2.205
openfl 3.6.0 / lime 2.9.0
openfl 4.1.0 / lime 3.1.0

| target | arguments           | 3.6.0  | 4.1.0  | 4.1.0 performance (3.6.0 as 100%) |
|--------|---------------------|--------|--------|-----------------------------------|
| neko   |                     | 10 FPS | 15 FPS | 150%                              |
| neko   | -debug              | 10 FPS | 15 FPS | 150%                              |
| neko   | -Dlegacy        (1) | 30 FPS | 15 FPS | 50%                               |
| neko   | -Dlegacy -debug (1) | 30 FPS | 15 FPS | 50%                               |
| cpp    |                     | 30 FPS | 60 FPS | 200%                              |
| cpp    | -debug              | 30 FPS | 60 FPS | 200%                              |
| cpp    | -Dlegacy        (1) | 30 FPS | 60 FPS | 200%                              |
| cpp    | -Dlegacy -debug (1) | 30 FPS | 60 FPS | 200%                              |
| html5  | -Ddom               | 20 FPS | 20 FPS | 100%                              |
| html5  | -Ddom -debug        | 20 FPS | 20 FPS | 100%                              |
| html5  | -Dcanvas            | 20 FPS | 20 FPS | 100%                              |
| html5  | -Dcanvas -debug     | 20 FPS | 20 FPS | 100%                              |
| html5  | -Dwebgl             | 60 FPS | 60 FPS | 100%                              |
| html5  | -Dwebgl -debug      | 55 FPS | 60 FPS | 110%                              |

(1) I know that OpenFL 4 doesn’t have legacy mode

In many cases OpenFL 4 is faster than OpenFL 3, and -Dwebgl -debug works well (compared to sprites).

OpenFL 4 has a new GL renderer. This is what enables 1.) many bug fixes and 2.) support for WebGL by default on HTML5.

One difference between the two renderers, though, is that the older one had an optimization to automatically combine multiple Bitmap renders into one batch call if they shared the same BitmapData. In real-world projects, I think the chance of someone using the same BitmapData repeatedly is low, and it was not worth the extra complication in the render code.

I think we could bring an optimization like that back in, but in the meantime, Tilemap is a better match for thousands of identical instances :wink:

If you use two different images and add them as alternating sprites (A, then B, then A, then B), you will see 3.6.0 sprite performance drop as well, perhaps lower.
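
To illustrate why the ordering matters, here is a toy model in Python. It is not OpenFL's renderer code; it only assumes that a batching renderer can merge consecutive sprites that share a texture into one draw call and has to start a new call whenever the texture changes. The function name `count_draw_calls` and the texture labels are made up for the example.

```python
from itertools import groupby


def count_draw_calls(textures):
    """Count draw calls when consecutive sprites with the same texture are batched.

    `textures` lists the texture used by each sprite, in display-list order.
    Each run of identical consecutive textures costs one draw call.
    """
    return sum(1 for _ in groupby(textures))


same_image = ["bunny"] * 1000    # 1000 sprites sharing one image
alternating = ["A", "B"] * 500   # A, B, A, B, ... (1000 sprites)

print(count_draw_calls(same_image))   # 1
print(count_draw_calls(alternating))  # 1000
```

Under this assumption the BunnyMark test is the best case for batching, while the alternating layout gets no benefit from it at all and behaves like an unbatched renderer.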

My game drops to <10FPS when not using “-Dcanvas”. Is my issue related?

Can I run “openfl test html5” without the debug flag set, to test whether that changes anything?

The “webgl-debug.js” wrapper was only running on -debug builds, but when I tested Pirate Pig, it had a big effect on performance.

I’m putting it behind -Dwebgl-debug from now on so you guys can have better debug builds :slight_smile:
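
As a rough picture of where that cost comes from: a debug context of this kind typically wraps every GL function and checks for an error after each call. The Python sketch below is a simplified model of that pattern, not the actual webgl-debug.js code; `FakeGL`, `DebugGL`, `draw_arrays` and `get_error` are hypothetical stand-ins.

```python
class FakeGL:
    """Stand-in for a GL context; it only counts how many calls reach it."""

    def __init__(self):
        self.calls = 0

    def draw_arrays(self):
        self.calls += 1

    def get_error(self):
        self.calls += 1
        return 0  # 0 plays the role of "no error"


class DebugGL:
    """Forwards every call to the real context, then checks for an error."""

    def __init__(self, gl):
        self._gl = gl

    def __getattr__(self, name):
        target = getattr(self._gl, name)

        def checked(*args, **kwargs):
            result = target(*args, **kwargs)
            error = self._gl.get_error()  # extra round trip after every call
            if error != 0:
                raise RuntimeError(f"GL error {error} after {name}")
            return result

        return checked


plain = FakeGL()
for _ in range(1000):  # one draw per sprite, no batching
    plain.draw_arrays()

wrapped = FakeGL()
debug = DebugGL(wrapped)
for _ in range(1000):
    debug.draw_arrays()

print(plain.calls, wrapped.calls)  # 1000 2000
```

In this model the wrapper only doubles the call count, but in a real browser an error query can also force synchronization with the GPU, so the slowdown with one draw call per sprite may be much larger than 2x.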

Yeah, it is probably related to batching. I will test it in a few days.

Agreed, but currently Tilemap doesn’t support color modifications or blend modes. With sprites I can emulate this behaviour, at least for the Flash target (and probably for *GL targets with filters).