OpenFl performance abysmal on Mac C++ target

Until now I have used OpenFL only for developing games (luckily none has been released yet) and never tried exporting to any platform other than macOS, so I accepted the poor performance as something that just comes with my older Mac. I only looked into it on my current project, which uses HaxeUI with a relatively large number of sprites (around 150 visible at a time and around 500 in memory in total), when the idle UI spun up the computer's fans as if there was no tomorrow.

But as the problem isn’t related to HaxeUI, I’m including example testing code:

package;

import openfl.display.Sprite;
import openfl.events.Event;
import openfl.events.MouseEvent;

class Main extends Sprite {
	
	public function new () {
		
		super ();
		
		var callback = function(evt:Event){

		}

		for (i in 0...1000) {

			var sprite = new Sprite();
			sprite.addEventListener(MouseEvent.CLICK, callback);

			sprite.graphics.beginFill(0xFF00FF00);
			sprite.graphics.drawRect(0, 0, 50, 50);
			sprite.graphics.endFill();

			addChild(sprite);
		}
	}
}

Compiled and running idle, this takes about 3% CPU on the Windows target (i5-6600K, Intel HD 530) and a full 100% CPU on the Mac target (i5-2415M, Intel HD 3000). I know there is a performance difference between these two computers, but 3% vs 100% CPU seems ridiculous.

I’m using the latest git versions of OpenFL and Lime.

Can anyone test this? Am I doing something wrong?

In the current renderer, your test case creates 1000 OpenGL textures of 50 x 50 pixels each, and performs 1000 draw calls to bind each texture, and render.

What’s missing for a simple case like this is drawing simple rectangles with an OpenGL call, without using a texture. There are also opportunities for batching, like using a Tilemap to reduce the number of draw calls.
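The Tilemap idea could look roughly like the sketch below. It assumes the standard `openfl.display.Tilemap`, `Tileset` and `Tile` classes; the grid layout and the 52 pixel spacing are arbitrary choices made only so the tiles do not overlap, and are not taken from the original test.

```haxe
package;

import openfl.display.BitmapData;
import openfl.display.Sprite;
import openfl.display.Tile;
import openfl.display.Tilemap;
import openfl.display.Tileset;
import openfl.geom.Rectangle;

class Main extends Sprite {

	public function new () {

		super ();

		// One opaque 50 x 50 green bitmap shared by every tile (a single texture).
		var bitmapData = new BitmapData (50, 50, false, 0x00FF00);
		var tileset = new Tileset (bitmapData);
		var tileID = tileset.addRect (new Rectangle (0, 0, 50, 50));

		// The Tilemap is a single display object that draws all of its tiles.
		var tilemap = new Tilemap (stage.stageWidth, stage.stageHeight, tileset);

		for (i in 0...1000) {

			// Hypothetical layout: 40 tiles per row, 52 pixels apart.
			var column = i % 40;
			var row = Std.int (i / 40);
			tilemap.addTile (new Tile (tileID, column * 52, row * 52));
		}

		addChild (tilemap);
	}
}
```

Note that tiles are not display objects, so they do not dispatch their own `MouseEvent.CLICK`; a UI that needs per-element clicks would have to hit-test on the Tilemap itself.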

If you use cacheAsBitmap on the parent sprite, it could be flattened into one bitmap: the software pass would be done once, and rendering would then take one texture and one draw call.
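A minimal sketch of that suggestion, assuming the sprites are moved into an intermediate `container` sprite (a name introduced here for illustration). Whether the container is actually flattened into a single texture depends on the OpenFL version and renderer in use, so treat this as something to measure rather than a guaranteed fix.

```haxe
package;

import openfl.display.Sprite;

class Main extends Sprite {

	public function new () {

		super ();

		// Hypothetical parent that holds all rectangles.
		var container = new Sprite ();

		for (i in 0...1000) {

			var sprite = new Sprite ();
			sprite.graphics.beginFill (0x00FF00);
			sprite.graphics.drawRect (0, 0, 50, 50);
			sprite.graphics.endFill ();
			container.addChild (sprite);
		}

		// Ask the renderer to cache the whole container as one bitmap.
		container.cacheAsBitmap = true;

		addChild (container);
	}
}
```

The cache has to be rebuilt whenever a child changes, so this fits content that is mostly static.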

We’re also working on optimizations for not flipping the screen if nothing has changed, which should help a lot for an example like this where the objects aren’t moving.

Thanks a lot, I’ll look into haxeui-openfl backend and try to optimise what I can.

But still, I just tested the exact same code without modifications on Win 10 in a VM on my Mac, and the application takes about 40% CPU on Windows in the VM. In other words, I’m getting more than two times better performance on Windows in a virtual machine (1 core, 2 GB RAM) than on macOS running natively on the same computer (2 cores, 8 GB RAM), which I just refuse to accept :smiley: Are you sure that the default compiler settings are the same on both platforms in these matters, or that something isn’t done differently on macOS that could affect this?

In this test case, I use the project generated by OpenFL (with no changes to compiler flags), and the Haxe code is the same on both platforms (as in the previous post):

<?xml version="1.0" encoding="utf-8"?>
<project>
	
	<meta title="Test" package="com.sample.test" version="1.0.0" company="Company Name" />
	<app main="Main" path="Export" file="Test" />
	
	<source path="Source" />
	
	<haxelib name="openfl" />
	
	<assets path="Assets" rename="assets" />
	
</project>

and compiling with

openfl test mac

and

openfl test windows

EDIT: I just tested it on the same machine on Win 7 in a VM (Win 10 by itself runs slowly in VirtualBox, Win 7 is better). On Win 7 in the VM it’s around 15% CPU (2 cores, 2 GB RAM, with macOS running underneath) as opposed to 100% CPU (2 cores, 8 GB RAM) on macOS. So something is very wrong here. I have an early 2011 MBP 13 running Sierra, if that helps. I’ll test on a mid 2012 MBP 13 later just in case.

Perhaps it’s failing to run in OpenGL mode. Can you trace stage.window.renderer.type on the Mac build?

Doesn’t seem like that:

Main.hx:683: OPENGL

I’m calling trace(Lib.current.stage.window.renderer.type); from MouseEvent, so GUI is already rendered at that point. If you need any specific testing from me, I can do it, I just don’t know where to start looking for a problem.

What does the performance look like if you do openfl test mac -Dcairo to force software rendering?

Do you use filters, colorTransform or cacheAsBitmap in your project anywhere?

It’s acting weird. In the test project which I posted here, only with a different sprite count, CPU usage roughly halves (70% with OpenGL, 36% with Cairo), but on the other hand the project with HaxeUI goes from 50% with OpenGL to 75% with Cairo. Either way, it’s still not even close to Windows performance: in Windows 7 running in a VM on macOS, both projects are around 5% to 15% CPU, as I said before, and the state of VirtualBox graphics acceleration is questionable at best.

I’m not using colorTransform or cacheAsBitmap in either of those projects, as far as I’m aware. Neither does HaxeUI, or at least search didn’t find any occurrences.

I tested with different versions of SDL (latest dev and stable 2.0.5), looked into Lime configuration, but so far no progress. As of now, OpenFL is practically unusable on macOS target, at least for me, as I’m still getting at least 10x worse performance on native macOS system than in Windows in virtual machine running on the same computer.

Can anyone else please test this? Could this be an SDL-related problem?

What is your FPS? Could you target a lower FPS on Mac?

My first guess is GPU performance (and something strange on that note), but I’ll keep thinking.

This looks like it will be very difficult to track down. For now I managed to keep CPU usage down by modifying stage.frameRate at runtime: the application listens to events, and if nothing changes the frame rate is kept at 5 FPS, otherwise it rises to 30 FPS. Before, it was always 60 FPS. Thanks for the hint.
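A workaround along these lines might be sketched as follows. Only `stage.frameRate` and the 5 / 30 FPS values come from the post; the choice of mouse events as the "something changed" signal and the one-second idle timeout (`IDLE_AFTER_MS`) are assumptions made for illustration.

```haxe
package;

import openfl.Lib;
import openfl.display.Sprite;
import openfl.events.Event;
import openfl.events.MouseEvent;

class Main extends Sprite {

	static inline var IDLE_FPS = 5;
	static inline var ACTIVE_FPS = 30;

	// Assumed timeout: drop back to the idle rate after one second of inactivity.
	static inline var IDLE_AFTER_MS = 1000;

	var lastActivity:Int = 0;

	public function new () {

		super ();

		stage.frameRate = IDLE_FPS;

		// Any mouse activity counts as "something changed" in this sketch.
		stage.addEventListener (MouseEvent.MOUSE_MOVE, onActivity);
		stage.addEventListener (MouseEvent.MOUSE_DOWN, onActivity);
		stage.addEventListener (Event.ENTER_FRAME, onEnterFrame);
	}

	function onActivity (event:Event):Void {

		lastActivity = Lib.getTimer ();

		if (stage.frameRate != ACTIVE_FPS) {
			stage.frameRate = ACTIVE_FPS;
		}
	}

	function onEnterFrame (event:Event):Void {

		// Fall back to the idle rate once no activity has been seen for a while.
		if (stage.frameRate != IDLE_FPS && Lib.getTimer () - lastActivity > IDLE_AFTER_MS) {
			stage.frameRate = IDLE_FPS;
		}
	}
}
```

Because the idle check itself runs on ENTER_FRAME, the switch back to 5 FPS can lag by up to one frame; an application with animations would also need to report activity from its own update code.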

Not sure about the Windows / macOS difference. At first I also thought something was wrong with the Intel HD 3000, but Windows in VirtualBox runs fine, and afaik it also uses the macOS drivers underneath. Or could Windows have some mechanism to not render a window anew if it didn’t change?

Question: Would it be possible to implement some low-impact mechanism to avoid processing / rendering a DisplayObject or Graphics all over again if it didn’t change since the previous frame? I could try to hack something myself, but you certainly know better where the right spot in the rendering chain would be. I believe it could save a lot of resources on some targets.
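The general idea behind such a mechanism is a dirty flag. The sketch below is plain Haxe and purely illustrative: `DirtyNode`, `setColor` and `renderCount` are hypothetical names, and this is not how OpenFL's renderer is implemented.

```haxe
// Hypothetical node that remembers whether it changed since it was last drawn.
class DirtyNode {

	// Number of times the (expensive) drawing step actually ran.
	public var renderCount (default, null):Int = 0;

	var dirty:Bool = true;
	var color:Int;

	public function new (color:Int) {
		this.color = color;
	}

	// Changing a visual property marks the node as needing a redraw.
	public function setColor (value:Int):Void {
		if (value != color) {
			color = value;
			dirty = true;
		}
	}

	// Returns true if the node was redrawn, false if the frame was skipped.
	public function render ():Bool {
		if (!dirty) return false;
		renderCount++; // stand-in for the real drawing work
		dirty = false;
		return true;
	}
}

class DirtyDemo {

	static function main () {

		var node = new DirtyNode (0x00FF00);

		node.render ();            // first frame: drawn
		node.render ();            // nothing changed: skipped
		node.setColor (0xFF0000);  // change marks the node dirty
		node.render ();            // drawn again

		trace (node.renderCount);  // renderCount is now 2
	}
}
```

In a real display list the flag would also have to propagate to parents, so that a container knows one of its children needs a redraw.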

Just to support these findings: I have a much faster Mac (latest Macbook Pro with ATI 460) but still see around 67% CPU usage

We’re working to that end. We need additional fixes to how renderDirty is flagged internally so that we catch render changes, but we’ve been making good progress.

BTW, this looks interesting:

Not on a Mac right now, so I can’t test changes, but the amount of sleep allowed in the main loop can make a big difference in CPU performance.

Great news!

I just tested both cases of this on the current git version of Lime and I’m getting roughly the same performance.

Hello @wildfireheart,

Try this solution and let me know whether it runs faster.

package;

import openfl.Lib;
import openfl.display.Sprite;
import openfl.events.Event;
import openfl.events.MouseEvent;
import openfl.text.TextField;

class Main extends Sprite {

	private var tf:TextField;

	public function new () {

		super ();

		for (i in 0...1000) {
			var sprite = new Sprite();
			sprite.addEventListener(MouseEvent.CLICK, callback);

			sprite.graphics.beginFill(0xFF00FF00);
			sprite.graphics.drawRect(0, 0, 50, 50);
			sprite.graphics.endFill();
			Lib.current.addChild(sprite);
		}

		// Show the time elapsed since startup once all sprites are created.
		tf = new TextField();
		tf.background = true;
		tf.backgroundColor = 0xffffffff;
		tf.text = "For: " + Lib.getTimer() + "ms";
		tf.x = tf.y = 100;
		Lib.current.addChild(tf);
	}

	// Append the elapsed time to the text field on every click.
	private function callback(evt:Event):Void {
		tf.text += "\nClicked me!";
		tf.text += "\nFor: " + Lib.getTimer() + "ms";
	}
}

Result: 245 ms for drawing the sprites
1935 ms at the clicked event

If you use var callback = function(evt:Event):Void { … }
the result is 266 to 270 ms, which is slower than private function callback(evt:Event):Void { … }
and the clicked event from the variable callback is at 4225 ms. You forgot to write "Void" after var callback = function(evt:Event):Void { … }
Result: from the sprites 268 ms, clicked event in the variable 2952 ms.
Don’t forget this performance optimization; Haxe and AS3 are much the same here:
private function handle(e:Event, MouseEvent etc...):Void { .. }
is faster than
a function defined inside the current function or assigned to a variable.

Please show me the performance in ms or s.
If you can’t resolve it, then @singmajesty will fix Lime / OpenFL.

As far as I know, it is not necessary to write types explicitly; the Haxe compiler will resolve them for you anyway.

EDIT: In this case it seems to compile to Dynamic, so it can actually be a bit slower, but I didn’t test it.

But it doesn’t matter. The problem we are solving here is that the same code causes a lot less CPU activity (roughly 10x less) on Windows (even in a VM) than on macOS on the same computer. The code I posted is just an example to show the problem; I’m not using it.

How does PiratePig perform? I ran some tests on my 13" MacBook, I get 95% CPU use from openfl test neko, but approximately only 4.5% CPU use from openfl test mac. Changing the demo to use 60 FPS increased the CPU to about 9.5% CPU, and then enabling vsync brought it lower to about 8.2% CPU.

This seems reasonable, but perhaps there is something triggering different performance on your machine?

Well, I’m also getting around 9% CPU on PiratePig with openfl test mac, but honestly, it doesn’t use that many Sprites. You need a much higher number of sprites to see the difference between Windows and macOS. Just use the example I posted at the beginning of this thread. It illustrates how, for example, the HaxeUI OpenFL backend works: it redraws component graphics (each component consists of one or more Sprite instances) only when something changes and then does nothing.

Speaking of that example, you need to draw something on sprite.graphics to see the problem. When I just add 1000 empty Sprite instances with event listeners, CPU usage is around 4%. As soon as I draw a single-colour rectangle on every Sprite, CPU usage rises to 75% on macOS, while it stays around 5% on Windows.

PS: 1000 sprites aren’t that many, when you have even slightly complex UI (with tables of components for example), as I’m doing right now. It could probably be done more efficiently with Tiles or whatever, but it still doesn’t explain the macOS/Windows difference.

It’s definitely not a problem with my machine. I was thinking about that before, but I tried the example code on two different MBPs (with Intel HD 3000 and Intel HD 4000) and it behaves the same. @hak88 also confirmed the issue on a Mac with an ATI card, so it’s not an Intel graphics problem either. I’m guessing it will be either something in SDL, the SDL/Lime interfacing layer, or maybe even some quirk in the OpenGL implementation on macOS (which is traditionally not great) that Lime happens to hit. But I did a quick search and didn’t find any complaints about a major SDL performance difference between macOS and Windows, so I don’t know.

We will hit Cairo performance when the vector graphics are redrawn, but then it’s up to OpenGL for each frame afterward. Perhaps the drivers are better on Windows, or we are hitting a slow path on macOS for some reason.