BitmapData processing / Performance on Native target

Hi Everyone,

I’m playing with Away3D, and most of the issues I have concern slow performance on the native target (Windows for now; I haven’t tested Linux yet) related to BitmapData processing.

Classic process: async image loading > copyPixels > resulting BitmapData used as textures

Configuration: Lime 2.3.0 / OpenFL 3.0.0-Beta.3 / hxcpp 3.2.37

Windows / Legacy

I had to replace the getRGBAPixels copy function with this one to get anywhere close to Flash performance. This is absolutely ugly (the alpha channel of pixel n is the one from pixel n+1), but it is enough for now to keep working while waiting for a solution:

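public inline static function getRGBAPixels (bitmapData:BitmapData):ByteArray {
    // Raw pixel bytes, laid out as ARGBARGB...
    var data = bitmapData.getPixels (new Rectangle (0, 0, bitmapData.width, bitmapData.height));
    var size = bitmapData.width * bitmapData.height;
    var copy = new ByteArray (size - 1);
    // Single bulk copy starting at byte 1 (length 0 = everything from the offset to the end),
    // so each pixel reads as RGBA but carries the alpha of the next pixel
    copy.writeBytes (data, 1, 0);
    return copy;
}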

Windows / Next

Currently impossible to use, although I expected a huge performance boost from the OpenGL layer used in the background.

copyPixels benchmark: from 1~2 ms (Flash and native Legacy) up to 4 seconds on native Next

Draw benchmark: from 10 ms (Flash and native Legacy) up to 4 seconds on native Next
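
For anyone who wants to reproduce this kind of measurement, here is a minimal timing sketch. It is not the benchmark used for the numbers above: the class name, the timeMs helper, and the 2048x2048 size are assumptions made for illustration. It only relies on haxe.Timer.stamp and the standard BitmapData copyPixels and draw methods.

```haxe
import haxe.Timer;
import openfl.display.BitmapData;
import openfl.geom.Point;
import openfl.geom.Rectangle;

class CopyPixelsBenchmark {
    // Hypothetical helper: runs `work` once and returns the elapsed time in milliseconds.
    public static function timeMs(work:Void->Void):Float {
        var start = Timer.stamp(); // seconds, as a Float
        work();
        return (Timer.stamp() - start) * 1000;
    }

    public static function run():Void {
        // Assumed test size; adjust to match the textures you actually use.
        var source = new BitmapData(2048, 2048, true, 0xFF336699);
        var target = new BitmapData(2048, 2048, true, 0x00000000);
        var rect = new Rectangle(0, 0, source.width, source.height);
        var origin = new Point(0, 0);

        // Time one full-surface copyPixels call and one draw call.
        var copyMs = timeMs(function() target.copyPixels(source, rect, origin));
        var drawMs = timeMs(function() target.draw(source));

        trace("copyPixels: " + copyMs + " ms");
        trace("draw: " + drawMs + " ms");
    }
}
```

Calling CopyPixelsBenchmark.run() once from the same project, built once with the legacy setting and once with the next setting, should give directly comparable figures. A single run is noisy, so repeating it a few times is advisable.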

Performance this poor should not be possible … maybe a bad parameter during compilation?

I switch between the “legacy” and “next” builds by using these parameters in the XML project file:

<set name="openfl_legacy" />

or

<set name="openfl_next" />

Thank you,

Could you explain the modification to the legacy method a bit more? Are you trying to alter the whole alpha channel? Would you need to do more than offsetting by one pixel to do that?

In the newer code, I’m guessing there’s a lot of low-hanging fruit regarding performance optimization. We might need to get to the point of having CFFI and C++ functions within there for performance, but I want to believe that we don’t have to. Compiling Haxe to C++ should give us many similar benefits.

I think that optimizing the methods and looking at the performance of the ByteArray/typed array implementations could yield some big results.

There is also some C++ specific code in Haxe that we could push into Lime, but the more we don’t do platform-specific optimizations, the more it helps all the targets at once.

Hi Singmajesty,

Thanks for your feedback :wink:

getRGBAPixels does an ARGB to RGBA conversion (through array element exchange or bitwise operations) for each pixel. For a 2048x2048 texture, this synchronous process has to be done on 4194304 pixels.

For now, since the input is a ByteArray laid out as ARGBARGBARGB…, I drop the first pixel’s A and copy the whole ByteArray in one call, which applies the A from pixel n+1 to pixel n. This is not a solution, just a temporary workaround for real-time processing needs.
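
To make the difference concrete, here is a small sketch of both approaches on plain haxe.io.Bytes, so it runs without OpenFL. The class and function names are made up for illustration, and this is not the actual getRGBAPixels implementation from the library. Setting the last byte to 0xFF in the shifted version is my own choice, since the shifted copy has no alpha left for the final pixel.

```haxe
import haxe.io.Bytes;

class ArgbToRgba {
    // Exact conversion: every 4-byte pixel A,R,G,B is rewritten as R,G,B,A.
    public static function convertExact(argb:Bytes):Bytes {
        var out = Bytes.alloc(argb.length);
        var i = 0;
        while (i + 3 < argb.length) {
            out.set(i, argb.get(i + 1));     // R
            out.set(i + 1, argb.get(i + 2)); // G
            out.set(i + 2, argb.get(i + 3)); // B
            out.set(i + 3, argb.get(i));     // A
            i += 4;
        }
        return out;
    }

    // Shifted copy: one bulk blit starting at byte 1.
    // RGB lands in the right place, but pixel n receives the alpha of pixel n+1.
    public static function convertShifted(argb:Bytes):Bytes {
        var out = Bytes.alloc(argb.length);
        if (argb.length == 0) return out;
        out.blit(0, argb, 1, argb.length - 1);
        // Assumption: make the last pixel opaque, its alpha slot is otherwise unset.
        out.set(argb.length - 1, 0xFF);
        return out;
    }

    static function main() {
        // Two ARGB pixels: (A=ff, R=11, G=22, B=33) and (A=80, R=44, G=55, B=66)
        var values = [0xFF, 0x11, 0x22, 0x33, 0x80, 0x44, 0x55, 0x66];
        var src = Bytes.alloc(values.length);
        for (i in 0...values.length) src.set(i, values[i]);

        // Exact bytes:   11 22 33 ff | 44 55 66 80
        trace(convertExact(src).toHex());
        // Shifted bytes: 11 22 33 80 | 44 55 66 ff (first pixel carries the second pixel's alpha)
        trace(convertShifted(src).toHex());
    }
}
```

The exact version touches every byte individually, which is where the per-pixel cost comes from. The shifted version is a single bulk copy, which is why it is so much faster, and for fully opaque images the two give the same visual result.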

To go further, I’m currently doing some OpenGL experiments (Openfl.gl) for the native target: if (Flash) > use Away3D; if (Native) > use my own OpenGL engine on top of Openfl.gl, for maximum performance by minimizing CPU work on BitmapData (all processing to be done and used GPU side).

In the newer code, I’m guessing there’s a lot of low-hanging fruit regarding performance optimization. We might need to get to the point of having CFFI and C++ functions within there for performance, but I want to believe that we don’t have to. Compiling Haxe to C++ should give us many similar benefits.

Indeed, I think the primary goal should be for Haxe to C++ compilation to reach almost the same performance as native C++ code. I’m new to Haxe and only started a couple of weeks ago. The work done is quite impressive, and maturity will come, no doubt :wink:

I think that optimizing the methods and looking at the performance of the ByteArray/typed array implementations could yield some big results.

I do not have enough visibility yet, but if there is a way to improve the ByteArray / typed array implementations, it will enhance overall performance.

There is also some C++ specific code in Haxe that we could push into Lime, but the more we don’t do platform-specific optimizations, the more it helps all the targets at once.

That is the tricky point with cross-platform solutions: there are always choices to make :wink:

Jeff

I will investigate the difference between the Legacy and Next builds further so I can report the issue in more detail.

I did a few initial tests and was able to get over 100 times better performance in a few methods using more optimized code. I intend to replace the typed array implementation with one that’s designed for faster data manipulation, then try to use that for a cross-platform implementation that also performs ideally :slight_smile: