Quickest way to rearrange bytes

I’ve started creating a program that is supposed to be able to read a range of different pixel formats. I tested this initially with a simple file in 32-bit RGBA format. I did this fairly simply by rearranging the bytes a bit (read from a BytesInput and write to a BytesOutput), then passing the result as a ByteArray into BitmapData.setPixels().

This works great, only it takes much too long. I mean, it is a fairly large file (2048x2048 pixels, that’s 16777216 bytes), but programs such as TextureFinder manage to do this much faster, switching between different formats quickly.

I’m sure the issue is in the byte rearrangement: if I comment that out and just read the data as ARGB directly, it obviously gives weird colours but loads much quicker, with no more than a second’s wait. Rearranging the bytes takes at least 10 seconds, maybe even 20. Admittedly, my machine isn’t very high-end, but other programs still manage to do it much faster.

Here’s my byte handling code:

var out:BytesOutput = new BytesOutput();
out.bigEndian = true;
var input:BytesInput = new BytesInput(bytes);
input.bigEndian = true;

while ((input.position + 4) <= input.length)
{
    var rgb:Int = input.readInt24();
    var a:Int = input.readByte() << 24;
    out.writeInt32(a + rgb);
}

trace("foo");

The variable bytes is just a Bytes object of the loaded file. I know that this is the code that takes time because if I comment out the loop then “foo” traces much quicker.

Any suggestions on how to speed this up would be appreciated!

There are multiple possible optimizations to suggest, but first try declaring rgb and a outside the loop.

var rgb:Int;
var a:Int;

while ((input.position + 4) <= input.length)
{
    rgb = input.readInt24();
    a = input.readByte() << 24;
    out.writeInt32(a + rgb);
}

Thanks, unfortunately that doesn’t seem to have much of an effect. It’s probably a good suggestion regardless though, so I’ll keep it like that.

Oh well. Someone had mentioned there was an issue with variables declared in loops, but I guess it’s something else.

My next suggestion is not to use the BytesOutput or BytesInput classes. They both wrap the Bytes class, adding unnecessary code to each operation.

var input:Bytes = bytes;
var output:Bytes = Bytes.alloc(input.length);
#if flash9
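// Bytes exposes its underlying ByteArray through getData(); the endian setting
// only affects later multi-byte reads, not the byte-wise get/set calls below.
output.getData().endian = flash.utils.Endian.BIG_ENDIAN;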
#end

var a:Int;
var r:Int;
var g:Int;
var b:Int;

var pos:Int = 0;
while (pos + 3 < input.length)
{
    r = input.get(pos);
    g = input.get(pos + 1);
    b = input.get(pos + 2);
    a = input.get(pos + 3);
    
    output.set(pos, a);
    output.set(pos + 1, r);
    output.set(pos + 2, g);
    output.set(pos + 3, b);
    
    pos += 4;
}

trace("foo");

Thanks! That made quite a noticeable difference and cut the time to about half, if not less (that’s just an estimate from memory, by the way; I’m not benchmarking it or anything). This should hopefully be sufficient, since most images won’t be this large (and waiting a bit is to be expected for large images anyway).

Of course, any further optimizations would be highly appreciated; the faster the better!
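
One more thing that might be worth trying is to handle each pixel as a single 32-bit value instead of four separate bytes. The sketch below assumes your Haxe version provides Bytes.getInt32 and Bytes.setInt32 (both read and write little-endian) and that Int is 32 bits wide on your target, as it is on Flash. The class name Swizzle and the function name rgbaToArgb are made up for illustration. I haven’t measured it against the loop above, so treat any speed-up as something to verify with haxe.Timer.stamp().

```haxe
import haxe.io.Bytes;

class Swizzle {
    // Hypothetical helper: copies RGBA pixels into a new buffer in ARGB order.
    public static function rgbaToArgb(input:Bytes):Bytes {
        var output:Bytes = Bytes.alloc(input.length);
        var pos:Int = 0;
        while (pos + 4 <= input.length) {
            // Little-endian read: v = R | G << 8 | B << 16 | A << 24
            var v:Int = input.getInt32(pos);
            // Rotating left by 8 bits gives A | R << 8 | G << 16 | B << 24,
            // which setInt32 stores as the bytes A, R, G, B.
            output.setInt32(pos, (v << 8) | (v >>> 24));
            pos += 4;
        }
        return output;
    }

    static function main() {
        // One test pixel: R=0x11, G=0x22, B=0x33, A=0x44
        var src:Bytes = Bytes.alloc(4);
        src.set(0, 0x11);
        src.set(1, 0x22);
        src.set(2, 0x33);
        src.set(3, 0x44);
        trace(rgbaToArgb(src).toHex()); // 44112233

        // Rough timing on a buffer the size of a 2048x2048 RGBA image.
        var big:Bytes = Bytes.alloc(2048 * 2048 * 4);
        var start:Float = haxe.Timer.stamp();
        rgbaToArgb(big);
        trace("seconds: " + (haxe.Timer.stamp() - start));
    }
}
```

This does one read and one write per pixel instead of four of each. Whether that helps in practice depends on the target, so it is worth timing both versions on the real file before settling on one.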