Epiphany bit test
Whilst working on the assembly version of the object detector code, I came across the need for a bit test. Unfortunately the Epiphany core on the Parallella doesn't have a bit-test instruction, and there is also no immediate (literal) form of the AND instruction, which is what would normally be used for such things.
Fortunately, there is still a way to encode a single-bit test into a single instruction.
The CPU tracks the sign bit of results in the status register (the AN flag), and it's easy to put any bit there with a shift. The only question was how to read it back out, since there is no Bcc condition that maps directly to AN.
Start with the LSL instruction and its flag updates:
LSL <RD>, <RN>, #IMM5
    RD = RN << <OP2>
    AN = RD[31]
    AV = 0
    AC = 0   (pity AC doesn't follow the last out-shifted bit)
    if (RD[31:0]==0) { AZ=1 } else { AZ=0 }
Then look at the Bcc conditions that use the negative flag:
Code  Condition                       Mnemonic  Flags tested
0110  Greater Than (Signed)           BGT       ~AZ & (AV == AN)
0111  Greater Than or Equal (Signed)  BGTE      AV == AN
1000  Less Than (Signed)              BLT       AV != AN
1001  Less Than or Equal (Signed)     BLTE      AZ | (AV != AN)
Since AV is always 0 after a shift, BLT (AV != AN) reduces to AN == 1 and BGTE (AV == AN) reduces to AN == 0, which gives a fairly straightforward pair of tests:
;; Test bit X is set
    lsl  r0,r1,#(31-X)
    blt  .bit_is_set
    bgte .bit_is_not_set
The same goes for the MOVcc instruction.
Having this be just as efficient as a dedicated bit-test instruction is very handy. Bits are a compact way to represent information, so they save memory, and anything that saves memory is a big plus on the Parallella's Epiphany, where each core has only 32KB of local memory.
The C compiler simply follows the usual C idiom for a bit test, loading a mask and using AND:
int main(int argc, char **argv) {
    return (argc & (1<<5)) ? 6 : 9;
}
00000000 <_main>:
   0:   2403        mov r1,0x20
   2:   00da        and r0,r0,r1
   4:   00c3        mov r0,0x6
   6:   2123        mov r1,0x9
   8:   0402        moveq r0,r1
   a:   194f 0402   rts
But it can be done literally "one better":
_main:
    lsl    r0,r0,#(31-5)
    mov    r0,#6
    mov    r1,#9
    movgte r0,r1
    rts
Update: a higher bit actually makes a better example, as C falls even further behind:
int main(int argc, char **argv) {
    return (argc & (1<<21)) ? 6 : 9;
}
00000000 <_main>:
   0:   200b 0002   mov r1,0x0
   4:   240b 1002   movt r1,0x20
   8:   00da        and r0,r0,r1
   a:   00c3        mov r0,0x6
   c:   2123        mov r1,0x9
   e:   0402        moveq r0,r1
  10:   194f 0402   rts
Also note the code-size expansion from 2 bytes to 10 when the source operand is a low register. I don't know why the compiler uses the 32-bit form of mov r1,0x0, though; the 16-bit form with an 8-bit immediate would suffice and the sequence would only need 8 bytes.
For high registers (r8 and up), which require the 32-bit instruction encodings, it would be 4 bytes versus 12.